Concepts you can scrub & feel.

AI Operations & Production 3 min read

AI Cost Optimization: Cutting LLM Bills 80%

Many LLM bills can be cut by 50–90% with little or no quality loss. Caching, model routing, prompt diet, and output caps deliver the bulk of it.

/ai-cost-optimization Try it now

AI Operations & Production 3 min read

AI Observability: Tracing Every Token in Production

Without traces, every LLM bug is a guess. Capture prompts, tool calls, tokens, costs, and latencies for every request — searchable, filterable, alertable.

/ai-observability-traci… Try it now

AI Operations & Production 3 min read

LLMOps: MLOps for the LLM Era

LLMOps is the operational discipline of running LLM apps in production — prompts as code, evals on every change, observability, cost, and incident response.

/llmops-explained Try it now

The full library

Pick your next concept

60 items

Neural Networks & Deep Learning 2 min read

Backpropagation: How a Network Actually Learns

Backprop is just credit assignment — blame each parameter for the error, in proportion. Tune learning rate and batch size to see training stabilise or diverge.

/backpropagation-how-a-… Try it now

Neural Networks & Deep Learning 2 min read

Neurons, Layers, and Why Depth Matters

A neuron is a weighted sum followed by a kink. Stack a million in layers and you get a function that approximates almost anything.

/neurons-layers-and-why… Try it now
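A minimal sketch of the "weighted sum followed by a kink" idea, using ReLU as the kink. The weights, biases, and inputs below are made-up illustrative numbers, not taken from any trained model.

```python
def relu(x: float) -> float:
    """The 'kink': pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def neuron(inputs, weights, bias):
    """One neuron: weighted sum of the inputs plus a bias, then the kink."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer(inputs, weight_rows, biases):
    """A layer is many neurons reading the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Illustrative numbers only: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = layer([1.0, 2.0], [[0.5, 0.25], [-1.0, 1.0]], [0.0, 0.5])
output = layer(hidden, [[1.0, 1.0]], [0.0])
print(hidden, output)
```

Depth here is just feeding one layer's outputs into the next; real networks do the same thing with far more neurons and learned weights.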

Training & Fine-Tuning 3 min read

Gradient Descent: Rolling Downhill to a Smarter Model

Training is a marble rolling down a wrinkled hill — the loss landscape. Tune learning rate and momentum to see it slide, oscillate, or get stuck.

/gradient-descent-rolli… Try it now
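A minimal sketch of the marble picture on a toy one-dimensional loss, f(x) = x**2, chosen here purely for illustration. The function name `descend` and its default values are assumptions of this sketch. A bowl-shaped loss can show sliding, oscillating, and diverging, but not getting stuck, which needs a bumpier landscape.

```python
def descend(lr: float, momentum: float, steps: int = 50, x: float = 5.0) -> float:
    """Gradient descent with momentum on the toy loss f(x) = x**2."""
    velocity = 0.0
    for _ in range(steps):
        grad = 2.0 * x                              # derivative of x**2
        velocity = momentum * velocity - lr * grad  # momentum carries past motion
        x += velocity
    return x


print(descend(lr=0.1, momentum=0.0))           # small steps: slides toward 0
print(descend(lr=1.0, momentum=0.0, steps=3))  # step too large: bounces between +5 and -5
print(descend(lr=1.1, momentum=0.0))           # larger still: each step overshoots further
```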

Training & Fine-Tuning 2 min read

Fine-Tuning vs RAG: When to Teach, When to Look Up

Fine-tuning changes what the model knows; RAG gives it a reference shelf at query time. Most "make the LLM know our docs" jobs are RAG jobs.

/fine-tuning-vs-rag-whe… Try it now

Training & Fine-Tuning 2 min read

LoRA: Cheap Fine-Tuning Without Touching the Whole Model

LoRA freezes the giant model and trains tiny rank-r adapters next to it. On a 7B-param model the adapters typically add up to about 1% of the weights or less, while quality stays close to full fine-tuning.

/lora-cheap-fine-tuning… Try it now
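A rough sketch of where the small trainable fraction comes from. For one frozen weight matrix of shape d × k, a rank-r adapter adds two small matrices, B (d × r) and A (r × k). The 4096 × 4096 shape and rank 16 are illustrative assumptions, not figures from any particular model.

```python
def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters in a rank-r adapter for a d x k weight matrix."""
    return r * (d + k)  # B is d x r, A is r x k


def lora_fraction(d: int, k: int, r: int) -> float:
    """Adapter size relative to the frozen matrix it sits next to."""
    return lora_params(d, k, r) / (d * k)


# Illustrative shape: one 4096 x 4096 projection with a rank-16 adapter.
print(lora_params(4096, 4096, 16))    # adapter parameters
print(lora_fraction(4096, 4096, 16))  # fraction of the frozen matrix
```

The fraction scales linearly with the rank, so doubling r doubles the adapter size while the frozen matrix stays untouched.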

Training & Fine-Tuning 3 min read

Knowledge Distillation: Teaching a Small Model to Imitate a Big One

Distillation trains a small student model to mimic a big teacher's soft outputs. You ship the small one — much cheaper, surprisingly close in quality.

/knowledge-distillation… Try it now

Inference & Optimization 2 min read

Quantization: Shrinking Models Without Killing Them

Store every weight in 4 bits instead of 16, fit a 70B model on one GPU, and lose almost no quality. Tune precision to feel the trade-off.

/quantization-shrinking… Try it now
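The memory arithmetic behind that claim, as a back-of-the-envelope sketch. It counts weight storage only; quantization scales, the KV cache, and activations add overhead that is ignored here. The helper name `weight_gb` is hypothetical.

```python
def weight_gb(num_params: int, bits_per_weight: int) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes), ignoring all overhead."""
    return num_params * bits_per_weight / 8 / 1e9


params_70b = 70_000_000_000
print(weight_gb(params_70b, 16))  # 16-bit weights
print(weight_gb(params_70b, 4))   # 4-bit weights
```

At 16 bits the weights alone need 140 GB; at 4 bits they need 35 GB, which is what brings a single large-memory GPU into range.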

Inference & Optimization 3 min read

KV Cache: Why the Second Token Is Faster Than the First

Without a KV cache, every new token recomputes the keys and values for the whole sequence. With it, earlier tokens' keys and values are reused and only the newest token's are computed. Managing that cache is a large part of LLM serving.

/kv-cache-why-second-to… Try it now
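A toy cost model of the saving, not an attention implementation: it only counts how many per-token key/value projections get computed while generating a sequence. The function name `kv_projections` is an assumption of this sketch.

```python
def kv_projections(num_tokens: int, use_cache: bool) -> int:
    """Count key/value projections computed while decoding num_tokens tokens."""
    computed = 0
    cached = 0  # how many positions already have stored keys/values
    for position in range(1, num_tokens + 1):
        if use_cache:
            computed += position - cached  # only positions not yet in the cache
            cached = position
        else:
            computed += position           # redo every position, every step
    return computed


print(kv_projections(100, use_cache=False))  # grows quadratically with length
print(kv_projections(100, use_cache=True))   # grows linearly with length
```

The cached version trades that compute for memory: every stored key and value has to stay resident for the life of the request.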

Inference & Optimization 3 min read

Batching: How Inference Servers Serve a Thousand Users at Once

GPUs are starved on a single request — most of the chip is idle. Batching packs many requests into one forward pass for huge throughput wins.

/batching-how-inference… Try it now

Inference & Optimization 3 min read

Speculative Decoding: A Cheap Model Guessing for an Expensive One

A tiny draft model proposes a few tokens ahead (say 5); the big model verifies them in a single forward pass. Net effect: typically 2–3× faster decode, with the same output distribution as the big model alone.

/speculative-decoding-f… Try it now
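A minimal sketch of the greedy variant of the draft-then-verify loop. Both "models" are hypothetical toy functions over integer tokens, and the verifier is called once per position here for clarity; a real target model would score all drafted positions in one forward pass. Sampling-based variants use an accept/reject rule instead of the exact-match check shown.

```python
def target_next(seq):
    """Hypothetical expensive model: always counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def draft_next(seq):
    """Hypothetical cheap model: agrees with the target except after a 4."""
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


def speculative_step(seq, k=5):
    """Draft k tokens, keep the prefix the target agrees with, fix the rest."""
    draft = list(seq)
    for _ in range(k):
        draft.append(draft_next(draft))
    accepted = []
    for token in draft[len(seq):]:
        expected = target_next(seq + accepted)
        if token != expected:
            accepted.append(expected)  # first mismatch: take the target's token
            return seq + accepted
        accepted.append(token)
    # Every draft token matched; verification also yields one extra token.
    accepted.append(target_next(seq + accepted))
    return seq + accepted


print(speculative_step([0]))  # draft goes wrong after the 4 and gets corrected
print(speculative_step([5]))  # draft fully accepted, plus one bonus token
```

Every emitted token is one the target model would have chosen itself, which is why the speedup does not cost quality.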

AI Evaluation & Safety 3 min read

Hallucinations: Why LLMs Make Stuff Up Confidently

Hallucinations are not bugs: they are the model doing exactly what it was trained to do. Plausibility is the training objective; truth is not. Understand the trap, then engineer around it.

/why-llms-hallucinate Try it now