Concepts you can feel and scrub.
Skip the 40-page docs. Every explainer turns a tricky AI, Claude Code, MCP, or cloud concept into a live, animated diagram you can drag, scrub, and break, so the idea really clicks in minutes, not hours.
Three steps. The idea sticks.
Read the 60-second analogy
Every concept opens with a short, clear story. No jargon, no noise: just the mental model you need.
Scrub the live animation
Press play, drag the timeline, or use the arrow keys. Step through frame by frame until the flow makes sense.
Push the sliders to the limit
Adjust every parameter. The diagram updates instantly, so you feel the trade-offs and remember the limits.
Most popular explainers
AI Cost Optimization: Cutting LLM Bills 80%
Most LLM bills can be cut by 50–90% without quality loss. Caching, model routing, prompt diet, and output caps deliver the bulk of it.
AI Observability: Tracing Every Token in Production
Without traces, every LLM bug is a guess. Capture prompts, tool calls, tokens, costs, and latencies for every request — searchable, filterable, alertable.
LLMOps: MLOps for the LLM Era
LLMOps is the operational discipline of running LLM apps in production — prompts as code, evals on every change, observability, cost, and incident response.
Pick your next concept
Backpropagation: How a Network Actually Learns
Backprop is just credit assignment — blame each parameter for the error, in proportion. Tune learning rate and batch size to see training stabilise or diverge.
Neurons, Layers, and Why Depth Matters
A neuron is a weighted sum followed by a kink. Stack a million in layers and you get a function that approximates almost anything.
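As a rough illustration of "weighted sum followed by a kink", here is a minimal sketch that assumes the kink is a ReLU (one common choice; the card does not name one). The function names and the tiny weights are made up for the example.

```python
def relu(z):
    """The 'kink': pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, followed by the kink."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(z)

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Two stacked layers: the second reads the outputs of the first.
hidden = layer([1.0, 2.0], [[0.5, -1.0], [0.5, 1.0]], [0.25, 0.25])
output = layer(hidden, [[1.0, 2.0]], [0.0])
print(hidden, output)
```

Depth comes from feeding one layer's outputs into the next, exactly as the last two calls do.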
Gradient Descent: Rolling Downhill to a Smarter Model
Training is a marble rolling down a wrinkled hill — the loss landscape. Tune learning rate and momentum to see it slide, oscillate, or get stuck.
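The marble picture can be sketched in a few lines. This assumes a deliberately simple one-dimensional loss, loss(x) = x**2, which is smooth rather than wrinkled, so it can show sliding and oscillating but not getting stuck. The function name and the learning rates are illustrative choices.

```python
def gradient_descent(lr, momentum=0.0, steps=50, x0=5.0):
    """Minimise loss(x) = x**2 and return every position visited."""
    x, velocity = x0, 0.0
    path = [x]
    for _ in range(steps):
        grad = 2.0 * x                               # derivative of x**2
        velocity = momentum * velocity - lr * grad   # momentum keeps part of the last step
        x = x + velocity
        path.append(x)
    return path

sliding = gradient_descent(lr=0.1)     # each step shrinks x by a factor of 0.8
diverging = gradient_descent(lr=1.1)   # each step flips the sign and grows x by 1.2
print(abs(sliding[-1]), abs(diverging[-1]))
```

With the small learning rate the marble settles near the minimum at 0; with the large one it overshoots further on every step.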
Fine-Tuning vs RAG: When to Teach, When to Look Up
Fine-tuning changes what the model knows; RAG gives it a reference shelf at query time. Most "make the LLM know our docs" jobs are RAG jobs.
LoRA: Cheap Fine-Tuning Without Touching the Whole Model
LoRA freezes the giant model and trains tiny rank-r adapters next to it. On a 7B-parameter model you train roughly 1% of the weights and keep about 99% of the quality.
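The savings come from simple arithmetic: a rank-r adapter for a d_out x d_in weight matrix is two thin matrices, r x d_in and d_out x r. The sketch below counts parameters for a single matrix; the 4096 x 4096 shape and rank 8 are assumed values for illustration, not figures from a specific model.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (frozen weights, trainable adapter weights) for one matrix."""
    full = d_in * d_out                 # the original matrix, left frozen
    adapter = rank * (d_in + d_out)     # A is rank x d_in, B is d_out x rank
    return full, adapter

full, adapter = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(f"frozen: {full:,}  trainable: {adapter:,}  ratio: {adapter / full:.2%}")
```

The exact fraction for a whole model depends on the rank and on which matrices receive adapters.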
Knowledge Distillation: Teaching a Small Model to Imitate a Big One
Distillation trains a small student model to mimic a big teacher's soft outputs. You ship the small one — much cheaper, surprisingly close in quality.
Quantization: Shrinking Models Without Killing Them
Store every weight in 4 bits instead of 16, fit a 70B model on one GPU, and lose almost no quality. Tune precision to feel the trade-off.
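A back-of-the-envelope sketch of the trade-off, assuming the simplest scheme (symmetric round-to-nearest with one scale per group of weights). Real quantizers are more elaborate, and the memory figure covers weights only, not activations or the KV cache. All names here are illustrative.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

def quantize_symmetric(weights, bits=4):
    """Map floats to signed integer codes that share one scale factor."""
    qmax = 2 ** (bits - 1) - 1                          # 7 for 4-bit codes
    scale = (max(abs(w) for w in weights) / qmax) or 1.0  # avoid a zero scale
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

print(weight_memory_gb(70e9, 16), weight_memory_gb(70e9, 4))

weights = [0.7, -0.35, 0.1, 0.0]
codes, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(codes, scale)
print(codes, restored)
```

The rounding error per weight is at most half the scale, which is the quality cost you pay for the smaller footprint.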
KV Cache: Why the Second Token Is Faster Than the First
Without a KV cache, every new token re-computes attention over the whole sequence. With it, you reuse all previous work. This is most of LLM serving.
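To see the size of the saving, the sketch below counts how many key/value projections one attention layer computes while generating tokens. It is a counting model only, not a real attention implementation, and the prompt length and token count are assumed values.

```python
def kv_projections(prompt_len, new_tokens, use_cache):
    """Count key/value projections computed while generating new_tokens."""
    total = 0
    cached = 0                               # positions whose K/V are already stored
    for step in range(new_tokens):
        seq_len = prompt_len + step          # tokens visible at this step
        if use_cache:
            total += seq_len - cached        # only positions not seen before
            cached = seq_len
        else:
            total += seq_len                 # recompute the whole sequence
    return total

without_cache = kv_projections(prompt_len=100, new_tokens=50, use_cache=False)
with_cache = kv_projections(prompt_len=100, new_tokens=50, use_cache=True)
print(without_cache, with_cache)
```

With the cache, the first step pays for the whole prompt and every later step pays for a single new token, which is why later tokens arrive faster than the first.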
Batching: How Inference Servers Serve a Thousand Users at Once
GPUs are starved on a single request — most of the chip is idle. Batching packs many requests into one forward pass for huge throughput wins.
Speculative Decoding: A Cheap Model Guessing for an Expensive One
A tiny draft model proposes 5 tokens at once; the big model verifies them in a single forward pass. Net effect: 2–3× faster decode at identical quality.
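The propose-then-verify loop can be sketched with toy stand-ins for the two models. This assumes greedy decoding, where a drafted token is accepted only if it matches the big model's own choice; sampling-based variants use a probabilistic acceptance rule instead. The functions `draft_next` and `target_next` are hypothetical callables, and a real system would score all drafted positions in one forward pass rather than in a Python loop.

```python
def speculative_step(prefix, draft_next, target_next, k=5):
    """Return the tokens accepted in one draft-then-verify round."""
    # 1. The draft model proposes k tokens, one after another.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position.
    accepted, ctx = [], list(prefix)
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)    # keep the target's token and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one more token.
    accepted.append(target_next(ctx))
    return accepted

# Toy models over digit tokens: the target counts upward, the draft errs after 3.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10

print(speculative_step([0], draft, target))    # draft is wrong at the 4th token
print(speculative_step([0], target, target))   # a perfect draft: k + 1 tokens
```

Because every emitted token is either confirmed or supplied by the target model, the output matches what the target would have produced on its own; the speed-up depends on how often the draft guesses right.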
Hallucinations: Why LLMs Make Stuff Up Confidently
Hallucinations are not bugs — they are the model doing exactly what it was trained to do. Plausibility is the loss; truth is not. Understand the trap, then engineer around it.
AI Evals: How to Tell If Your Model Is Actually Better
Without evals, "the new prompt feels better" is just vibes. A good eval suite catches regressions before users do — here is how to build one.
Stop reading about it. Start scrubbing.
Stuck on an AI, Claude Code, or cloud concept? Tell me what isn't clicking and I'll build a free interactive explainer with an analogy, animation, and sliders, usually within a week.