Concepts you can feel.
Skip the 40-page docs. Every explainer turns a tricky AI, Claude Code, MCP, or cloud concept into an animated, scrubbable diagram you can drag and break, so the idea sticks in minutes, not hours.
Three steps. The idea sticks.
Read the 60-second analogy
Every concept starts with a short story in plain language. No jargon, just the mental model you need.
Scrub the live animation
Press play, drag the timeline, or use the arrow keys. Watch each step frame by frame until the flow clicks.
Push the sliders to the limit
Adjust every variable. The diagram responds instantly, so you feel the trade-offs and remember the limits.
Most popular explainers
AI Cost Optimization: Cutting LLM Bills 80%
Most LLM bills can be cut by 50–90% without quality loss. Caching, model routing, prompt trimming, and output caps deliver the bulk of it.
AI Observability: Tracing Every Token in Production
Without traces, every LLM bug is a guess. Capture prompts, tool calls, tokens, costs, and latencies for every request — searchable, filterable, alertable.
LLMOps: MLOps for the LLM Era
LLMOps is the operational discipline of running LLM apps in production — prompts as code, evals on every change, observability, cost, and incident response.
Pick your next concept
Backpropagation: How a Network Actually Learns
Backprop is just credit assignment — blame each parameter for the error, in proportion. Tune learning rate and batch size to see training stabilise or diverge.
Neurons, Layers, and Why Depth Matters
A neuron is a weighted sum followed by a kink. Stack a million in layers and you get a function that approximates almost anything.
Gradient Descent: Rolling Downhill to a Smarter Model
Training is a marble rolling down a wrinkled hill — the loss landscape. Tune learning rate and momentum to see it slide, oscillate, or get stuck.
Fine-Tuning vs RAG: When to Teach, When to Look Up
Fine-tuning changes what the model knows; RAG gives it a reference shelf at query time. Most "make the LLM know our docs" jobs are RAG jobs.
LoRA: Cheap Fine-Tuning Without Touching the Whole Model
LoRA freezes the giant model and trains tiny rank-r adapters next to it. On a 7B-parameter model, that means training roughly 1% as many weights for about 99% of the quality.
Knowledge Distillation: Teaching a Small Model to Imitate a Big One
Distillation trains a small student model to mimic a big teacher's soft outputs. You ship the small one — much cheaper, surprisingly close in quality.
Quantization: Shrinking Models Without Killing Them
Store every weight in 4 bits instead of 16, fit a 70B model on one GPU, and lose almost no quality. Tune precision to feel the trade-off.
KV Cache: Why the Second Token Is Faster Than the First
Without a KV cache, every new token recomputes the keys and values for the whole sequence. With it, you reuse them from previous steps. Managing that cache is a large part of LLM serving.
Batching: How Inference Servers Serve a Thousand Users at Once
GPUs are starved on a single request — most of the chip is idle. Batching packs many requests into one forward pass for huge throughput wins.
Speculative Decoding: A Cheap Model Guessing for an Expensive One
A tiny draft model proposes 5 tokens at once; the big model verifies them in a single forward pass. Net effect: 2–3× faster decode at identical quality.
Hallucinations: Why LLMs Make Stuff Up Confidently
Hallucinations are not bugs — they are the model doing exactly what it was trained to do. Plausibility is the loss; truth is not. Understand the trap, then engineer around it.
AI Evals: How to Tell If Your Model Is Actually Better
Without evals, "the new prompt feels better" is just vibes. A good eval suite catches regressions before users do — here is how to build one.
Stop reading about it. Start scrubbing.
Stuck on an AI, Claude Code, or cloud concept? Tell me what isn't clicking, and I'll deliver a free interactive explainer with an analogy, animation, and sliders, usually within a week.