Home Concept Explainers Training & Fine-Tuning Fine-Tuning vs RAG: When to Teach, When to Look Up

Training & Fine-Tuning MCP handshake 3 sliders

Fine-Tuning vs RAG: When to Teach, When to Look Up

Fine-tuning changes what the model knows; RAG gives it a reference shelf at query time. Most "make the LLM know our docs" jobs are RAG jobs.

Apr 29, 2026 · 2 min de lecture

Aller au lab Sans inscription · Gratuit pour toujours

▸ Essaie par toi-même

Glisse un slider — le diagramme réagit en direct.

Espace pour play · ←/→ pour scruber

MCP handshake

FR /100 SN-312

SPACE · ◄ ►

¶ L'analogie

The doctor analogy

A new doctor walks in with general medical training. To work in your clinic effectively, they need two very different things:

Style and bedside manner — how we explain things, the tone we use, the structure of our notes. This is fine-tuning: you teach it through example until it becomes second nature.
Patient histories and live lab results — facts that change daily. You do not retrain the doctor every morning. You give them a chart. This is RAG.

Confusing the two is the most common AI architecture mistake.

Quick decision matrix

You want to change…	Reach for…
Knowledge that updates frequently	RAG
Private documents	RAG
Tone, format, style	Fine-tuning
A new skill the base model lacks	Fine-tuning
Tool-call format the model gets wrong	Fine-tuning
Reasoning behaviour	Fine-tuning + good evals

If your goal is "answer questions using our docs," 95% of the time the answer is RAG.

Why RAG wins on facts

Freshness — re-index, done. Fine-tuning needs another training run.
Provenance — every answer can cite the chunk it used. Fine-tuned models cannot.
Cost — embedding an extra doc is cents. Fine-tuning is dollars-to-thousands.
Privacy — keep the docs in your vector store; never bake them into a shared model.

Why fine-tuning wins on style

Consistency — your brand voice on every answer, no system prompt gymnastics.
Format adherence — when the model needs to emit a non-trivial structure (custom DSL, particular JSON shape) tens of thousands of times.
Latency / cost at scale — a smaller fine-tuned model can match a bigger general one on a narrow task.

They compose

Plenty of production systems use both: fine-tune a small model for tone + tool-call format, then RAG for the actual content. Each technique is solving a different problem.

When to skip both

Before you build either, try:

System prompts + few-shot examples — costs nothing extra at training time.
Better tools (Function calling, JSON mode) — often what people reach for fine-tuning to fix.
Bigger model — sometimes the right answer is "use a more capable model" rather than coaxing a small one.

Reach for the heavier tool only when the lighter one demonstrably falls short. Most "we need to fine-tune" requests dissolve under that pressure.