The doctor analogy
A new doctor walks in with general medical training. To work in your clinic effectively, they need two very different things:
- Style and bedside manner — how we explain things, the tone we use, the structure of our notes. This is fine-tuning: you teach it through example until it becomes second nature.
- Patient histories and live lab results — facts that change daily. You do not retrain the doctor every morning. You give them a chart. This is RAG.
Confusing the two is the most common AI architecture mistake.
Quick decision matrix
| You want to change… | Reach for… |
|---|---|
| Knowledge that updates frequently | RAG |
| Private documents | RAG |
| Tone, format, style | Fine-tuning |
| A new skill the base model lacks | Fine-tuning |
| Tool-call format the model gets wrong | Fine-tuning |
| Reasoning behaviour | Fine-tuning + good evals |
If your goal is "answer questions using our docs," 95% of the time the answer is RAG.
Why RAG wins on facts
- Freshness — re-index, done. Fine-tuning needs another training run.
- Provenance — every answer can cite the chunk it used. Fine-tuned models cannot.
- Cost — embedding an extra doc is cents. Fine-tuning is dollars-to-thousands.
- Privacy — keep the docs in your vector store; never bake them into a shared model.
Why fine-tuning wins on style
- Consistency — your brand voice on every answer, no system prompt gymnastics.
- Format adherence — when the model needs to emit a non-trivial structure (custom DSL, particular JSON shape) tens of thousands of times.
- Latency / cost at scale — a smaller fine-tuned model can match a bigger general one on a narrow task.
They compose
Plenty of production systems use both: fine-tune a small model for tone + tool-call format, then RAG for the actual content. Each technique is solving a different problem.
When to skip both
Before you build either, try:
- System prompts + few-shot examples — costs nothing extra at training time.
- Better tools (Function calling, JSON mode) — often what people reach for fine-tuning to fix.
- Bigger model — sometimes the right answer is "use a more capable model" rather than coaxing a small one.
Reach for the heavier tool only when the lighter one demonstrably falls short. Most "we need to fine-tune" requests dissolve under that pressure.
A sane order of operations
- Prompt with examples → see how far that goes.
- Add RAG if the failure mode is "doesn't know our stuff."
- Add fine-tuning if the failure mode is "style/format/skill."
- Combine both if you need both.
Skipping step 1 is how teams burn months on training pipelines they did not need.