The senior–mid–junior analogy
A team has three engineers: a senior architect who sees the whole system but is expensive and not always around, a mid-level developer who handles 90% of daily work brilliantly, and a junior who's lightning fast on small, well-defined tickets.
Claude's lineup maps cleanly: Opus is the architect, Sonnet is the daily driver, Haiku is the junior. Most projects need all three at different points; the trick is matching task to engineer.
The lineup at a glance
| Model | Speed | Cost | Best at | Watch out for |
|---|---|---|---|---|
| Opus | Slowest | Most expensive | Deep reasoning, hard refactors, novel problems | Overkill on simple tasks; slow at scale |
| Sonnet | Mid | Mid | Production agents, coding, content, RAG | Sometimes needs Opus assist on the trickiest 5% |
| Haiku | Fastest | Cheapest | Classification, extraction, routing, micro-agents | Stumbles on multi-step reasoning; weaker tool use |
Generations matter — "Sonnet 4.5" beats older "Opus 3" on many tasks. Always benchmark against the current version, not a historical reputation.
Picking by task
- Open-ended planning, novel architecture, hard debugging → Opus. Pay the latency tax for the reasoning.
- Daily agent work — Claude Code, RAG bots, coding assistants → Sonnet. The sweet spot.
- High-volume classification, routing, summarisation → Haiku. 10× cheaper, plenty for narrow tasks.
- Real-time UX (autocomplete, live chat first reply) → Haiku for first response, optionally upgrade if the question is hard.
The router pattern
Production stacks rarely use one model. A common pattern:
- Triage with Haiku — "is this trivial, normal, or hard?"
- Route accordingly — trivial answers from Haiku, normal from Sonnet, hard escalate to Opus.
- Cap the budget — never escalate to Opus for free; require an explicit signal.
Routing alone often cuts spend 60–80% with negligible quality loss.
Cost mental model
For long-context, output-heavy workloads, output tokens dominate. Picking a smaller model that produces shorter, sharper answers can save more than a tier-down on the input price. Measure on your traffic, not on synthetic benchmarks.
Common mistakes
- Defaulting to the biggest model "just to be safe." It is rarely safer; it's just more expensive.
- Defaulting to the smallest model to save money. If quality drops 5%, support load goes up 30%.
- Comparing across generations. "Haiku 4.5" can outperform "Opus 3.5" on many tasks. Use names and versions.
- Ignoring evals. Benchmarks rarely match your domain. Build a 50-prompt eval set; run it against each tier; pick the cheapest one that passes.
When to bring in Opus deliberately
- New problem class your team has never solved.
- Critical PR review where a single missed bug is expensive.
- Long-form planning that downstream agents will execute (errors compound).
- Reasoning over a large structured artifact (architecture docs, ledger, schema).
For the rest — and "the rest" is usually 90% of traffic — Sonnet is the answer.
In one line
Use Opus to think, Sonnet to ship, Haiku to scale. Benchmark your task; the cheapest model that passes wins.