The 80/20 of cost optimization
Routing different tasks to different models is the highest-leverage cost lever you have. Most agents waste money by running every task on the most capable model "just in case".
The router
A small classifier (a one-shot prompt to Haiku, or a deterministic rule list) labels each incoming task with a tier:
| Tier | Examples | Model |
|---|---|---|
trivial |
"summarize this 3-line email" | Local (Ollama) |
easy |
"draft a Telegram reply" | Haiku 4.5 |
standard |
"write a focused blog section" | Sonnet 4.6 |
hard |
"design a multi-file refactor" | Opus 4.7 |
Heuristic-first router
You do not need an LLM to decide tier. A 30-line Python rule engine handles 95% of cases:
def tier(task: str, files_touched: int) -> str:
if files_touched > 1: return "hard"
if any(k in task.lower() for k in ["refactor", "migration", "design"]): return "hard"
if len(task) > 500: return "standard"
if any(k in task.lower() for k in ["summarize", "classify", "tag"]): return "trivial"
return "easy"
LLM-based routing is overkill until your rules start mis-classifying a meaningful fraction.
The result
Properly routed, a typical month with thousands of agent calls splits roughly 60% local, 25% Haiku, 12% Sonnet, 3% Opus. The 3% delivers most of the value; the rest is invisible plumbing.
Try it
Implement the rule above. Watch your weekly bill for two weeks. Tune from data, not hunches.