Chapter 7 Local Models & Cost Optimization (Ollama, Caching, Routing)

Smart Model Routing: Opus, Sonnet, Haiku, Local

Lesson 43 / 65

The 80/20 of cost optimization

Routing each task to the cheapest model that can handle it is the highest-leverage cost lever you have. Most agents waste money by running every task on the most capable model "just in case".

The router

A small classifier (a one-shot prompt to Haiku, or a deterministic rule list) labels each incoming task with a tier:

Tier      Examples                          Model
trivial   "summarize this 3-line email"     Local (Ollama)
easy      "draft a Telegram reply"          Haiku 4.5
standard  "write a focused blog section"    Sonnet 4.6
hard      "design a multi-file refactor"    Opus 4.7
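Once a task has a tier, picking the model is a plain lookup. The following is a minimal sketch that assumes the tier labels from the table above. `MODEL_FOR_TIER` and `pick_model` are hypothetical names. The values are the table's display labels, not real API model identifiers, so replace them with whatever IDs your provider or Ollama install actually uses.

```python
# Hypothetical mapping from routing tier to model label. The labels come from the
# lesson's table; replace them with the real model IDs your setup uses.
MODEL_FOR_TIER: dict[str, str] = {
    "trivial": "Local (Ollama)",
    "easy": "Haiku 4.5",
    "standard": "Sonnet 4.6",
    "hard": "Opus 4.7",
}


def pick_model(tier: str) -> str:
    """Return the model label for a tier; unknown tiers raise instead of defaulting."""
    try:
        return MODEL_FOR_TIER[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
```

An unknown tier raises an error instead of silently falling back to the most capable model. That keeps a typo in a tier name from quietly reintroducing the "just in case" spending this lesson warns about.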

Heuristic-first router

You do not need an LLM to decide the tier. A rule engine of about 30 lines of Python handles 95% of cases; its core looks like this:

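```python
def tier(task: str, files_touched: int) -> str:
    """Map a task description to a routing tier: trivial, easy, standard, or hard."""
    text = task.lower()
    if files_touched > 1:  # multi-file work goes to the strongest model
        return "hard"
    if any(k in text for k in ("refactor", "migration", "design")):
        return "hard"
    if len(task) > 500:  # long prompts usually carry more context to reason over
        return "standard"
    if any(k in text for k in ("summarize", "classify", "tag")):
        return "trivial"
    return "easy"  # default: short conversational work
```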

LLM-based routing is overkill until your rules start misclassifying a meaningful fraction of tasks.
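One way to tell when you have reached that point is to check the rules against a small hand-labeled sample. The following is a minimal sketch that assumes you keep (task, files_touched, expected tier) rows. `misclassification_rate` and the sample tasks are hypothetical. `tier` repeats the rules above so the block runs on its own.

```python
def tier(task: str, files_touched: int) -> str:
    # Same rules as the router above, repeated so this sketch is self-contained.
    text = task.lower()
    if files_touched > 1 or any(k in text for k in ("refactor", "migration", "design")):
        return "hard"
    if len(task) > 500:
        return "standard"
    if any(k in text for k in ("summarize", "classify", "tag")):
        return "trivial"
    return "easy"


def misclassification_rate(labeled: list[tuple[str, int, str]]) -> float:
    """Fraction of (task, files_touched, expected_tier) rows the rules get wrong."""
    if not labeled:
        return 0.0
    wrong = sum(1 for task, files, expected in labeled if tier(task, files) != expected)
    return wrong / len(labeled)


# Hypothetical hand-labeled sample; in practice, label a few dozen real tasks.
sample = [
    ("Summarize this 3-line email", 1, "trivial"),
    ("Draft a Telegram reply to Sam", 0, "easy"),
    ("Design a multi-file refactor of the auth module", 4, "hard"),
    ("Write a focused blog section on caching", 0, "standard"),
]
print(f"{misclassification_rate(sample):.0%}")  # prints 25%
```

In this toy sample, the blog-section task lands in `easy` because it is short and matches no keyword. That is the kind of miss that tells you whether to add another rule or, once such misses pile up, switch to an LLM classifier.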

The result

Properly routed, a typical month with thousands of agent calls splits roughly 60% local, 25% Haiku, 12% Sonnet, 3% Opus. The 3% delivers most of the value; the rest is invisible plumbing.
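To see why the split matters, you can weight each model's share of calls by its per-call cost. The following is a minimal sketch. `blended_cost` is a hypothetical helper, and the split is the one quoted above. The prices are invented relative units for illustration only; substitute the average cost per call you actually measure.

```python
def blended_cost(calls: int, split: dict[str, float], cost_per_call: dict[str, float]) -> float:
    """Total cost of `calls` requests spread across models according to `split`."""
    # Guard against a split that does not account for every call.
    if abs(sum(split.values()) - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    return sum(calls * share * cost_per_call[model] for model, share in split.items())


# Split quoted in the lesson; prices are invented relative units, NOT real pricing.
split = {"local": 0.60, "haiku": 0.25, "sonnet": 0.12, "opus": 0.03}
price = {"local": 0.0, "haiku": 1.0, "sonnet": 3.0, "opus": 15.0}

routed = blended_cost(5000, split, price)
all_opus = blended_cost(5000, {"opus": 1.0}, price)
print(f"routed: {routed:,.0f} units, everything on Opus: {all_opus:,.0f} units")
```

Local calls are priced at zero here only because the sketch ignores hardware and electricity. If you run Ollama on paid infrastructure, give it a real per-call figure.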

Try it

Implement the rules above. Watch your weekly bill for two weeks. Tune from data, not hunches.
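Tuning from data works best when you log which tier the router picked and review the mix each week alongside the bill. The following is a minimal sketch that assumes a hypothetical log of (ISO date, tier) rows. `weekly_tier_shares` is an illustrative helper, not part of any library.

```python
from collections import Counter, defaultdict
from datetime import date


def weekly_tier_shares(log: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """Group (ISO date, tier) rows by ISO week and return each tier's share of calls."""
    by_week: dict[str, Counter] = defaultdict(Counter)
    for day, tier in log:
        year, week, _ = date.fromisoformat(day).isocalendar()
        by_week[f"{year}-W{week:02d}"][tier] += 1

    result: dict[str, dict[str, float]] = {}
    for week, counts in sorted(by_week.items()):
        total = sum(counts.values())
        result[week] = {tier: n / total for tier, n in counts.items()}
    return result


# Hypothetical log rows; in practice, append one row each time the router runs.
log = [
    ("2025-03-03", "trivial"), ("2025-03-03", "easy"),
    ("2025-03-04", "trivial"), ("2025-03-05", "hard"),
    ("2025-03-10", "standard"), ("2025-03-11", "trivial"),
]
for week, shares in weekly_tier_shares(log).items():
    print(week, {t: f"{s:.0%}" for t, s in sorted(shares.items())})
```

Suppose the `hard` share creeps up from week to week while the cheaper tiers keep producing acceptable output. That may point to a rule that escalates too eagerly.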

Engr Mejba Ahmed

Engr Mejba Ahmed

Claude Code Expert · Online

👋

Hey there!

Quick Actions

WhatsApp Instant reply

Chat on WhatsApp

+880 1723 741224 · Instant reply

Popular Questions

Engr Mejba Ahmed is connected
Engr Mejba Ahmed is typing...
Engr Mejba Ahmed avatar

✉ Want me to follow up? Drop your email

Engr Mejba Ahmed avatar

📞 Connect Directly

Choose how you'd like to reach me

WhatsApp

+880 1723 741224

Email

[email protected]

✓ Details sent! I'll get back to you shortly.

Powered by OpenAI

335+

Blog Posts

25

AI Courses

63

Projects

Services & Expertise

Pricing & Process

Learning & Resources

Connect & Support