The kanban-board analogy
A project that lives in one giant ticket — "build the new dashboard" — is impossible to manage. Split it into a board of small cards — spec, mock, build, test, ship — and each card has a clear input, output, and reviewer. Progress becomes visible. Debugging becomes possible.
Prompt chaining is the same move applied to LLM tasks. One prompt that says "do everything" is brittle. A chain of prompts — each focused, each with a clean input/output contract — is robust, debuggable, and incrementally improvable.
The pattern
input → [Prompt 1: extract] → structured X
→ [Prompt 2: enrich] → enriched X'
→ [Prompt 3: format] → final output
Each box is a separate LLM call with a small, focused prompt. The output of one is the input of the next.
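A minimal sketch of that three-box pattern in Python. The llm argument is a hypothetical stand-in for whatever model client you use (any callable from prompt text to response text), and fake_llm returns canned responses so the example runs without a network call. The prompts and field names are illustrative assumptions, not part of any library.

```python
import json
from typing import Callable

# Hypothetical stand-in for a real model client: prompt text in, response text out.
LLM = Callable[[str], str]

def extract(llm: LLM, raw: str) -> dict:
    """Prompt 1: pull structured fields out of free text."""
    return json.loads(llm("Extract the order as JSON with keys item and qty.\n\n" + raw))

def enrich(llm: LLM, order: dict) -> dict:
    """Prompt 2: add derived fields to the structured record."""
    return json.loads(llm("Add a priority field to this JSON.\n\n" + json.dumps(order)))

def format_reply(llm: LLM, order: dict) -> str:
    """Prompt 3: turn the enriched record into the final output."""
    return llm("Write a one-line confirmation for this order.\n\n" + json.dumps(order))

def run_chain(llm: LLM, raw: str) -> str:
    # The output of each step is the only input to the next.
    return format_reply(llm, enrich(llm, extract(llm, raw)))

def fake_llm(prompt: str) -> str:
    """Canned responses keyed on the first word of the prompt (illustration only)."""
    if prompt.startswith("Extract"):
        return '{"item": "widget", "qty": 3}'
    if prompt.startswith("Add"):
        return '{"item": "widget", "qty": 3, "priority": "normal"}'
    return "Confirmed: 3 x widget (normal priority)."

print(run_chain(fake_llm, "Hi, I'd like three widgets please."))
```

Because each step is an ordinary function, swapping fake_llm for a real client changes one argument and nothing else.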
Why it beats one mega-prompt
- Each step is testable. You can eval prompt 2 in isolation with synthetic inputs.
- Each step is replaceable. Choose a model per step: a small, cheap one (Haiku) for extraction, a stronger one (Sonnet) for synthesis.
- Failures are localised. When the output is wrong, you can see exactly which step broke.
- Smaller context per step. Each call sees only what it needs — cheaper, less distraction.
- Independent retries. Retry the failing step, not the whole chain.
A mega-prompt is one ball of mud where every change risks regressing every other behaviour.
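The "independent retries" point can be sketched as a wrapper around a single step. This assumes steps signal bad model output by raising ValueError, which is a convention chosen for this example; extract and enrich are hypothetical steps, with enrich rigged to fail once so the retry is visible.

```python
from typing import Callable, Optional, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def with_retry(step: Callable[[A], B], attempts: int = 3) -> Callable[[A], B]:
    """Wrap one step so that a failure re-runs that step only."""
    def wrapped(value: A) -> B:
        last_error: Optional[Exception] = None
        for _ in range(attempts):
            try:
                return step(value)
            except ValueError as err:  # assumption: steps raise ValueError on bad output
                last_error = err
        raise RuntimeError(f"{step.__name__} failed after {attempts} attempts") from last_error
    return wrapped

calls = {"extract": 0, "enrich": 0}

def extract(text: str) -> dict:
    calls["extract"] += 1
    return {"item": text.strip()}

def enrich(order: dict) -> dict:
    calls["enrich"] += 1
    if calls["enrich"] == 1:
        raise ValueError("malformed model output")  # simulated first-try failure
    return {**order, "priority": "normal"}

result = with_retry(enrich)(with_retry(extract)(" widget "))
print(result, calls)
```

The call counts show the upstream step ran once while the failing step ran twice, which is the saving a mega-prompt cannot offer.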
Common chain shapes
Linear (most common)
extract → analyse → respond
Simple, easy to reason about, easy to debug.
Conditional / branching
classify → if X then path A; if Y then path B
Different downstream chains for different input types. Routing is itself an LLM call (or a classifier).
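A sketch of routing, assuming a two-label classifier prompt. The labels, prompts, and handler names are hypothetical; the fallback to a default path for unexpected labels is a design choice of this example, not something the pattern requires.

```python
from typing import Callable, Dict

LLM = Callable[[str], str]

def refund_path(llm: LLM, message: str) -> str:
    return llm("Draft a refund reply.\n\n" + message)

def question_path(llm: LLM, message: str) -> str:
    return llm("Answer the question.\n\n" + message)

# One downstream chain per label; here each "chain" is a single step.
ROUTES: Dict[str, Callable[[LLM, str], str]] = {
    "refund": refund_path,
    "question": question_path,
}

def classify(llm: LLM, message: str) -> str:
    """Routing step: normalise the model's label and fall back if it is unknown."""
    label = llm("Classify as 'refund' or 'question'. Reply with one word.\n\n" + message)
    label = label.strip().lower()
    return label if label in ROUTES else "question"

def handle(llm: LLM, message: str) -> str:
    return ROUTES[classify(llm, message)](llm, message)

def fake_llm(prompt: str) -> str:
    """Canned behaviour keyed on the instruction line (illustration only)."""
    instruction, _, body = prompt.partition("\n\n")
    if instruction.startswith("Classify"):
        return "Refund" if "money back" in body else "question"
    if instruction.startswith("Draft a refund"):
        return "REFUND: " + body
    return "ANSWER: " + body

print(handle(fake_llm, "I want my money back"))
print(handle(fake_llm, "When do you open?"))
```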
Map-reduce
split into chunks → run prompt on each → merge
Excellent for long inputs (summarise 100 reviews) where a mega-prompt would blow context.
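A sketch of the split, map, and merge stages, assuming the input is already a list of short reviews. The chunk size of 25 and the prompt wording are arbitrary choices for illustration; fake_llm only counts lines so the flow is easy to follow.

```python
from typing import Callable, List

LLM = Callable[[str], str]

def chunked(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_reduce(llm: LLM, reviews: List[str], chunk_size: int = 25) -> str:
    # Map: one small call per chunk, each well inside the context window.
    partials = [
        llm("Summarise these reviews in one sentence.\n\n" + "\n".join(chunk))
        for chunk in chunked(reviews, chunk_size)
    ]
    # Reduce: a final call that sees only the partial summaries.
    return llm("Merge these summaries into one.\n\n" + "\n".join(partials))

def fake_llm(prompt: str) -> str:
    """Reports how many lines it was given (illustration only)."""
    instruction, _, body = prompt.partition("\n\n")
    count = len(body.splitlines())
    if instruction.startswith("Summarise"):
        return f"summary of {count} reviews"
    return f"merged {count} summaries"

reviews = [f"review {i}" for i in range(100)]
print(map_reduce(fake_llm, reviews))
```

The map calls are independent of each other, so they are also natural candidates for running in parallel.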
Iterative refinement
draft → critique → revise → critique → revise
The output usually improves with each round, with diminishing returns. Cap the rounds (3 is usually enough).
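A sketch of the draft, critique, revise loop with a hard cap on rounds. The convention that the critic replies "OK" when it has no changes is an assumption of this example, as are the prompts; fake_llm simply bumps a version number so the loop's behaviour is visible.

```python
from typing import Callable

LLM = Callable[[str], str]

def refine(llm: LLM, task: str, max_rounds: int = 3) -> str:
    draft = llm("Draft a reply.\n\n" + task)
    for _ in range(max_rounds):
        critique = llm("Critique this draft. Reply OK if no changes are needed.\n\n" + draft)
        if critique.strip() == "OK":
            break  # the critic is satisfied, so stop early
        draft = llm(
            "Revise the draft using the critique.\n\n"
            + f"Draft: {draft}\nCritique: {critique}"
        )
    return draft

def fake_llm(prompt: str) -> str:
    """Drafts are 'v1', 'v2', ...; the critic accepts 'v3' (illustration only)."""
    instruction, _, body = prompt.partition("\n\n")
    if instruction.startswith("Draft"):
        return "v1"
    if instruction.startswith("Critique"):
        return "OK" if body == "v3" else "too vague"
    # Revise: the body's first line looks like "Draft: v1".
    first_line = body.splitlines()[0]
    version = int(first_line[len("Draft: v"):])
    return f"v{version + 1}"

print(refine(fake_llm, "Explain the delay to the customer."))
```

With max_rounds=1 the same call stops at the second version, which is the cap doing its job when the critic never says "OK".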
Validate and route
first attempt → validate → if invalid, retry with feedback
A special case of the conditional shape. It pays off most when validation is cheap and deterministic (schema check, lint, test pass), because bad outputs are caught before they reach the next step.
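A sketch of validate-and-retry using a hand-rolled schema check. The required fields, the feedback wording, and the three-attempt limit are assumptions for illustration; a real project might use a schema library instead of the validate function shown here.

```python
import json
from typing import Callable, List

LLM = Callable[[str], str]
REQUIRED = {"item": str, "qty": int}  # hypothetical schema for this example

def validate(text: str) -> List[str]:
    """Cheap schema check: return a list of problems, empty if the output is valid."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    return [
        f"'{key}' must be {kind.__name__}"
        for key, kind in REQUIRED.items()
        if not isinstance(data.get(key), kind)
    ]

def extract_with_retry(llm: LLM, raw: str, max_attempts: int = 3) -> dict:
    prompt = "Extract the order as JSON with keys item (string) and qty (integer).\n\n" + raw
    for _ in range(max_attempts):
        output = llm(prompt)
        problems = validate(output)
        if not problems:
            return json.loads(output)
        # Feed the validator's findings back so the retry is informed, not blind.
        feedback = "; ".join(problems)
        prompt += f"\n\nYour previous answer was rejected: {feedback}. Return corrected JSON only."
    raise ValueError(f"no valid output after {max_attempts} attempts")

def fake_llm(prompt: str) -> str:
    """Wrong type on the first try, correct once it sees feedback (illustration only)."""
    if "rejected" in prompt:
        return '{"item": "widget", "qty": 3}'
    return '{"item": "widget", "qty": "three"}'

print(extract_with_retry(fake_llm, "three widgets please"))
```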
Designing chain steps
- One job per step. "Extract the order details" is a step. "Extract the order details and write a follow-up email" is two steps in disguise.
- Structured handoffs. The output passed between steps should be JSON, not prose, so your runtime parses it with a JSON parser instead of regexes over free text.
- Tight prompts. Each step's prompt is short, focused, and free of unrelated instructions.
- Stateless steps. A step takes input → produces output. Side effects live outside, in your code.
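The rules above can be sketched as two single-job steps joined by a typed record, with the side effect kept in the calling code. The Order type, the prompts, and the send callback are hypothetical names introduced for this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable

LLM = Callable[[str], str]

@dataclass(frozen=True)
class Order:
    """The handoff contract between the two steps."""
    item: str
    qty: int

def extract_order(llm: LLM, text: str) -> Order:
    """Step 1, one job: text in, Order out. Fails loudly if the JSON drifts."""
    data = json.loads(llm("Extract the order as JSON with keys item and qty.\n\n" + text))
    return Order(item=str(data["item"]), qty=int(data["qty"]))

def write_follow_up(llm: LLM, order: Order) -> str:
    """Step 2, one job: Order in, email text out. It never sees the raw text."""
    return llm("Write a follow-up email for this order.\n\n" + json.dumps(asdict(order)))

def process(llm: LLM, text: str, send: Callable[[str], None]) -> None:
    email = write_follow_up(llm, extract_order(llm, text))
    send(email)  # the side effect lives here, outside the steps

def fake_llm(prompt: str) -> str:
    """Canned behaviour keyed on the instruction line (illustration only)."""
    instruction, _, body = prompt.partition("\n\n")
    if instruction.startswith("Extract"):
        return '{"item": "widget", "qty": 3}'
    order = json.loads(body)
    return f"Thanks for ordering {order['qty']} x {order['item']}."

outbox = []
process(fake_llm, "three widgets please", outbox.append)
print(outbox)
```

Keeping send outside the steps means either step can be re-run or evaluated without sending anything.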
Where chains beat agents
For workflows that are deterministic and fixed ("always extract → enrich → format"), a chain is simpler than an agent. Agents are right when the control flow itself depends on the input ("decide if you need to search, then maybe call this tool").
Heuristic: write the workflow as a flowchart. If the flowchart fits on a page with no loops or branches that depend on model judgement, you have a chain. Otherwise you have an agent.
Where chains stumble
- Compounding errors. If each step succeeds independently 95% of the time, a 10-step chain succeeds only about 60% of the time (0.95^10 ≈ 0.60). Add validation between steps.
- Verbose intermediates. Each step's output consumes the next step's context budget. Watch token totals.
- Latency. Sequential calls add up. Parallelise where the DAG allows.
- Snowballing prompts. Engineers pile instructions into one step instead of adding a step. Resist.
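The compounding-errors arithmetic, and the effect of validation, can be checked with a few lines. This is a simplified model that assumes step failures are independent, that the validator catches every failure, and that each retry is an independent attempt; real chains will fall short of the second figure.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent failures."""
    return per_step ** steps

def step_success_with_retries(per_step: float, retries: int) -> float:
    """Per-step success when a perfect validator triggers up to `retries` re-runs."""
    return 1 - (1 - per_step) ** (retries + 1)

baseline = chain_success(0.95, 10)
validated = chain_success(step_success_with_retries(0.95, retries=1), 10)

print(f"no validation:       {baseline:.1%}")
print(f"one validated retry: {validated:.1%}")
```

Under these assumptions a single validated retry per step lifts the per-step rate from 95% to 99.75%, which is why the end-to-end figure moves so much.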
Tooling
- LangChain has chain primitives baked in.
- LangGraph for DAG-shaped chains with explicit nodes and edges.
- Plain code is often the right answer: await step1(); await step2(). Frameworks help when the chain needs branching, retries, or tracing.
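A plain-code sketch using only Python's standard asyncio module. call_llm is a hypothetical async model call that sleeps briefly instead of touching a network, and the step names are invented for illustration. The two middle steps depend only on the first, so they run concurrently.

```python
import asyncio

async def call_llm(prompt: str) -> str:
    """Hypothetical async model call; echoes the prompt after a short delay."""
    await asyncio.sleep(0.01)
    return f"<{prompt}>"

async def run(doc: str) -> str:
    # Step 1 must finish before anything else can start.
    facts = await call_llm(f"extract: {doc}")
    # Steps 2a and 2b both depend only on step 1, so they can overlap.
    tone, risks = await asyncio.gather(
        call_llm(f"tone: {facts}"),
        call_llm(f"risks: {facts}"),
    )
    # Step 3 needs both results, so it waits for the gather to complete.
    return await call_llm(f"format: {tone} | {risks}")

print(asyncio.run(run("doc")))
```

asyncio.gather returns results in the order the calls were passed, so the unpacking into tone and risks is safe regardless of which call finishes first.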
In one line
One mega-prompt looks elegant in the demo and rots in production. A chain of small prompts is the boring, debuggable, shippable version of the same idea.