The detective analogy
A detective at a crime scene does not silently scan the room and announce a verdict. They think aloud — "the mud on the carpet means…" — act ("let me check the back door"), observe the result ("locked from inside"), and feed it into the next thought. The pattern is so natural we don't even name it.
ReAct (Reasoning + Acting) names it. Each agent step is a Thought → Action → Observation triple. Reasoning conditions the action; the action's result conditions the next reasoning. The detective's logic, in code.
The shape of one step
Thought: I should check the user's recent orders to see why they're asking.
Action: get_orders(user_id="u_42", limit=10)
Observation: 3 orders, latest "shipped" yesterday, tracking failed to update.
Thought: The tracking is the issue. Let me look up the carrier next.
Action: get_tracking(order_id="o_991")
...
Repeat until the model emits a final answer (or hits a stop condition). The agent's transcript is literally readable as a thought process — which makes debugging dramatically easier than opaque function-call sequences.
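The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the tool bodies, their return strings, and `scripted_model` (a stand-in that replays a fixed script instead of calling a real LLM) are all assumptions made up for the example.

```python
from typing import Callable, Optional

# Hypothetical tools: names follow the transcript above, bodies are invented.
def get_orders(user_id: str, limit: int = 10) -> str:
    return '3 orders, latest "shipped" yesterday, tracking failed to update.'

def get_tracking(order_id: str) -> str:
    return "Carrier has not scanned the parcel since pickup."

TOOLS: dict = {"get_orders": get_orders, "get_tracking": get_tracking}

# Stand-in for an LLM: each entry is one step the "model" would emit.
SCRIPT = [
    {"thought": "Check the user's recent orders.",
     "tool": "get_orders", "args": {"user_id": "u_42", "limit": 10}},
    {"thought": "Tracking is the issue; look up the carrier.",
     "tool": "get_tracking", "args": {"order_id": "o_991"}},
    {"thought": "I have enough to answer.",
     "final": "Your parcel is stuck with the carrier."},
]

def scripted_model(transcript: list) -> dict:
    # The number of observations so far tells us which step comes next.
    step = sum(1 for line in transcript if line.startswith("Observation:"))
    return SCRIPT[step]

def run_react(model: Callable[[list], dict], max_steps: int = 8):
    """Run Thought -> Action -> Observation until a final answer or max_steps."""
    transcript: list = []
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(f"Thought: {step['thought']}")
        if "final" in step:
            transcript.append(f"Final Answer: {step['final']}")
            return step["final"], transcript
        transcript.append(f"Action: {step['tool']}({step['args']})")
        observation = TOOLS[step["tool"]](**step["args"])
        transcript.append(f"Observation: {observation}")
    answer: Optional[str] = None  # ran out of steps without a final answer
    return answer, transcript
```

Calling `run_react(scripted_model)` returns the final answer together with a transcript that reads like the example above, which is the property that makes the loop easy to debug.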
Why ReAct beats both extremes
- Pure reasoning (CoT only) — model "thinks" but cannot fetch facts. Hallucinations explode on knowledge-heavy tasks.
- Pure acting (tool calls without reasoning) — model fires tools blindly, picks badly, never reflects on results.
- ReAct — reasoning grounds the next action; observation grounds the next reasoning. The two halves correct each other.
What to put in each slot
Thought
- Short, plan-shaped, in plain language.
- One sentence is often enough. Long thoughts are a tax on tokens and latency.
- Should reference what the next action will be — "I'll do X to find Y."
Action
- A structured tool call. Function name + typed arguments.
- One action per step. Multiple parallel actions are a different pattern (parallel ReAct or fan-out).
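One way to make "function name + typed arguments" concrete is to validate each action against a declared spec before running it. The sketch below is an assumption about how that might look; `ToolSpec`, `Action`, `validate`, and the `GET_ORDERS` spec are hypothetical names invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    name: str
    params: dict  # parameter name -> expected Python type

@dataclass(frozen=True)
class Action:
    tool: str
    args: dict

# Hypothetical spec matching the get_orders call in the example transcript.
GET_ORDERS = ToolSpec(name="get_orders", params={"user_id": str, "limit": int})

def validate(action: Action, spec: ToolSpec) -> list:
    """Return a list of problems; an empty list means the call is well-formed."""
    if action.tool != spec.name:
        return [f"unknown tool: {action.tool}"]
    problems = []
    for key, expected in spec.params.items():
        if key not in action.args:
            problems.append(f"missing argument: {key}")
        elif not isinstance(action.args[key], expected):
            problems.append(f"{key} should be {expected.__name__}")
    for key in action.args:
        if key not in spec.params:
            problems.append(f"unexpected argument: {key}")
    return problems
```

Returning the problems as text, rather than raising, lets the runner feed them back to the model as an observation so it can correct the call on the next step.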
Observation
- The tool's output, possibly truncated to fit context.
- Big outputs (10MB log files) need summarisation or pagination — never just paste raw.
- Errors are observations too — a 500 from an API gives the model real information.
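The observation rules above might be wrapped in a single helper. This is a rough sketch under assumptions: `to_observation` and the 500-character default are invented for the example, and plain head-truncation stands in for the summarisation or pagination a real system would likely want.

```python
def to_observation(call, max_chars: int = 500) -> str:
    """Run a zero-argument tool call and return text the model can read."""
    try:
        output = str(call())
    except Exception as exc:
        # Errors are observations too: report them instead of crashing the loop.
        return f"ERROR: {type(exc).__name__}: {exc}"
    if len(output) <= max_chars:
        return output
    omitted = len(output) - max_chars
    # Keep the head and say how much was dropped, so the model knows
    # the output is incomplete.
    return f"{output[:max_chars]}\n[truncated: {omitted} more characters omitted]"
```

Telling the model that text was cut, and how much, gives it the option to ask for the next page rather than reasoning as if it had seen everything.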
Stopping conditions
- Final answer emitted — the model's prose says "Final Answer: …".
- Confidence threshold reached — your code looks at the latest thought and decides "good enough."
- Max steps hit — hard ceiling. Always set this. 8–12 is sane for most tasks.
- Loop detected — same action twice in a row? Break the loop deliberately.
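Three of these checks are mechanical enough to sketch directly. The function below is illustrative only: `stop_reason` and its return labels are hypothetical, actions are assumed to be recorded as comparable strings, and the confidence-threshold check is left out because it depends on application-specific scoring.

```python
from typing import Optional

def stop_reason(actions: list, latest_text: str, max_steps: int = 10) -> Optional[str]:
    """Return why the loop should stop, or None to keep going.

    actions: every action issued so far, oldest first.
    latest_text: the model's most recent output.
    """
    if "Final Answer:" in latest_text:
        return "final_answer"
    if len(actions) >= max_steps:
        return "max_steps"  # hard ceiling, always enforced
    if len(actions) >= 2 and actions[-1] == actions[-2]:
        return "loop_detected"  # same action twice in a row
    return None
```

Checking for the final answer first means a model that finishes on its last allowed step is still credited with an answer rather than reported as having hit the ceiling.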
Engineering ReAct in practice
- Templates with clear delimiters. <thought>, <action>, <observation> tags or similar. Makes parsing trivial.
- Truncate observations. Long tool outputs eat your context. Summarise once, store the full thing for audit.
- Carry minimal state across steps. Don't re-paste the entire history if you can help it; use prompt caching for stable prefixes.
- Separate the reasoning model from the tool runner. Two services, two responsibilities; easier to evolve, easier to debug.
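To show why tag delimiters make parsing easy, here is a small sketch using Python's standard `re` module. It assumes the model closes each tag (for example `<thought>...</thought>`), which the text above does not specify; `parse_step` is a hypothetical helper name.

```python
import re

# Match <thought>...</thought>, <action>...</action>, <observation>...</observation>.
# The backreference \1 requires the closing tag to match the opening one.
TAG_RE = re.compile(r"<(thought|action|observation)>(.*?)</\1>", re.DOTALL)

def parse_step(text: str) -> dict:
    """Extract tagged slots from one model turn; missing tags are simply absent."""
    return {tag: body.strip() for tag, body in TAG_RE.findall(text)}
```

For example, `parse_step('<thought>Check orders.</thought><action>get_orders(user_id="u_42")</action>')` yields a dict with `thought` and `action` keys. A turn with no `action` key is a natural signal for the runner to look for a final answer instead.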
Variants and friends
- ReAct + reflection — at the end of an episode, the model reviews what it did and writes a "lesson" stored for next time. Compounds quality over runs.
- Plan-and-act — produce a multi-step plan first, then execute steps with smaller per-step thinking. Reduces drift on long tasks.
- Multi-agent ReAct — different agents run their own ReAct loops; a supervisor stitches the results.
Where ReAct stumbles
- Very long horizons. 50+ steps and the context becomes a sea of observations. Use planning + summarisation aggressively.
- Ambiguous tools. "I don't know which tool to call" — solved by better tool descriptions, not more reasoning.
- Repetitive thoughts. A model talking to itself in circles is a sign of poor tool design or an unclear goal; don't paper over it with bigger context.
In one line
ReAct is the reasoning-agent default. The transcript is your debug log; the loop is your runtime.