Home Concept Explainers Reasoning Patterns Tree of Thoughts: When LLMs Need to Branch and Backtrack

Reasoning Patterns Crawler graph 3 sliders

Tree of Thoughts: When LLMs Need to Branch and Backtrack

Tree of Thoughts explores multiple reasoning branches, prunes bad ones, and backtracks. Use it when the right path is not the first one the model picks.

Apr 29, 2026 · 4 min de lectura

Ir al laboratorio Sin registro · Gratis para siempre

▸ Pruébalo tú mismo

Arrastra un slider — el diagrama reacciona en tiempo real.

Espacio para play · ←/→ para scrubear

Crawler graph

FR /100 SN-514

SPACE · ◄ ►

¶ La analogía

The chess-tree analogy

A chess player does not pick a move and march. They branch — "if I push the pawn, they could do A, B, or C; let me consider each" — prune unpromising lines, and commit to the one that survives evaluation. The search is the strategy.

Tree of Thoughts (ToT) brings that to LLM reasoning. Instead of one chain of thought, the model considers several candidate next steps, expands the promising ones, kills the weak ones, and only commits when the search has done its job.

The four moving parts

Thought decomposition — break the problem into smaller "thought" units (a paragraph, a step, a partial plan).
Thought generation — at each node, the model proposes K candidate next thoughts.
Thought evaluation — the model (or a separate critic) scores each candidate: is it making progress? plausible? a dead end?
Search strategy — BFS, DFS, beam search, or Monte Carlo tree search over the node graph.

The transcript becomes a tree (or graph) of thoughts, not a single chain.

When ToT beats CoT

ToT helps when:

The problem has multiple plausible paths (not just one obvious one).
Early commitment is costly — once the chain marches down a wrong road, recovery is hard.
Backtracking is natural — sometimes you have to abandon a partial solution.
Verification is cheaper than generation — you can score a candidate quickly, even if you can't author it directly.

Examples that fit:

Game of 24 (combine numbers with arithmetic to hit 24) — multiple decompositions, easy to verify.
Crosswords — branching choices, verifiable correctness.
Multi-step planning with optional steps and dependencies.
Code synthesis with constraints — try a structure, evaluate, backtrack.

When ToT is overkill

Single-path reasoning — math problem with a deterministic algorithm.
Simple lookups — no branching gives you no value.
Latency-critical UX — the search is multiplicatively more expensive.
When self-consistency (generate N CoTs, vote) gives you 90% of the lift at a fraction of the complexity.

ToT is a heavier hammer. Use it when the simpler hammers (CoT, self-consistency, ReAct) demonstrably aren't enough.

Engineering it

Branching factor (K) — typical 3–5. Higher K = more coverage, more cost.
Depth limit — cap how deep the tree grows; 5–10 is enough for most tasks.
Evaluation cost — every node-scoring is an LLM call. The eval call should be much cheaper than generation (smaller model, shorter prompt).
Pruning aggressiveness — keep top-N at each level (beam search) to bound the explosion.
Backtrack triggers — score below threshold, stuck on the same step twice, contradiction detected.

Variants and cousins

Graph of Thoughts — generalises tree to a DAG; nodes can merge.
Algorithm of Thoughts — uses heuristics from a known algorithm to guide search (a star with thoughts).
MCTS-prompted reasoning — Monte Carlo tree search but the rollouts are LLM completions.

What ToT doesn't fix

Bad evaluation function. Garbage scoring → garbage tree. The critic is half the system.
Hallucination at the leaves. ToT can find consistent reasoning that's still factually wrong.
Real-world side effects. Branching is great in pure reasoning, dangerous when tools mutate state. Use ReAct + smart rollback if branches must act.

A pragmatic decision rule

If self-consistency on chain-of-thought already crushes your eval, stop there. If the problem genuinely has branching paths and verification is cheap, reach for ToT. If it has branching paths and every branch needs side-effects, redesign — branching with side-effects is a foot-gun.

In one line

Tree of Thoughts is search added to thinking — useful when the problem has many roads, and the wrong one is expensive to walk all the way down.

From the field

Tree of Thoughts is genuinely powerful and I almost never ship it, because the cost is brutal — you're exploring many branches, each one tokens you pay for, for a quality lift simpler tricks often match. In practice I climb a ladder: a sharper prompt, then chain-of-thought, then self-consistency (a few samples, take the majority), and only consider ToT for the rare problem with real search structure where backtracking actually matters, like some planning and puzzle tasks. For the day-to-day product work I do, a cheaper technique gets 90% of the benefit. Reach for the heavy machinery last, and only on evidence it's needed.

→ ¿Lo quieres en tu stack?

Custom Claude Code AI Agents & Workflows

Stop doing the repetitive, multi-step work that eats your team's day. This service delivers a working AI agent system that handles tasks like lead processing, data enrichment, content pipelines, and r...

Ver cómo puedo ayudar