Home Concept Explainers Reasoning Patterns Chain-of-Thought Prompting: Get LLMs to Show Their Work

Reasoning Patterns Agent loop 3 sliders

Chain-of-Thought Prompting: Get LLMs to Show Their Work

Add "think step by step" and accuracy on multi-step problems jumps. Hide the scratchpad in production. Free quality, almost always.

Apr 29, 2026 · 3 min de lecture

Aller au lab Sans inscription · Gratuit pour toujours

▸ Essaie par toi-même

Glisse un slider — le diagramme réagit en direct.

Espace pour play · ←/→ pour scruber

Agent loop

FR /100 SN-74A

SPACE · ◄ ►

¶ L'analogie

The maths-test analogy

You ask a student "what's 47 × 38?" Most stare and guess. Ask the same student "show your work" and they multiply digits, carry, add — and produce the correct answer. The reasoning was always there. Asking for it changed the output.

Chain-of-thought (CoT) prompting is the same trick for LLMs. Ask the model to think through the problem in plain language before giving the final answer, and accuracy on multi-step tasks (math, logic, planning) jumps measurably. The reasoning costs tokens, but it pays in correctness.

The minimal recipe

Add a short instruction:

Think step by step before answering.

Or for stronger results:

Before answering, walk through your reasoning. Then provide the final answer in <answer>...</answer> tags.

The first costs nothing structural. The second lets you parse the answer cleanly while still benefiting from the reasoning.

Why it works

LLMs are autoregressive — each token conditions on the ones before it. When the model commits to a final answer too quickly, it's locked in. Letting it produce intermediate reasoning tokens means the final-answer token is conditioned on a richer context the model just generated. The math gets done in tokens, not in a hidden mental step.

Variants worth knowing

Zero-shot CoT — just the "think step by step" instruction. Cheap, often enough.
Few-shot CoT — show 1–3 worked examples. Better for unusual problem shapes.
Self-consistency — generate N CoT samples, take the majority answer. Big quality lift on math/logic; high cost (covered in its own explainer).
Tree of thoughts — branch the reasoning, search across paths. For genuinely multi-path problems.

Where CoT helps most

Multi-step math. Word problems, accounting, unit conversions.
Logic puzzles with nested conditions.
Planning — break a goal into ordered steps.
Diagnostic reasoning — code debugging, root-cause analysis.
Anything with constraints to track — schedules, allocations, eligibility rules.

Where it doesn't help (or hurts)

Simple lookups — "what's the capital of France?" CoT just adds tokens.
Creative writing — explicit reasoning can flatten voice; let the model write.
Pattern-recognition tasks the model has memorised — CoT introduces noise.
Latency-critical paths — the reasoning costs real wall-time.

Hide the scratchpad in production

Users do not want to read 800 tokens of reasoning before the answer. Two clean patterns:

Tagged reasoning — <thinking>...</thinking><answer>...</answer>. Render only the answer; optionally expose the thinking on a "show reasoning" toggle.
Two-call flow — first call: ask for a plan / reasoning. Second call: ask for the final answer using the plan. Lets you cap reasoning length and parallelise.

Reasoning models change the equation

Newer "reasoning models" (often labelled with a thinking step) bake long deliberation into the model itself, with internal tokens you may not even pay for. With those, explicit CoT prompting is often redundant — the model is already doing it. Always test on your task; sometimes simpler prompts beat over-instructed ones on a reasoning model.

Anti-patterns to avoid

"Think very carefully" — magic phrases that move the needle 0%.
Long reasoning when the task is trivial — pay for tokens you don't need.
Letting the user see raw reasoning by default — looks unprofessional, and exposes prompt internals.
Trusting the reasoning text as ground truth — the model can produce convincing wrong reasoning. Verify the answer.

In one line

Chain-of-thought is the cheapest reliability lever for hard prompts — turn it on, hide the scratchpad, ship.

From the field

The thing that's changed about CoT is that with reasoning models you often shouldn't add it manually — they think internally, and bolting "let's think step by step" on top can hurt more than help. On standard models it still earns its keep for genuinely multi-step problems, but I always hide the scratchpad from users (ask for thinking in one section, render only the answer) and stay mindful that I'm paying output tokens for reasoning I then discard. My decision rule: reach for explicit CoT only when the task has real steps and the model gets them wrong without it — not as a reflex stapled onto every prompt.

→ Vous le voulez dans votre stack ?

Custom Claude Code AI Agents & Workflows

Stop doing the repetitive, multi-step work that eats your team's day. This service delivers a working AI agent system that handles tasks like lead processing, data enrichment, content pipelines, and r...

Voir comment je peux aider