Home Concept Explainers AI Foundations Tokens, Context Windows, and Why Long Prompts Cost More

AI Foundations Crawler graph 3 sliders

Tokens, Context Windows, and Why Long Prompts Cost More

Models do not see words — they see tokens. Drag the prompt and output sliders to watch tokens fill the context window and cost climb.

Apr 29, 2026 · 2 min de lecture

Aller au lab Sans inscription · Gratuit pour toujours

▸ Essaie par toi-même

Glisse un slider — le diagramme réagit en direct.

Espace pour play · ←/→ pour scruber

Crawler graph

FR /100 SN-514

SPACE · ◄ ►

¶ L'analogie

The whiteboard analogy

Imagine a small whiteboard. You can fit a few sentences before you run out of room. Erase older ideas to make space, or buy a bigger board (it costs more).

A context window is the model's whiteboard. Tokens are the marker strokes — chunks of text, usually 3–4 characters each. Every prompt, every response, every system instruction has to fit on the board at once.

Run past the edge and the model forgets the start. Buy more board (a bigger context model) and you pay per stroke.

What a token actually is

A token is a sub-word unit produced by the model's tokenizer. Examples in English (roughly):

hello → 1 token
running → 1–2 tokens
unbelievability → 4–5 tokens
a Chinese character → often 2–3 tokens

Rule of thumb: 1 token ≈ 4 characters ≈ 0.75 words in English. Code and non-English text use more tokens per character.

The context window

The context window is the maximum tokens the model can attend to in one call. Modern values:

Model class	Context window
Older small models	4k–8k tokens
Mid-tier production	32k–128k tokens
Long-context flagships	200k–1M+ tokens

Everything you send — system prompt, history, retrieved docs, user message — plus everything you receive must fit inside.

Why cost climbs faster than you expect

Most APIs price input tokens and output tokens separately, and output is usually 3–5× more expensive. Two traps:

Conversation drift — every turn re-sends the entire history. A 50-turn chat at 500 tokens/turn ships 25k tokens every call.
Verbose system prompts — a 2k-token instruction block runs on every single request.

Cache what's stable, summarise what's old, and ask for terser outputs.

Practical levers

Trim system prompts — every paragraph costs you on every call.
Prompt caching — providers reuse cached prefixes at a discount.
Output caps — set max_tokens so a runaway response can't blow your bill.
Streaming — does not save tokens, but lets you cut off early when the answer is good enough.