Tokens, Context Windows, and Why Long Prompts Cost More

Models do not see words; they see tokens. Every token you send or receive takes up room in the context window and adds to the bill.

The whiteboard analogy

Imagine a small whiteboard. You can fit a few sentences before you run out of room. Erase older ideas to make space, or buy a bigger board (it costs more).

A context window is the model's whiteboard. Tokens are the marker strokes — chunks of text, usually 3–4 characters each. Every prompt, every response, every system instruction has to fit on the board at once.

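Run past the edge and something has to go: either the oldest content is dropped, so the model forgets the start, or the request is rejected outright. Buy more board (a model with a larger context window) and you still pay per stroke.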

What a token actually is

A token is a sub-word unit produced by the model's tokenizer. Examples in English (roughly):

  • hello → 1 token
  • running → 1–2 tokens
  • unbelievability → 4–5 tokens
  • a Chinese character → often 1–3 tokens, depending on the tokenizer

Rule of thumb: 1 token ≈ 4 characters ≈ 0.75 words in English. Code and non-English text use more tokens per character.
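
The rule of thumb can be turned into a quick estimator. This is a rough sketch only: `estimate_tokens` is a hypothetical helper, not a real tokenizer, and the two ratios are simply the English-text approximations quoted above. For exact counts, use the tokenizer that ships with your model.

```python
import math

CHARS_PER_TOKEN = 4.0    # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text


def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text (hypothetical helper).

    Takes the larger of the character-based and word-based estimates,
    so the result errs on the high side. Not a substitute for a tokenizer.
    """
    if not text:
        return 0
    by_chars = len(text) / CHARS_PER_TOKEN
    by_words = len(text.split()) / WORDS_PER_TOKEN
    return math.ceil(max(by_chars, by_words))


prompt = "Models do not see words, they see tokens."
print(estimate_tokens(prompt))
```

Because the estimate rounds up and takes the larger of two guesses, it is better suited to budgeting than to billing. Code and non-English text will usually need more tokens than it predicts.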

The context window

The context window is the maximum number of tokens the model can attend to in one call. Typical values:

Model class              Context window
Older small models       4k–8k tokens
Mid-tier production      32k–128k tokens
Long-context flagships   200k–1M+ tokens

Everything you send — system prompt, history, retrieved docs, user message — plus everything you receive must fit inside.
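
A small budget check makes the "everything must fit" rule concrete. The sketch below assumes you already know (or have estimated) the token count of each part; `RequestBudget` and `fits` are hypothetical names, and the window sizes are round illustrative numbers rather than any specific model's limit.

```python
from dataclasses import dataclass


@dataclass
class RequestBudget:
    """Token counts for one call; all values are illustrative."""
    system_prompt: int
    history: int
    retrieved_docs: int
    user_message: int
    max_output: int  # room reserved for the response

    def total(self) -> int:
        # Input and output share the same window, so add them all up.
        return (self.system_prompt + self.history + self.retrieved_docs
                + self.user_message + self.max_output)


def fits(budget: RequestBudget, context_window: int) -> bool:
    """True if the input plus the reserved output fits in the window."""
    return budget.total() <= context_window


budget = RequestBudget(system_prompt=2_000, history=25_000,
                       retrieved_docs=6_000, user_message=500,
                       max_output=1_000)
print(budget.total())         # 34500
print(fits(budget, 32_000))   # False
print(fits(budget, 128_000))  # True
```

Reserving space for the output up front matters: a prompt that fills the window by itself leaves the model no room to answer.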

Why cost climbs faster than you expect

Most APIs price input tokens and output tokens separately, and output is usually 3–5× more expensive. Two traps:

  1. Conversation drift: every turn re-sends the entire history. By turn 50 of a chat at 500 tokens/turn, each call ships about 25k tokens.
  2. Verbose system prompts: a 2k-token instruction block is billed on every single request.

Cache what's stable, summarise what's old, and ask for terser outputs.
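
The drift arithmetic is easy to check. The sketch below assumes a simplified model in which every turn adds the same number of tokens and the full history is re-sent each time; the function names are hypothetical and the figures are the ones used in the two traps above.

```python
def input_tokens_for_call(turn: int, tokens_per_turn: int) -> int:
    """Approximate input size of call number `turn` (1-based).

    Assumes every turn adds `tokens_per_turn` tokens and that the
    whole history is re-sent on each call.
    """
    return turn * tokens_per_turn


def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across all calls of the chat."""
    return sum(input_tokens_for_call(t, tokens_per_turn)
               for t in range(1, turns + 1))


last_call = input_tokens_for_call(50, 500)
total = cumulative_input_tokens(50, 500)
system_prompt_overhead = 2_000 * 50  # a 2k-token system prompt sent on 50 calls

print(last_call)               # 25000
print(total)                   # 637500
print(system_prompt_overhead)  # 100000
```

The per-call size grows linearly, so the cumulative total grows roughly with the square of the number of turns. That is why long chats cost more than the final prompt size suggests.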

Practical levers

  • Trim system prompts — every paragraph costs you on every call.
  • Prompt caching — providers reuse cached prefixes at a discount.
  • Output caps — set max_tokens so a runaway response can't blow your bill.
  • Streaming — does not save tokens, but lets you cut off early when the answer is good enough.
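
To see how an output cap and a cached prefix bound the cost of one call, here is a minimal sketch. All prices and the cache discount are made-up placeholders (output set to 5× input, cached tokens billed at 10%), and `request_cost` is a hypothetical helper. Real providers may also charge for writing to the cache, which this ignores.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0,
                 input_price: float = 3.00,    # USD per 1M input tokens (hypothetical)
                 output_price: float = 15.00,  # USD per 1M output tokens (hypothetical)
                 cached_rate: float = 0.10     # cached tokens billed at 10% (hypothetical)
                 ) -> float:
    """Cost in USD of one call.

    `cached_tokens` is the part of the input served from the cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh = input_tokens - cached_tokens
    input_cost = (fresh + cached_tokens * cached_rate) * input_price / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost + output_cost


# Worst case for one call: the response uses the whole max_tokens cap.
max_tokens = 1_000
uncached = request_cost(input_tokens=10_000, output_tokens=max_tokens)
cached = request_cost(input_tokens=10_000, output_tokens=max_tokens,
                      cached_tokens=2_000)  # the 2k system prompt is cached

print(f"{uncached:.4f}")  # 0.0450
print(f"{cached:.4f}")    # 0.0396
```

With the cap in place, the output term can never exceed `max_tokens` times the output price, which is what keeps a runaway response from inflating the bill.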
Engr Mejba Ahmed
