The customs-form analogy
At customs, you do not write a paragraph: you fill in a form. Name. Date of birth. Goods. Yes/No on agriculture. The agent processes a thousand travellers a day because the form makes their job mechanical.
A free-text LLM response is the paragraph. Structured output is the customs form. Your downstream code stops parsing and starts processing — the schema removes the guesswork on both sides.
What "structured output" actually means
You hand the API a schema (JSON Schema, Pydantic, Zod, etc.) and, in the strict modes, the API guarantees the response validates against it. The strength of that guarantee depends on the mode. Modes you'll see:
- JSON mode — output is some valid JSON, no schema enforced. Useful but weak.
- Constrained / structured outputs — output validates against a specific schema. Token-level constraints during decoding ensure it. This is what you want.
- Function calling output — same idea applied to tool arguments.
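The implementation under the hood is constrained decoding: at every step, the decoder masks tokens that would break the schema's grammar, so the model cannot emit a token that makes the output invalid. The guarantee covers shape, not truth, and a response cut off at the token limit can still be incomplete.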
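The masking step can be sketched with a toy decoder. Everything here is invented for illustration: the eight-token vocabulary, the hand-written state table, and the random scores standing in for model logits. Real implementations compile the schema into a grammar over the model's actual tokenizer.

```python
import json
import random

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"approved"', '"status"', ':', '"rejected"', 'hello', ',']

# Hand-written grammar for the schema {"status": "approved" | "rejected"}.
# Each state lists the tokens allowed next (hypothetical, for illustration).
ALLOWED = {
    0: {'{'},
    1: {'"status"'},
    2: {':'},
    3: {'"approved"', '"rejected"'},
    4: {'}'},
}
DONE = 5

def constrained_decode(rng: random.Random) -> str:
    """Pick the highest-scoring token at each step, after masking illegal ones."""
    state, out = 0, []
    while state != DONE:
        # Stand-in for model logits: a random score for every vocabulary entry.
        scores = {tok: rng.random() for tok in VOCAB}
        # The mask: tokens the grammar forbids in this state are dropped.
        legal = {tok: s for tok, s in scores.items() if tok in ALLOWED[state]}
        out.append(max(legal, key=legal.get))
        state += 1
    return ''.join(out)

text = constrained_decode(random.Random(0))
parsed = json.loads(text)  # parses whatever the scores were
```

However high 'hello' or ',' happen to score, they are never chosen, because they are never in the legal set. The only real choice the "model" makes is at state 3, between the two enum values.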
What you stop doing
- Regex parsing model output (a known-bad idea that quietly costs hours of debugging).
- Asking "please respond in JSON" and hoping (the model usually obeys, occasionally adds prose, and now and then leaves a trailing comma).
- Retrying on parse failures with prompt scolding ("you forgot the closing brace"), which is slow and costs money.
- Trying to fix malformed JSON with `jsonrepair` (works most of the time, fails when you most need it).
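A small sketch of why "ask and hope" is fragile, using Python's standard `json` module. The three sample replies are invented; they stand in for the kinds of output described in the list above.

```python
import json

def parses(text: str) -> bool:
    """Return True if the text is a complete, valid JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Invented replies of the kind a "please respond in JSON" prompt can produce.
samples = {
    "clean": '{"status": "approved"}',
    "with_prose": 'Sure! Here is the JSON:\n{"status": "approved"}',
    "trailing_comma": '{"status": "approved",}',
}

results = {name: parses(text) for name, text in samples.items()}
# Only the "clean" reply parses; the other two raise JSONDecodeError.
```

A strict parser rejects two of the three, and each rejection is where the regexes, retries, and repair libraries tend to creep in.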
What you start doing
- Define a schema close to the data shape your app needs.
- Validate on the way out (defense in depth, even if the API guarantees it).
- Use discriminated unions for branching outputs ("either an `answer` or a `clarifying_question`").
- Stream and parse incrementally if latency matters (most APIs support partial-JSON streaming).
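Two of the habits above, validating on the way out and branching on a discriminated union, can be sketched with the standard library alone. The `type` discriminator field and the `text` and `question` field names are assumptions made for this example; a library such as Pydantic or Zod would express the same thing more compactly.

```python
import json
from dataclasses import dataclass
from typing import Union

@dataclass
class Answer:
    text: str

@dataclass
class ClarifyingQuestion:
    question: str

Reply = Union[Answer, ClarifyingQuestion]

def parse_reply(raw: str) -> Reply:
    """Validate a model response and return the matching variant."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    kind = data.get("type")  # discriminator field (name is an assumption)
    if kind == "answer" and isinstance(data.get("text"), str):
        return Answer(text=data["text"])
    if kind == "clarifying_question" and isinstance(data.get("question"), str):
        return ClarifyingQuestion(question=data["question"])
    raise ValueError(f"response matches neither variant: {data!r}")

reply = parse_reply('{"type": "clarifying_question", "question": "Which region?"}')

# Downstream code branches on the variant, not on string matching.
if isinstance(reply, ClarifyingQuestion):
    next_step = "ask_user"
else:
    next_step = "show_answer"
```

The check is redundant when the API already enforces the schema, which is the point of defense in depth: it still catches a changed schema, a different provider, or a truncated response.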
Schema design tips
- Be specific. `status: "approved" | "rejected" | "pending"` beats `status: string`.
- Bound numbers. `score: number, minimum: 0, maximum: 1` saves bug reports.
- Keep it shallow. Three levels of nesting beat nine. The model is more reliable when the schema is digestible.
- Use descriptions. Each field should have a one-sentence description; the model reads them.
- Include enums for closed sets. Free-text where enums would do is a future bug.
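As a sketch of these tips together, here is a flat JSON Schema written as a Python dict, with an enum, a bounded number, and a description per field. The field names and the `violations` helper are made up for the example. The helper checks only the keywords used here and is not a general JSON Schema validator.

```python
# A flat schema: an enum for the closed set, bounds on the number,
# and a one-sentence description on every field.
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["approved", "rejected", "pending"],
            "description": "Outcome of the review.",
        },
        "score": {
            "type": "number",
            "minimum": 0,
            "maximum": 1,
            "description": "Reviewer confidence, from 0 (none) to 1 (certain).",
        },
    },
    "required": ["status", "score"],
    "additionalProperties": False,
}

def violations(value: dict, schema: dict) -> list[str]:
    """List problems, checking only required, enum, number bounds, extra fields."""
    problems = []
    props = schema["properties"]
    for name in schema.get("required", []):
        if name not in value:
            problems.append(f"{name}: missing")
    for name, field in value.items():
        rule = props.get(name)
        if rule is None:
            if schema.get("additionalProperties") is False:
                problems.append(f"{name}: unexpected field")
            continue
        if "enum" in rule and field not in rule["enum"]:
            problems.append(f"{name}: {field!r} not in {rule['enum']}")
        if rule["type"] == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(field, bool) or not isinstance(field, (int, float)):
                problems.append(f"{name}: not a number")
            elif not (rule.get("minimum", float("-inf")) <= field
                      <= rule.get("maximum", float("inf"))):
                problems.append(f"{name}: {field} out of range")
    return problems

ok = violations({"status": "approved", "score": 0.5}, REVIEW_SCHEMA)
bad = violations({"status": "maybe", "score": 1.5}, REVIEW_SCHEMA)
```

Here `ok` is empty, while `bad` reports both the off-enum status and the out-of-range score.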
Common pitfalls
- Schemas that are too rigid. Real-world data varies, and a schema with no escape hatch forces the model to invent an answer for inputs that do not fit. Provide an `unknown` enum value or an optional `notes` field for the messy edge.
- Free-text inside structured output. A `description: string` field still hallucinates. Constrain what you can; eval what you can't.
- Confusing function calling with structured outputs. They share the schema mechanism; the use case is different.
- Massive schemas. A 1000-field schema is a context burner and a model confuser. Split the call.
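The escape hatch from the first pitfall might look like the sketch below. The `DocumentType` categories, the `Extraction` shape, and the `needs_review` routing rule are hypothetical, chosen only to show an `unknown` enum value and an optional `notes` field in use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DocumentType(str, Enum):
    INVOICE = "invoice"
    RECEIPT = "receipt"
    UNKNOWN = "unknown"  # escape hatch for inputs that fit no category

@dataclass
class Extraction:
    doc_type: DocumentType
    notes: Optional[str] = None  # free-text overflow for the messy edge

def needs_review(result: Extraction) -> bool:
    """Send anything the model could not classify cleanly to a human."""
    return result.doc_type is DocumentType.UNKNOWN or result.notes is not None

clean = Extraction(DocumentType("invoice"))
messy = Extraction(DocumentType("unknown"), notes="Looks like a handwritten IOU.")
```

With the escape hatch in place, the model can say "this does not fit" instead of forcing a wrong category, and the application can count how often that happens.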
Where structured outputs shine
- Form filling, data extraction. Resume → JSON. Receipt → JSON. PDF → JSON.
- Classification with rationale. `{label: "spam", confidence: 0.92, reason: "..."}`.
- Multi-output decisions. Score, category, suggested action: all atomic.
- Pipelines. Output of one model is input to the next; structure prevents drift.
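A minimal sketch of the pipeline case, using the classification shape from the list above: one stage's structured output is parsed once and handed to the next stage as a typed value. The 0.8 threshold and the action names are assumptions for illustration.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Classification:
    label: str
    confidence: float
    reason: str

def parse_classification(raw: str) -> Classification:
    """Turn the first stage's JSON output into a typed value."""
    data = json.loads(raw)
    result = Classification(
        label=str(data["label"]),
        confidence=float(data["confidence"]),
        reason=str(data["reason"]),
    )
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    return result

def route(result: Classification, threshold: float = 0.8) -> str:
    """Second stage: decide an action from the structured fields alone."""
    if result.label == "spam":
        return "quarantine" if result.confidence >= threshold else "human_review"
    return "deliver"

stage_one_output = '{"label": "spam", "confidence": 0.92, "reason": "..."}'
action = route(parse_classification(stage_one_output))
```

The second stage never reads prose. It sees a label and a number, so a change in the first stage's wording cannot silently change the routing.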
When not to use them
- Open-ended creative writing. A poem in a `text` field defeats the point.
- Conversational chat where flow matters. Forcing structure kills feel.
- When schema brittleness will hurt more than free-text errors will — rare, but real.
In one line
Free-text output is a demo. Structured output is a product. The schema is the API.