# Streaming Responses
Streaming sends tokens to the client as they are generated, dramatically improving perceived latency for long responses. Without streaming, the user waits for the full response before seeing anything.
## Python Streaming
```python
import anthropic

client = anthropic.Anthropic()

# Stream text directly to stdout
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about async programming."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    print()  # Final newline

    # Access the final complete message after streaming
    final_message = stream.get_final_message()
    print(f"\nTokens used: {final_message.usage.input_tokens} in, {final_message.usage.output_tokens} out")
```
## Low-Level Event Streaming
For more control over the event stream:
```python
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            pass  # Block started
        elif event.type == "content_block_delta":
            if event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
        elif event.type == "message_stop":
            print("\n[Stream complete]")
```
## JavaScript Streaming
```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function streamResponse() {
  // messages.stream() returns a MessageStream directly; no await needed here
  const stream = client.messages.stream({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: "Write a short poem about TypeScript.",
      },
    ],
  });

  // Stream text tokens
  for await (const chunk of stream) {
    if (
      chunk.type === "content_block_delta" &&
      chunk.delta.type === "text_delta"
    ) {
      process.stdout.write(chunk.delta.text);
    }
  }

  // Get final message with usage stats
  const finalMessage = await stream.finalMessage();
  console.log(`\n\nInput tokens: ${finalMessage.usage.input_tokens}`);
}

streamResponse();
```
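The `MessageStream` returned by the JavaScript SDK can also be consumed through event handlers, e.g. `stream.on("text", (text) => process.stdout.write(text))`, as an alternative to the `for await` loop.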
## When to Use Streaming
- Interactive chat interfaces — always stream
- Long document generation — stream so users see progress
- Background batch jobs — no need to stream
- API responses to other services — usually not needed
Streaming is a UI/UX decision as much as a technical one. Even a fast response feels slow if the user stares at a blank screen.
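For the non-interactive cases above, a plain blocking call is usually simpler. A minimal sketch using the Python client from the first example (the prompt is illustrative):

```python
# Non-streaming call: blocks until the full response is ready.
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this report."}],
)

print(response.content[0].text)
print(f"Output tokens: {response.usage.output_tokens}")
```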