# Streaming Responses
Streaming sends tokens to the client as they are generated, dramatically improving perceived latency for long responses. Without streaming, the user waits for the full response before seeing anything.
## Python Streaming
```python
import anthropic

client = anthropic.Anthropic()

# Stream text directly to stdout
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about async programming."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
    print()  # Final newline

    # Access the final complete message after streaming
    final_message = stream.get_final_message()
    print(f"\nTokens used: {final_message.usage.input_tokens} in, {final_message.usage.output_tokens} out")
```
## Low-Level Event Streaming
For more control over the event stream:
```python
with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            pass  # Block started
        elif event.type == "content_block_delta":
            if event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
        elif event.type == "message_stop":
            print("\n[Stream complete]")
```
## JavaScript Streaming
```javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function streamResponse() {
  // messages.stream() returns a MessageStream directly; no await needed here
  const stream = client.messages.stream({
    model: "claude-sonnet-4-5",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: "Write a short poem about TypeScript.",
      },
    ],
  });

  // Stream text tokens
  for await (const chunk of stream) {
    if (
      chunk.type === "content_block_delta" &&
      chunk.delta.type === "text_delta"
    ) {
      process.stdout.write(chunk.delta.text);
    }
  }

  // Get final message with usage stats
  const finalMessage = await stream.finalMessage();
  console.log(`\n\nInput tokens: ${finalMessage.usage.input_tokens}`);
}

streamResponse();
```
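The `MessageStream` returned by the JavaScript SDK can also be consumed through event handlers, e.g. `stream.on("text", (text) => process.stdout.write(text))`, as an alternative to the `for await` loop.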
## When to Use Streaming
- Interactive chat interfaces — always stream
- Long document generation — stream so users see progress
- Background batch jobs — no need to stream
- API responses to other services — usually not needed
Streaming is a UI/UX decision as much as a technical one. Even a fast response feels slow if the user stares at a blank screen.
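For the non-interactive cases above, a plain blocking call is usually simpler. A minimal sketch using the Python client from the first example (the prompt is illustrative):

```python
# Non-streaming call: blocks until the full response is ready.
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this report."}],
)

print(response.content[0].text)
print(f"Output tokens: {response.usage.output_tokens}")
```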