The framework-vs-glue analogy
You can wire an agent yourself — call the API, parse a tool call, run it, append the result, call again. Doable. But you will rebuild the same plumbing every project: retries, file edits with diffs, sub-agent spawning, permission gates, context compaction. The Claude Agent SDK is that plumbing, packaged. You bring tools and intent; the SDK runs the loop.
What the SDK gives you
- The agent loop — observe → decide → act → repeat, with sensible defaults for stop conditions and max iterations.
- Tools, batteries-included — file read/edit/write, bash, web fetch, custom tools you register.
- Sub-agents — spawn isolated child agents with their own context for parallel or specialised work.
- Permission modes — auto-allow, ask, or deny per tool. Maps to real production safety needs.
- Context management — automatic compaction when conversations grow past the window.
- Hooks — run shell commands or your own code before/after tool calls (audit logs, guardrails, custom telemetry).
When to use it (vs raw API)
Reach for the SDK when you want:
- A real command-line agent that operates on a repo or filesystem.
- Multi-step work with branching, retries, and human-in-the-loop checkpoints.
- A shared toolset across many agents in your org.
- MCP-server interop — the SDK speaks MCP natively, so any MCP tool drops in.
Stick to raw API calls when you just need a single LLM completion or a simple chat — the SDK is overkill there.
What it does not do
- Replace your evals — you still need a test suite for behaviour.
- Make tools safe by default — you still gate destructive actions.
- Free you from prompt engineering — system prompts and tool descriptions still matter, a lot.
Practical wiring
A typical Agent SDK app looks like this:
- Define your tools (or import an MCP server).
- Write a system prompt with the agent's role, goal, and constraints.
- Set permission mode (often
acceptEditsfor trusted environments,defaultfor prod with confirmation gates). - Run with a clear stopping condition (max turns, "done" signal, or human approval).
The SDK ships with TypeScript and Python flavours. Same loop, idiomatic in each language.
The right mental model
Treat the SDK like a runtime for agents the same way Express is a runtime for HTTP — it handles the hot path so you focus on routes (your tools and prompts). What you ship is still your problem; the loop is no longer.