The intern-with-keys analogy
A chatbot is the intern who answers your question across the desk: smart, fast, but only words.
An AI agent is the intern you handed the office keys, the company credit card, and a checklist. They can open doors, run errands, file reports, and come back when the work is actually done — not just when an answer is composed.
The leap from chatbot to agent is the leap from talking about the world to changing the world.
The minimum bar for "agent"
A system is an agent when it can:
- Use tools — call functions, hit APIs, run code, query databases.
- Loop — observe the tool's output, decide the next action, repeat.
- Stop — recognise when the goal is met and return.
Take any of those away and you are back in chatbot territory.
The agent loop, explicitly
    goal → plan → act (call a tool) → observe (read the result) → decide
                          ↑                                          │
                          └────── if not done, loop ─────────────────┘
The model handles planning, action selection, and the decision to continue or stop. The runtime executes the tool calls and feeds the results back to the model.
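To make the split between model and runtime concrete, here is a minimal sketch of the loop. Everything in it is an assumption for illustration: `scripted_model` is a hard-coded stand-in for a real model call, and `TOOLS`, `run_agent`, and the decision dictionary shape are hypothetical names, not any particular framework's API.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}


def scripted_model(goal: str, history: list[dict]) -> dict:
    """Stand-in for a real model call (assumption for this sketch).

    A real agent would send the goal and history to a language model here
    and parse its reply into the same kind of decision dictionary.
    """
    if not history:
        return {"action": "add", "args": {"a": 2, "b": 3}}
    return {"action": "finish", "answer": history[-1]["observation"]}


def run_agent(goal: str, model=scripted_model, max_steps: int = 5) -> str:
    history: list[dict] = []
    for _ in range(max_steps):
        decision = model(goal, history)        # model: plan and decide
        if decision["action"] == "finish":     # model says the goal is met
            return decision["answer"]
        tool = TOOLS[decision["action"]]       # runtime: look up the tool
        observation = tool(**decision["args"])  # runtime: act
        history.append(                        # runtime: feed the result back
            {"action": decision["action"], "observation": observation}
        )
    raise RuntimeError(f"no answer after {max_steps} steps")


print(run_agent("add 2 and 3"))  # prints 5
```

The model never touches a tool directly. It only returns a description of what it wants done, and the runtime decides whether and how to execute it.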
Tools are the leverage
A model with no tools tops out at "things it memorised during training." A model with the right tools can:
- Read live data (database, search, file system).
- Write to the world (send an email, open a PR, ship a deploy).
- Run code (execute, test, iterate on real outputs).
- Call other agents (delegation).
In practice, the quality of the tool set often predicts the quality of the agent more than the choice of model does.
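One way to picture a "tool set" is as a small registry of named functions, each with a description the model reads when choosing. The sketch below assumes that shape. `Tool`, `REGISTRY`, `describe_tools`, `call_tool`, and the `word_count` tool are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str           # the text the model sees when picking a tool
    func: Callable[..., str]   # the code the runtime actually executes


def word_count(text: str) -> str:
    return str(len(text.split()))


REGISTRY = {
    t.name: t
    for t in [
        Tool("word_count", "Count whitespace-separated words in a string.", word_count),
    ]
}


def describe_tools() -> str:
    """Render the tool list for inclusion in a prompt."""
    return "\n".join(f"{t.name}: {t.description}" for t in REGISTRY.values())


def call_tool(name: str, **kwargs) -> str:
    """Run a tool and return errors as text instead of raising."""
    tool = REGISTRY.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        return tool.func(**kwargs)
    except Exception as exc:  # bad arguments, tool failure, and so on
        return f"error: {type(exc).__name__}: {exc}"
```

Returning errors as strings is a design choice in this sketch: the failure becomes an observation the model can read and react to on the next step, rather than a crash that ends the run.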
Where agents fall apart
- Tool sprawl — 50 tools on one agent and the model gets confused about which to use. Group, scope, or split.
- No max-step cap — a stuck agent can spin forever. Always set a hard ceiling.
- Hidden side effects — tools that mutate prod (delete, send, charge) need confirmation gates, not just trust.
- Lossy observation — pasting a 10MB tool output into the next prompt blows the window. Summarise, paginate, or store-and-reference.
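Two of these failure modes can be guarded against in the wrapper that sits between the model and the tools. The following is a rough sketch under assumed names: `DESTRUCTIVE`, `clip_observation`, and `guarded_call` are hypothetical, the tool names are made up, and the 2000-character limit is an arbitrary placeholder rather than a recommended value.

```python
from typing import Callable

# Hypothetical names of tools that mutate production state.
DESTRUCTIVE = {"delete_record", "send_email", "charge_card"}


def clip_observation(text: str, max_chars: int = 2000) -> str:
    """Truncate oversized tool output so it cannot flood the next prompt."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n[truncated: {omitted} more characters]"


def guarded_call(
    name: str,
    func: Callable[..., str],
    confirm: Callable[[str, dict], bool],
    /,
    **kwargs,
) -> str:
    """Run a tool, asking `confirm` first if the tool is destructive."""
    if name in DESTRUCTIVE and not confirm(name, kwargs):
        return f"blocked: {name} was not approved"
    return clip_observation(func(**kwargs))
```

Here `confirm` could be a prompt shown to a human, a policy check, or a ticketing step. Plain truncation is the bluntest of the options for large outputs; summarising or storing the full output and passing a reference would need more machinery than fits in this sketch.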
Practical autonomy levels
| Level | What the agent does | Where to use it |
|---|---|---|
| 0 | Suggests an action, human runs it | Sensitive prod work |
| 1 | Runs read-only tools alone | Research, summarisation |
| 2 | Runs reversible writes (drafts, branches) | Internal tooling |
| 3 | Full autonomy with audit trail | Background jobs, data pipelines |
Start at level 1. Earn your way to level 3 with logs and evals.
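The levels in the table can be enforced in code rather than left as a convention. This is a minimal sketch of one way to do it, assuming each tool is tagged with the lowest level at which it may run unattended. The `Autonomy` enum, the `REQUIRED` mapping, the tool names, and `decide` are all hypothetical.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    SUGGEST = 0           # human runs every action
    READ_ONLY = 1         # read-only tools run alone
    REVERSIBLE_WRITE = 2  # drafts, branches
    FULL = 3              # full autonomy with audit trail


# Hypothetical mapping: minimum level needed to run each tool unattended.
REQUIRED = {
    "search_docs": Autonomy.READ_ONLY,
    "create_branch": Autonomy.REVERSIBLE_WRITE,
    "deploy": Autonomy.FULL,
}


def decide(agent_level: Autonomy, tool: str) -> str:
    """Return "run" if the agent may execute the tool itself, else "suggest"."""
    # Unknown tools are treated as the most restricted kind.
    required = REQUIRED.get(tool, Autonomy.FULL)
    return "run" if agent_level >= required else "suggest"


print(decide(Autonomy.READ_ONLY, "search_docs"))    # prints run
print(decide(Autonomy.READ_ONLY, "create_branch"))  # prints suggest
```

Raising an agent from one level to the next then becomes a one-line configuration change that can be reviewed and logged like any other.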