Two zoom levels, one interview
A senior AI system design interview almost always operates at two zoom levels. High-level design (HLD) answers the question "What are the components, and how do they talk to each other?" Low-level design (LLD) answers the question "How is one critical component built: its data model, its API, its hot path?" Expect to switch between the two on demand.
What HLD looks like for AI systems
For an LLM-backed product, an HLD usually shows:
- Edge — clients, gateway, auth, rate limiting, abuse detection.
- Orchestration — prompt assembly, tool selection, retries, fallbacks.
- Knowledge — vector store, keyword index, structured DB, freshness pipeline.
- Models — primary LLM, smaller routing model, embedding model, re-ranker, content filter.
- Telemetry — eval harness, traces, cost meter, feedback store.
Drawing this in five clean boxes plus arrows is half the battle.
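One way to rehearse the five boxes is to write them down as data before drawing them. The sketch below is only an illustration: the layer names and members come from the list above, while the `CALLS` edges and the helper names `downstream` and `unknown_layers` are assumptions made for this example, not a prescribed wiring.

```python
# Layer names and members follow the HLD list above.
LAYERS = {
    "edge": ["clients", "gateway", "auth", "rate limiting", "abuse detection"],
    "orchestration": ["prompt assembly", "tool selection", "retries", "fallbacks"],
    "knowledge": ["vector store", "keyword index", "structured DB", "freshness pipeline"],
    "models": ["primary LLM", "routing model", "embedding model", "re-ranker", "content filter"],
    "telemetry": ["eval harness", "traces", "cost meter", "feedback store"],
}

# Assumed arrows (caller, callee); a real design may wire these differently.
CALLS = [
    ("edge", "orchestration"),
    ("orchestration", "knowledge"),
    ("orchestration", "models"),
    ("edge", "telemetry"),
    ("orchestration", "telemetry"),
]


def downstream(layer: str) -> list[str]:
    """Return the layers that `layer` calls directly, sorted by name."""
    return sorted(dst for src, dst in CALLS if src == layer)


def unknown_layers() -> set[str]:
    """Return any layer named in CALLS that is missing from LAYERS."""
    return {name for edge in CALLS for name in edge} - set(LAYERS)
```

Calling `downstream("orchestration")` lists the boxes an interviewer is likely to probe next, and `unknown_layers()` returning an empty set is a quick check that every arrow ends on a box that was actually drawn.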
What LLD looks like for AI systems
LLD is where many candidates lose marks. Be ready to specify:
- Schemas — chunk, embedding, message, run, span, eval.
- Latency budgets — for example retrieve 80 ms + rerank 40 ms + first token 200 ms.
- Failure modes — model timeout, partial tool failure, hallucination caught in eval.
- Hotspots — KV-cache reuse, prompt-cache hits, batch packing.
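To make the schema and latency-budget items concrete, here is a minimal sketch under stated assumptions. The per-stage numbers are the ones from the example above; the field names on `Chunk` and `Span`, and the helpers `total_budget_ms` and `over_budget`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical schema for one retrievable piece of a document."""
    chunk_id: str
    doc_id: str
    text: str
    embedding: tuple[float, ...]  # vector produced by the embedding model


@dataclass(frozen=True)
class Span:
    """Hypothetical schema for one timed stage within a run."""
    run_id: str
    stage: str
    duration_ms: float


# Per-stage budgets taken from the example in the list above.
BUDGET_MS = {"retrieve": 80, "rerank": 40, "first_token": 200}


def total_budget_ms(budget: dict[str, int] = BUDGET_MS) -> int:
    """Sum the per-stage budgets into an end-to-end target."""
    return sum(budget.values())


def over_budget(spans: list[Span], budget: dict[str, int] = BUDGET_MS) -> dict[str, float]:
    """Map each stage that exceeded its budget to the overshoot in ms."""
    return {
        s.stage: s.duration_ms - budget[s.stage]
        for s in spans
        if s.stage in budget and s.duration_ms > budget[s.stage]
    }
```

With these numbers the end-to-end target to first token is 80 + 40 + 200 = 320 ms. A run whose retrieve stage took 95 ms would be reported by `over_budget` as 15 ms over on `retrieve`, which is the kind of specific statement an LLD discussion calls for.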
A practical rule
Open with HLD and stay there until your interviewer challenges a component. Then drop into LLD on that component, finish the detail, and pop back up to the high-level view. The next lesson turns this into a repeatable interview script.