What this prompt does
This prompt has the AI act as a senior AI tooling engineer that specifies a Gemini long-context codebase analyzer tightly enough to build, returning working code rather than pseudocode. You give it the [model], the [goal], and the [repo_size], and it returns the analyzer script plus a sample JSON findings array for that repo size.
The six deliverables make the analysis reproducible and citable: a repo flattener that selects relevant files, skips vendored and build output, and records file paths; a system prompt tuned to your [goal] demanding structured findings; a single-pass call sending the flattened repo to your [model] and returning JSON with file, line, finding, and severity; a chunking strategy for repos exceeding the context window with a merge step to dedupe findings; prompt caching so iterative follow-ups over the same repo stay cheap; and a validation step that rejects malformed JSON and retries with a repair instruction. The structure works because the win of long-context review is cited file:line findings in one shot, and the flattener plus JSON schema are what make that output usable.
When to use it
- A repo is too big to skim manually but small enough to flatten into a long-context call.
- You want cited file:line findings in structured JSON, not prose observations.
- You need an automated first pass before a deeper manual code review.
- Your repo exceeds the context window and you need chunking with deduped findings.
- You ask iterative follow-up questions over the same repo and want caching to keep it cheap.
Example output
You get an analyzer script that flattens the repo (skipping vendored and build output), sends it to your [model], and parses the response, plus a sample JSON findings array where each entry carries a file, line, finding description, and severity - matching the analysis [goal] you set, such as security issues cited at file:line.
Pro tips
- Cache the flattened-repo prompt; every follow-up question reuses it, and without caching the cost compounds fast.
- Tune the system prompt to your
[goal]precisely - "find security issues and cite file:line" yields sharper findings than a vague "review this code". - Make the flattener skip vendored and build output, or you will pay tokens to analyse dependencies you do not own.
- Set
[repo_size]honestly so the model knows whether a single pass fits or chunking is needed. - Treat this as a first pass, not a verdict; long-context review surfaces candidates that still need a human to confirm.
- Keep the JSON validation-and-repair step; a malformed response should trigger a retry, not crash the pipeline.
- Demand severity on every finding so you can triage by impact instead of reading an undifferentiated wall of results.
- When chunking, make the merge step dedupe on file plus line plus finding, or the same issue spanning a chunk boundary shows up twice.