Why flatten the repo before sending it to the model?

Flattening selects only relevant files, skips vendored and build output, and records paths so findings can cite real locations. Without it you waste context tokens on dependencies you do not own and lose the file-path mapping needed for file:line citations.
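As a rough illustration, here is a minimal flattener sketch using only the Python standard library. The skip list, the extension allowlist, the size cap, the `=== FILE: <path> ===` header format, and the per-line number prefix are all hypothetical choices. They are not taken from the original prompt.

```python
import os
from pathlib import Path

# Hypothetical defaults: tune both sets to your repo's layout and languages.
SKIP_DIRS = {".git", "node_modules", "vendor", "dist", "build", "__pycache__", ".venv", "target"}
INCLUDE_EXTS = {".py", ".js", ".ts", ".go", ".java", ".rb", ".php", ".rs", ".md", ".toml", ".yaml", ".yml"}


def flatten_repo(root: str, max_file_bytes: int = 200_000) -> str:
    """Concatenate relevant source files, each under a header naming its repo-relative path."""
    root_path = Path(root)
    parts = []
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Prune vendored and build directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix not in INCLUDE_EXTS or path.stat().st_size > max_file_bytes:
                continue
            rel = path.relative_to(root_path).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            # Prefix each line with its number so the model can cite file:line accurately.
            numbered = "\n".join(f"{i}: {line}" for i, line in enumerate(text.splitlines(), start=1))
            parts.append(f"=== FILE: {rel} ===\n{numbered}")
    return "\n\n".join(parts)
```

The line-number prefix costs some extra tokens. In exchange, every cited line can be checked against the file without guessing.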

What happens when the repo is larger than the context window?

The chunking-strategy deliverable splits the repo across multiple calls and then merges the results, deduping overlapping findings. This keeps the analysis usable on repos that exceed a single context window at the cost of more calls.
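Below is a sketch of chunking and merging. It assumes the flattened repo is held as a dict mapping each path to its file text, and that chunks are packed with whole files under a character budget. The original does not say how to split. The dedupe key follows the "file plus line plus finding" rule from the pro tips. Normalizing the finding text by case and surrounding whitespace is an added assumption. `chunk_files` and `merge_findings` are hypothetical names.

```python
from typing import Iterable


def chunk_files(files: dict[str, str], max_chars: int) -> list[dict[str, str]]:
    """Greedily pack whole files into chunks whose combined size stays under max_chars."""
    chunks: list[dict[str, str]] = []
    current: dict[str, str] = {}
    size = 0
    for path, text in files.items():
        if current and size + len(text) > max_chars:
            chunks.append(current)
            current, size = {}, 0
        current[path] = text  # a file larger than max_chars still gets a chunk of its own
        size += len(text)
    if current:
        chunks.append(current)
    return chunks


def merge_findings(per_chunk: Iterable[list[dict]]) -> list[dict]:
    """Merge findings from every chunk, keeping the first occurrence of each (file, line, finding)."""
    seen: set[tuple] = set()
    merged: list[dict] = []
    for findings in per_chunk:
        for f in findings:
            key = (f["file"], f["line"], f["finding"].strip().lower())
            if key not in seen:
                seen.add(key)
                merged.append(f)
    return merged
```

Any single file bigger than the budget still lands in one oversized chunk. A real pipeline would split such a file into overlapping line ranges, and that overlap is exactly where the dedupe step earns its keep.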

Why is the output structured as JSON?

Returning file, line, finding, and severity as JSON makes the results machine-readable and easy to triage, rather than prose you have to parse by hand. A validation step rejects malformed JSON and retries with a repair instruction so the pipeline stays reliable.
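Here is one way to implement validate-and-repair, sketched under a few assumptions. `call_model` is a hypothetical stand-in for whatever client call sends a prompt to your [model] and returns its text. The four-level severity scale is invented for illustration, and the repair wording is only an example.

```python
import json
from typing import Callable

# Assumed schema: every finding must carry these keys with these types.
REQUIRED_FIELDS = {"file": str, "line": int, "finding": str, "severity": str}
SEVERITIES = {"critical", "high", "medium", "low"}  # hypothetical scale


def parse_findings(raw: str) -> list[dict]:
    """Parse and validate a model reply; raise ValueError if it is malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of findings")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"finding {i} is not an object")
        for key, expected in REQUIRED_FIELDS.items():
            value = item.get(key)
            # bool is a subclass of int in Python, so reject it explicitly.
            if not isinstance(value, expected) or isinstance(value, bool):
                raise ValueError(f"finding {i}: '{key}' missing or not {expected.__name__}")
        if item["severity"] not in SEVERITIES:
            raise ValueError(f"finding {i}: unknown severity {item['severity']!r}")
    return data


def analyze_with_repair(call_model: Callable[[str], str], prompt: str,
                        max_retries: int = 2) -> list[dict]:
    """Call the model, retrying with a repair instruction whenever the reply is invalid."""
    request = prompt
    for attempt in range(max_retries + 1):
        raw = call_model(request)
        try:
            return parse_findings(raw)
        except ValueError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the error instead of returning bad data
            request = (
                f"{prompt}\n\nYour previous reply was invalid: {err}.\n"
                "Reply with ONLY a JSON array of objects with keys "
                "file, line, finding, severity.\n"
                f"Previous reply:\n{raw}"
            )
```

Feeding the validator's error message back into the repair prompt tells the model what to fix. That usually works better than a generic "try again".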

Is this a replacement for manual code review?

No. It is designed as an automated first pass that surfaces candidates before a deeper manual review. Long-context findings still need a human to confirm them, since the model can miss issues or flag false positives depending on the [goal] and the repo.

Claude/ChatGPT Prompt to Build a Gemini Long-Context Codebase Analyzer | AI Prompt Library

What this prompt does

This prompt has the AI act as a senior AI tooling engineer who specifies a Gemini long-context codebase analyzer in enough detail to build, returning working code rather than pseudocode. You give it the [model], the [goal], and the [repo_size], and it returns the analyzer script plus a sample JSON findings array for that repo size.

The six deliverables make the analysis reproducible and citable:

A repo flattener that selects relevant files, skips vendored and build output, and records file paths.
A system prompt tuned to your [goal] that demands structured findings.
A single-pass call that sends the flattened repo to your [model] and returns JSON with file, line, finding, and severity.
A chunking strategy for repos that exceed the context window, with a merge step that dedupes findings.
Prompt caching so iterative follow-ups over the same repo stay cheap.
A validation step that rejects malformed JSON and retries with a repair instruction.

The structure works because the payoff of long-context review is cited file:line findings in one shot, and the flattener plus the JSON schema are what make that output usable.
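The sketch below covers only the bookkeeping side of prompt caching. It hashes the flattened repo so that each distinct repo version creates one cache entry, and every follow-up reuses that entry's handle. `create_cache` is a hypothetical callable standing in for your provider's caching call. Gemini's context caching has its own API and expiry rules, and this sketch does not model either.

```python
import hashlib
from typing import Callable


class RepoPromptCache:
    """Create at most one cached prompt per distinct flattened repo, keyed by content hash."""

    def __init__(self, create_cache: Callable[[str], str]):
        # Hypothetical: uploads the large repo prefix once and returns a handle to reuse.
        self._create_cache = create_cache
        self._handles: dict[str, str] = {}

    def handle_for(self, flattened_repo: str) -> str:
        key = hashlib.sha256(flattened_repo.encode("utf-8")).hexdigest()
        if key not in self._handles:
            self._handles[key] = self._create_cache(flattened_repo)
        return self._handles[key]
```

Keying on a content hash means an edited repo automatically gets a fresh cache entry. Repeated questions about an unchanged repo pay for the large prefix only once.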

When to use it

A repo is too big to skim manually but small enough to flatten into a long-context call.
You want cited file:line findings in structured JSON, not prose observations.
You need an automated first pass before a deeper manual code review.
Your repo exceeds the context window and you need chunking with deduped findings.
You ask iterative follow-up questions over the same repo and want caching to keep it cheap.
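You can estimate which of these cases applies before spending a call. The sketch below assumes the common rule of thumb of about four characters per token. Code can tokenize differently, so a provider-side token count is more reliable when one is available. `plan_passes`, the token reserve for the system prompt and reply, and the defaults are illustrative assumptions.

```python
import math


def plan_passes(flattened_repo: str, context_tokens: int, reserve_tokens: int = 8_000,
                chars_per_token: float = 4.0) -> int:
    """Estimate how many model calls the repo needs; 1 means a single pass should fit."""
    budget = context_tokens - reserve_tokens  # room left after the system prompt and JSON reply
    if budget <= 0:
        raise ValueError("reserve_tokens must be smaller than context_tokens")
    est_tokens = len(flattened_repo) / chars_per_token
    return max(1, math.ceil(est_tokens / budget))
```

A result of 1 points to the single-pass path. Anything higher means the chunk-and-merge path, with roughly that many calls.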

Example output

You get an analyzer script that flattens the repo (skipping vendored and build output), sends it to your [model], and parses the response. You also get a sample JSON findings array in which each entry carries a file, line, finding description, and severity, all matched to the analysis [goal] you set (for example, security issues cited at file:line).
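Getting output that matches your [goal] starts with a system prompt that states the goal and the exact JSON shape. Below is a minimal builder. It assumes the flattened input marks each file with a `=== FILE: <path> ===` header and numbers its lines. The header convention, the wording, and the severity list are all hypothetical.

```python
import json

# Shape-only example for the model; placeholder values, not real findings.
FINDING_SHAPE = [{"file": "path/to/file.ext", "line": 1, "finding": "...", "severity": "high"}]


def build_system_prompt(goal: str, severities=("critical", "high", "medium", "low")) -> str:
    """Compose a goal-specific system prompt that demands JSON findings only."""
    return (
        "You are reviewing a flattened repository. Each file starts with a line "
        "'=== FILE: <path> ===' and every source line is prefixed with its line number.\n"
        f"Goal: {goal}\n"
        "Report only findings relevant to the goal, citing the exact file path and line number.\n"
        f"Severity must be one of: {', '.join(severities)}.\n"
        "Respond with ONLY a JSON array shaped like this example, with no prose:\n"
        + json.dumps(FINDING_SHAPE)
    )
```

For example, `build_system_prompt("find security issues and cite file:line")` produces a narrowly scoped prompt. That matches the pro tip about precise goals.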

Pro tips

Cache the flattened-repo prompt; every follow-up question reuses it, and without caching the cost compounds fast.
Tune the system prompt to your [goal] precisely: "find security issues and cite file:line" yields sharper findings than a vague "review this code".
Make the flattener skip vendored and build output, or you will pay tokens to analyze dependencies you do not own.
Set [repo_size] honestly so the model knows whether a single pass fits or chunking is needed.
Treat this as a first pass, not a verdict; long-context review surfaces candidates that still need a human to confirm.
Keep the JSON validation-and-repair step; a malformed response should trigger a retry, not crash the pipeline.
Demand severity on every finding so you can triage by impact instead of reading an undifferentiated wall of results.
When chunking, make the merge step dedupe on file plus line plus finding, or the same issue spanning a chunk boundary shows up twice.
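Demanding severity pays off at triage time. The sketch below orders findings most severe first, then by file and line. It assumes a four-level scale, which is hypothetical, so substitute whatever scale your system prompt demands. Unrecognized severities go to the end.

```python
# Hypothetical scale: lower rank sorts first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def triage(findings: list[dict]) -> list[dict]:
    """Sort findings by severity, then file and line; unknown severities sort last."""
    unknown = len(SEVERITY_RANK)
    return sorted(
        findings,
        key=lambda f: (SEVERITY_RANK.get(f["severity"], unknown), f["file"], f["line"]),
    )
```

Reading the sorted list top-down lets a reviewer stop once the remaining items fall below the severity they care about.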


Engr Mejba Ahmed
