Claude/ChatGPT Prompt to Build a Gemini Long-Context Codebase Analyzer

Use Gemini's long context to analyse an entire codebase in one pass, returning structured JSON findings with chunking and caching.

What this prompt does

This prompt has the AI act as a senior AI tooling engineer who specifies a Gemini long-context codebase analyzer in enough detail to build, and who returns working code rather than pseudocode. You give it the [model], the [goal], and the [repo_size], and it returns the analyzer script plus a sample JSON findings array for that repo size.

The six deliverables make the analysis reproducible and citable:

  • A repo flattener that selects relevant files, skips vendored and build output, and records file paths.
  • A system prompt tuned to your [goal] that demands structured findings.
  • A single-pass call that sends the flattened repo to your [model] and returns JSON with file, line, finding, and severity.
  • A chunking strategy for repos that exceed the context window, with a merge step to dedupe findings.
  • Prompt caching so iterative follow-ups over the same repo stay cheap.
  • A validation step that rejects malformed JSON and retries with a repair instruction.

The structure works because the main payoff of long-context review is a set of findings cited at file:line from a single call, and the flattener and the JSON schema are what make that output usable.
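
As an illustration of the flattener described above, here is a minimal sketch, not the script the prompt itself generates. The skip list, the extension filter, the "=== FILE: ... ===" header format, and the function name flatten_repo are all assumptions chosen for the example. Each line is numbered so the model has something concrete to cite as file:line.

```python
import os
from pathlib import Path

# Assumed defaults for illustration; adjust both sets to your repo.
SKIP_DIRS = {".git", "node_modules", "vendor", "dist", "build", "__pycache__"}
KEEP_EXTS = {".py", ".js", ".ts", ".go", ".java", ".rs"}


def flatten_repo(root: str) -> str:
    """Concatenate relevant source files into one prompt-ready string.

    Every file gets a path header and numbered lines so findings
    can cite file:line.
    """
    parts = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix not in KEEP_EXTS:
                continue
            rel = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            numbered = "\n".join(
                f"{i}: {line}" for i, line in enumerate(text.splitlines(), start=1)
            )
            parts.append(f"=== FILE: {rel} ===\n{numbered}")
    return "\n\n".join(parts)
```

Calling flatten_repo(".") from the repo root returns a single string you can place after the system prompt. Sorting directory and file names keeps the output stable between runs, which helps if the same flattened text is reused for follow-up questions.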

When to use it

  • A repo is too big to skim manually but small enough to flatten into a long-context call.
  • You want cited file:line findings in structured JSON, not prose observations.
  • You need an automated first pass before a deeper manual code review.
  • Your repo exceeds the context window and you need chunking with deduped findings.
  • You ask iterative follow-up questions over the same repo and want caching to keep it cheap.

Example output

You get an analyzer script that flattens the repo (skipping vendored and build output), sends it to your [model], and parses the response, plus a sample JSON findings array where each entry carries a file, line, finding description, and severity - matching the analysis [goal] you set, such as security issues cited at file:line.
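
To make the shape of that findings array concrete, the sketch below parses a response, checks that each entry carries the four fields, and sorts by severity for triage. The severity labels, the function name parse_findings, and the sample entries (including their file paths) are hypothetical and only illustrate the format; they are not output from a real run.

```python
import json

# Field names come from the description above; the types are assumptions.
REQUIRED = {"file": str, "line": int, "finding": str, "severity": str}
# Assumed severity labels, ordered from most to least urgent.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def parse_findings(raw: str) -> list[dict]:
    """Parse a model response; raise ValueError if it breaks the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of findings")
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"finding {i} is not an object")
        for key, typ in REQUIRED.items():
            if not isinstance(item.get(key), typ):
                raise ValueError(f"finding {i}: '{key}' missing or not {typ.__name__}")
        if item["severity"] not in SEVERITY_ORDER:
            raise ValueError(f"finding {i}: unknown severity {item['severity']!r}")
    # Most severe first, so triage starts at the top of the list.
    return sorted(data, key=lambda f: SEVERITY_ORDER[f["severity"]])


# Hypothetical sample entries, for illustration only.
sample = json.dumps([
    {"file": "src/util.py", "line": 7, "finding": "Unused import", "severity": "low"},
    {"file": "src/auth.py", "line": 42, "finding": "SQL built by string concatenation", "severity": "high"},
])
findings = parse_findings(sample)
```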

Pro tips

  • Cache the flattened-repo prompt; every follow-up question reuses it, and without caching the cost compounds fast.
  • Tune the system prompt to your [goal] precisely - "find security issues and cite file:line" yields sharper findings than a vague "review this code".
  • Make the flattener skip vendored and build output, or you will pay tokens to analyse dependencies you do not own.
  • Set [repo_size] honestly so the model knows whether a single pass fits or chunking is needed.
  • Treat this as a first pass, not a verdict; long-context review surfaces candidates that still need a human to confirm.
  • Keep the JSON validation-and-repair step; a malformed response should trigger a retry, not crash the pipeline.
  • Demand severity on every finding so you can triage by impact instead of reading an undifferentiated wall of results.
  • When chunking, make the merge step dedupe on file plus line plus finding, or the same issue spanning a chunk boundary shows up twice.
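
The chunking and merge tips above can be sketched roughly as follows. This is one possible approach under stated assumptions: chunks are packed greedily at file granularity, character count stands in for token count, and the names chunk_files and merge_findings are hypothetical. Duplicates typically arise when chunks overlap or share repeated context, which is the case the merge step handles.

```python
def chunk_files(files: list[tuple[str, str]], max_chars: int) -> list[list[tuple[str, str]]]:
    """Greedily pack (path, text) pairs into chunks under a character budget.

    Characters are a rough proxy for tokens. A single file larger than
    the budget still gets a chunk of its own rather than being split.
    """
    chunks, current, size = [], [], 0
    for path, text in files:
        if current and size + len(text) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append((path, text))
        size += len(text)
    if current:
        chunks.append(current)
    return chunks


def merge_findings(per_chunk: list[list[dict]]) -> list[dict]:
    """Merge per-chunk findings, deduping on file + line + finding text."""
    seen, merged = set(), []
    for chunk_findings in per_chunk:
        for f in chunk_findings:
            # Normalise case and whitespace so trivially different wording matches.
            key = (f["file"], f["line"], " ".join(f["finding"].lower().split()))
            if key not in seen:
                seen.add(key)
                merged.append(f)
    return merged
```

Exact-match deduping only catches findings the model words identically apart from case and spacing. If the same issue is described differently in two chunks, keying on file and line alone is a coarser alternative, at the risk of merging two distinct issues on one line.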

Frequently Asked Questions

Why flatten the repo before sending it to the model?
Flattening selects only relevant files, skips vendored and build output, and records paths so findings can cite real locations. Without it you waste context tokens on dependencies you do not own and lose the file-path mapping needed for file:line citations.

What happens when the repo is larger than the context window?
The chunking-strategy deliverable splits the repo across multiple calls and then merges the results, deduping overlapping findings. This keeps the analysis usable on repos that exceed a single context window at the cost of more calls.

Why is the output structured as JSON?
Returning file, line, finding, and severity as JSON makes the results machine-readable and easy to triage, rather than prose you have to parse by hand. A validation step rejects malformed JSON and retries with a repair instruction so the pipeline stays reliable.

Is this a replacement for manual code review?
No - it is designed as an automated first pass to surface candidates before a deeper manual review. Long-context findings still need a human to confirm, since the model can miss issues or flag false positives depending on the [goal] and repo.
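
The validate-and-retry behaviour mentioned in the answers above might look like the sketch below. It deliberately avoids any real SDK: call_model is a hypothetical callable that takes a prompt string and returns the model's text reply, so you would wrap your own [model] client in it. The wording of the repair instruction and the default of three attempts are also assumptions.

```python
import json
from typing import Callable


def analyze_with_repair(
    call_model: Callable[[str], str],  # hypothetical wrapper around your model client
    prompt: str,
    max_attempts: int = 3,
) -> list:
    """Call the model; on malformed JSON, retry with a repair instruction."""
    current = prompt
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = call_model(current)
        try:
            data = json.loads(raw)
            if isinstance(data, list):
                return data
            last_error = "top-level value is not a JSON array"
        except json.JSONDecodeError as exc:
            last_error = str(exc)
        # Re-send the original prompt plus the reason the last reply was rejected.
        current = (
            f"{prompt}\n\nYour previous reply was rejected ({last_error}). "
            "Reply again with only a valid JSON array of findings."
        )
    raise RuntimeError(f"no valid JSON after {max_attempts} attempts: {last_error}")
```

Raising after a bounded number of attempts lets the caller decide whether to log and skip a chunk or stop the run, rather than looping indefinitely on a reply that never parses.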
Engr Mejba Ahmed
