Will this prompt actually refactor my code, or just describe the changes?

It produces analysis and before/after code examples for the top areas you asked for — it does not silently rewrite your files. That's intentional: you review each proposed change and its risk level, then apply them one at a time. In Claude Code you can follow up with "apply area 2" once you've vetted it, keeping you in control of what lands.

How does it 'preserve all existing tests and ensure backward compatibility'?

The prompt instructs the model to keep public signatures stable and avoid changes that would break callers or your test suite, which is why each area carries a risk rating. But it can only reason about tests it can see — if your coverage is thin, treat the backward-compatibility claim as a hypothesis and run your suite after every applied change to confirm it.
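One way to turn "keep public signatures stable" into something you can check is to snapshot the public signatures before a change and diff them afterwards. The sketch below is a minimal illustration, not part of the prompt: the helper names and the two `OrderService` versions are hypothetical, and it only catches signature changes, not behavioral ones, so it complements your test suite rather than replacing it.

```python
import inspect


def public_signatures(obj):
    """Map each public callable attribute of `obj` to its signature string."""
    sigs = {}
    for name, member in inspect.getmembers(obj, callable):
        if name.startswith("_"):
            continue  # private helpers are free to change
        try:
            sigs[name] = str(inspect.signature(member))
        except (TypeError, ValueError):
            continue  # some callables expose no introspectable signature
    return sigs


def breaking_changes(before, after):
    """List public names that were removed or whose signature changed."""
    return sorted(name for name, sig in before.items() if after.get(name) != sig)


# Hypothetical "before" and "after" versions of the same class.
class OrderServiceV1:
    def place(self, cart): ...
    def cancel(self, order_id): ...


class OrderServiceV2:
    def place(self, cart): ...
    def cancel(self, order_id, reason): ...  # new required parameter breaks callers


changed = breaking_changes(
    public_signatures(OrderServiceV1), public_signatures(OrderServiceV2)
)
print(changed)
```

An empty list means no public signature moved; anything listed deserves a closer look before you apply that area.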

What should I put in [principles] and [focus_area]?

Use [principles] for the design rules to refactor toward (e.g. SOLID, DRY, dependency injection, immutability) and [focus_area] for the outcome to prioritize (testability, performance, readability, reduced coupling). [focus_area] decides the ranking order, so make it specific and measurable — "reduce coupling" sorts results very differently from a vague "better code."
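If you fill the placeholders from a script rather than by hand, a small helper can refuse to send a prompt with a blank left in it. This is a sketch under assumptions: only the placeholder names come from this page, while the template wording, the `fill` helper, and the sample values are invented for illustration.

```python
import re

# Hypothetical template: the placeholder names match this page,
# the surrounding sentence is made up for the example.
TEMPLATE = (
    "Audit the [language] code in [directory]. List the top [count] areas "
    "of technical debt, refactor toward [principles], and rank by [focus_area]."
)

PLACEHOLDER = re.compile(r"\[(\w+)\]")


def fill(template, values):
    """Replace each [name] placeholder, refusing to leave any unfilled."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"unfilled placeholders: {missing}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


prompt = fill(TEMPLATE, {
    "language": "Python",
    "directory": "src/orders",
    "count": 5,
    "principles": "SRP and dependency inversion",
    "focus_area": "testability",
})
print(prompt)
```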

Claude/ChatGPT (or Cursor) Prompt to Refactor a Full Codebase | AI Prompt Library

What this prompt does

This prompt turns Claude Code into a refactoring analyst that audits one [language] codebase inside a specific [directory], then surfaces the top [count] areas of technical debt instead of dumping a vague "your code could be better." Scoping to a directory and capping the count is deliberate: it keeps the model from boiling the ocean and forces it to triage, so you get the highest-leverage problems first rather than a hundred nitpicks.

The structure is where the real work happens. For every area, the prompt demands four things — the current problem, a refactored solution that follows your chosen [principles] (SOLID, DRY, dependency injection, whatever you name), concrete before/after code, and a low/medium/high risk estimate. That four-part shape is what makes the output reviewable: you can read the "before," confirm it matches your reality, and judge the "after" against named principles instead of taste. The closing constraints — preserve all existing tests and keep backward compatibility — anchor the model to changes you can actually merge, not a rewrite that breaks every caller.
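To make the four-part shape concrete, here is one way you might model an entry if you wanted to collect the output in a script. It is a sketch, not the prompt's actual schema: the `RefactorArea` and `Risk` names, the field names, and the sample entries are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class RefactorArea:
    """One report entry; field names are illustrative, not the prompt's."""
    title: str
    problem: str
    refactor: str
    before: str
    after: str
    risk: Risk

    def is_reviewable(self):
        """True only when every text part is filled in."""
        parts = (self.problem, self.refactor, self.before, self.after)
        return all(part.strip() for part in parts)


# Made-up entries for demonstration only.
areas = [
    RefactorArea("OrderService god-class",
                 "Pricing, tax, persistence and email in one class",
                 "Extract PricingCalculator, TaxResolver, OrderRepository",
                 "db.save(...)", "self.repo.save(order)", Risk.MEDIUM),
    RefactorArea("Duplicated validators",
                 "Same checks copied into three modules",
                 "Move the checks into one shared function",
                 "if not email: ...", "validate(email)", Risk.LOW),
    RefactorArea("Global config", "Modules read a mutable global",
                 "Inject settings", "", "", Risk.HIGH),  # no code, not reviewable
]

reviewable = [a for a in areas if a.is_reviewable()]
for area in sorted(reviewable, key=lambda a: a.risk):
    print(area.risk.name, area.title)
```

Sorting lowest risk first is just one possible sequencing; the point is that an entry missing its before/after code has nothing you can verify against your own files.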

When to use it

You inherited a legacy module and need a prioritized debt list before quoting a refactor.
A directory has grown into a god-class or tangled service and you want SOLID-based splits proposed with code.
You're modernizing old patterns (callbacks to async/await, manual DI, deprecated APIs) and want safe, incremental diffs.
You're planning a refactor and need risk levels to sequence work and decide what ships this sprint vs. later.
You're onboarding to an unfamiliar codebase and want the model to point at the worst-smelling areas first.
You're justifying tech-debt work to a lead who wants before/after evidence, not adjectives.

Example output

AREA 2 of 5 — OrderService god-class (src/orders/OrderService.py)
Problem: 600-line class handling pricing, tax, persistence, and email.
Violates SRP; untestable without a live DB.
Refactor (SOLID): extract PricingCalculator, TaxResolver, OrderRepository.

# Before
class OrderService:
    def place(self, cart):
        total = sum(i.price for i in cart) * 1.2  # tax inline
        db.save(...)  # persistence inline

# After
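class OrderService:
    def __init__(self, pricing, repo):
        self.pricing, self.repo = pricing, repo  # injected collaborators
    def place(self, cart):
        total = self.pricing.total(cart)  # pricing and tax delegated
        self.repo.save(...)  # persistence delegated

Risk: MEDIUM. Public place() signature unchanged, but the constructor now takes injected dependencies, so code and test fixtures that build OrderService need updating.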
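For a sense of what the "after" shape buys you, here is a fuller runnable sketch of the same idea. Everything beyond `OrderService`, `PricingCalculator`, and `place()` is an assumption made for the demo: the `Item` type, the dictionary order format, the 20% tax rate (taken from the inline `* 1.2`), and the in-memory repository standing in for a real database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical cart item; the original only shows a `.price` attribute."""
    name: str
    price: float


class PricingCalculator:
    """Owns pricing and tax, which used to be inline in place()."""

    def __init__(self, tax_rate=0.2):
        self.tax_rate = tax_rate

    def total(self, cart):
        subtotal = sum(item.price for item in cart)
        return round(subtotal * (1 + self.tax_rate), 2)


class InMemoryOrderRepository:
    """Test double: records orders instead of touching a live DB."""

    def __init__(self):
        self.saved = []

    def save(self, order):
        self.saved.append(order)


class OrderService:
    def __init__(self, pricing, repo):
        self.pricing = pricing
        self.repo = repo

    def place(self, cart):
        order = {"items": list(cart), "total": self.pricing.total(cart)}
        self.repo.save(order)
        return order


repo = InMemoryOrderRepository()
service = OrderService(PricingCalculator(), repo)
order = service.place([Item("book", 10.0), Item("pen", 2.5)])
print(order["total"], len(repo.saved))
```

Because the repository is injected, the last four lines exercise `place()` without a database, which is the testability gain the entry claims.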

Pro tips

Set [principles] precisely. "SOLID" is broad — narrow it to "SRP and dependency inversion" when a class does too much, so the model proposes the right surgery instead of scattering interfaces everywhere.
Match [count] to your review appetite. Start at 3–5 for a first pass; asking for 20 produces shallow entries you won't action. Re-run on the next directory rather than inflating the number.
Make [focus_area] measurable — "testability" or "reduce coupling" steers it far better than "code quality." It directly drives the prioritization order.
The "preserve all tests" clause assumes you have tests. If coverage is thin, run a test-generation prompt first; otherwise the backward-compatibility claim goes unverified.
Treat every "after" as a proposal, not a patch. Apply one area, run the suite, commit, then move to the next — the risk levels exist so you can sequence, not skip review.
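The apply, test, commit rhythm from the last tip can be sketched as a small loop. This is an illustration under assumptions, not tooling that ships with the prompt: `apply_in_sequence` and its callbacks are hypothetical names, and `run_suite` assumes your project runs its tests with `pytest`, so swap in your own command.

```python
import subprocess


def run_suite(command=("pytest", "-q")):
    """Return True when the test command exits with status 0 (assumes pytest)."""
    return subprocess.run(command).returncode == 0


def apply_in_sequence(areas, apply, tests_pass=run_suite, commit=None):
    """Apply areas one at a time; stop at the first one that breaks the suite."""
    applied = []
    for area in areas:
        apply(area)
        if not tests_pass():
            return applied, area  # caller reverts or fixes this area
        if commit is not None:
            commit(area)
        applied.append(area)
    return applied, None


# Demo with stand-ins so nothing real is run: the second area "fails" its tests.
log = []
outcomes = iter([True, False, True])
applied, failed = apply_in_sequence(
    ["area 1", "area 2", "area 3"],
    apply=log.append,
    tests_pass=lambda: next(outcomes),
    commit=lambda area: log.append("commit " + area),
)
print(applied, failed)
```

Stopping at the first failure keeps each commit tied to exactly one vetted area, so a broken change is easy to locate and undo.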


Engr Mejba Ahmed

More in Claude Code Prompts

Claude Code Prompt to Ship a Production PR from Ticket to Merge

Claude Code Prompt for a Safe Large-Scale Rename Across a Repo

Claude Code Prompt to Debug a Flaky Test with a Reliable Repro

Ready to Transform

Your Ideas?

Engr Mejba Ahmed

Hey there!