What this prompt does
This prompt frames the AI as a senior DevOps engineer and forces it to debug a failing pipeline in a fixed order rather than handing you a checklist of maybes. You paste the failing log into [ci_log], name your [ci_provider] and [stack], and the model works through five steps: pinpoint the exact failing step and the first error that matters, categorise the failure, propose a minimal fix, add a prevention guard, and suggest one hardening change. The output is the root cause, the fix as a diff or commands, and a regression test.
The structure works because step two does the heavy lifting: categorising the failure as infrastructure, code, or flaky. That split stops you fixing the wrong layer, which is where teams routinely lose an hour patching a test when the runner was actually starved of cache. The variables keep the analysis grounded: [ci_provider] shapes whether the fix talks about runners, caches, or workflow syntax specific to that system, and [stack] tells the model whether a failure is a build, a test, or a container issue. Feeding the real [ci_log] is what lets it ignore downstream noise and find the first error that actually broke the run, since CI output often buries the true cause under a cascade of secondary failures.
When to use it
- A red pipeline is blocking the whole team and you need triage in minutes.
- You can't tell whether the failure is your code or the runner.
- A test passes locally but fails only in CI.
- You suspect a flaky test but want it confirmed before adding retries.
- You want a regression guard so the same class of failure is caught next time.
- You inherited a pipeline and need a fast read on what just broke.
Example output
You get a short structured report: the pinpointed failing step, a one-word category (infra, code, or flaky) with reasoning, the minimal fix written as a diff or shell commands, a regression test to add, and one hardening suggestion such as a cache key fix or scoped retry. It reads as a focused incident note you can act on immediately, walking from the first real error to a concrete change, not a generic CI tutorial listing things you might check.
Pro tips
- Paste the full
[ci_log], including the lines before the first error; truncated logs hide the real cause. - Name
[ci_provider]exactly so the fix uses that system's real syntax for caches and runners. - Put the deploy target in
[stack]so infra-versus-code categorisation is accurate. - Trust the step-two split before touching anything; fixing the wrong layer wastes an hour.
- Only apply the retry suggestion to steps the model labels genuinely flaky, never to real failures.
- If the log is huge, paste the failing job's section rather than the entire workflow output.