Babysitting PRs With Claude Code

March 26, 2026

I built a /babysit-pr command for Claude Code that watches a PR from push to merge. It runs CI checks, fetches code review, triages the feedback, applies fixes, pushes, and loops until the PR is clean. Here's how it works and why I set it up this way.

The Problem

My PR workflow had too many manual checkpoints. Push code, wait for CI, wait for Greptile review, read the comments, fix things, push again, wait again. Each step is five minutes of waiting followed by two minutes of work. I'd context-switch to something else and forget to come back.

What /babysit-pr Does

The command is a Claude Code skill — a markdown file that gives Claude a structured playbook to follow. The full thing is about 170 lines. Here's the gist:

  1. Identify the PR via gh pr view

  2. Poll CI and Greptile in parallel — CI via gh pr checks --watch, Greptile via its MCP tool. Whichever finishes first gets processed while the other runs

  3. Run a local review with codex review --base main for a second opinion

  4. Triage everything into Fix / Dismiss / Escalate. Not all review comments are valid — the skill explicitly says to assess quality and dismiss bad suggestions with reasoning

  5. Apply fixes, validate with the linter, commit, push

  6. Loop — trigger a Greptile re-review, wait for new CI, triage new feedback. Exit when CI is green and no unaddressed issues remain. Max 3 iterations
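
The control flow of those six steps can be sketched in a few lines of Python. To be clear, this is a minimal sketch and not the skill itself — the skill is prose that Claude follows — and `run_ci`, `fetch_reviews`, `triage`, and `apply_fix` are hypothetical stand-ins for the `gh`, Greptile, and `codex` calls:

```python
# Sketch of the babysit loop. The four callables are hypothetical
# stand-ins for `gh pr checks --watch`, the Greptile MCP tool,
# `codex review --base main`, and the fix/lint/commit/push step;
# injecting them keeps the control flow testable.

MAX_ITERATIONS = 3

def babysit(run_ci, fetch_reviews, triage, apply_fix):
    """Loop until CI is green and no unaddressed findings remain."""
    for iteration in range(1, MAX_ITERATIONS + 1):
        ci_green = run_ci()                       # gh pr checks --watch
        findings = fetch_reviews()                # Greptile + codex review
        fixes = [f for f in findings if triage(f) == "fix"]
        if ci_green and not fixes:
            return ("clean", iteration)           # ready to merge
        for finding in fixes:
            apply_fix(finding)                    # fix, lint, commit, push
    return ("max-iterations", MAX_ITERATIONS)     # give up, escalate to user
```

The exit condition mirrors the skill's: green CI *and* an empty fix list, with a hard cap of three passes so the agent never loops forever on feedback it can't resolve.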

The key design choice: the agent assesses feedback rather than blindly fixing everything. Greptile sometimes flags false positives. Codex sometimes suggests unnecessary changes. The triage step forces the agent to make a judgment call for each finding and document why it dismissed something.

A Real Example

Today I pushed a fix for a tour publication sync bug. The /babysit-pr flow ran like this:

Pass 1:

  • CI: all green (build, lint, typecheck, tests)

  • Greptile: 5/5 confidence, one observation about webhook delivery being best-effort. Dismissed — operational concern, not a code bug

  • Codex: two findings, both valid:

    • Admin UI copy said unpublished tours "will still be accessible" — but my change made them 404. Copy needed updating

    • The webhook synced isPublished=false on DatoCMS unpublish, but didn't restore isPublished=true on republish. A real gap — an unpublish→republish cycle would leave the tour stuck

Both Codex findings got fixed, pushed, re-review triggered.

Pass 2:

  • CI: green

  • Greptile: still 5/5, no new issues

  • Auto-merge queued

Two iterations, two real bugs caught by the local review that Greptile missed. Total wall-clock time from push to merge-ready was about 10 minutes, most of it CI.

The Skill File

The interesting part is how little code this requires. It's a markdown file with a structured process. Claude follows it step by step. The triage table format forces structured output:

| Verdict | Criteria | Action |
|---------|----------|--------|
| Fix     | Valid issue, clear fix, within scope | Apply fix |
| Dismiss | Invalid or based on misunderstanding | Skip with reason |
| Escalate| Needs human judgment | Flag for user |

The rules are opinionated: Codex findings default to Fix. Greptile comments get assessed individually. CI failures always get fixed. Low-severity items get dismissed unless trivially fixable.

Running Multiple Review Sources

I run three review sources — CI, Greptile, and Codex — because they catch different things:

  • CI catches objective failures: type errors, lint violations, test regressions

  • Greptile understands the codebase context and catches architectural issues, but can be noisy

  • Codex does a thorough local review and is good at spotting logical gaps (like the missing republish sync)

No single source catches everything. The overlap is low enough that running all three is worth it. Today, Greptile approved the PR while Codex found two legitimate bugs. Yesterday it might be the reverse.
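
The low-overlap claim is easy to express as a set operation: for each source, which findings did no other source report? A sketch with illustrative finding labels (loosely based on today's run, not actual tool output):

```python
# Sketch of the overlap argument: a source earns its keep when it
# reports findings no other source caught. Labels are illustrative.

def unique_findings(sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each review source, the findings no other source reported."""
    return {
        name: findings.difference(*(f for n, f in sources.items() if n != name))
        for name, findings in sources.items()
    }
```

If a source's unique set comes back empty run after run, it's pure overhead and can be dropped; in practice each of the three keeps contributing something the others miss.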
