# Babysitting PRs With Claude Code
March 26, 2026
I built a `/babysit-pr` command for Claude Code that watches a PR from push to merge. It polls CI, fetches code review feedback, triages it, applies fixes, pushes, and loops until the PR is clean. Here's how it works and why I set it up this way.
## The Problem
My PR workflow had too many manual checkpoints. Push code, wait for CI, wait for Greptile review, read the comments, fix things, push again, wait again. Each step is five minutes of waiting followed by two minutes of work. I'd context-switch to something else and forget to come back.
## What `/babysit-pr` Does
The command is a Claude Code skill — a markdown file that gives Claude a structured playbook to follow. The full thing is about 170 lines. Here's the gist:
1. Identify the PR via `gh pr view`
2. Poll CI and Greptile in parallel — CI via `gh pr checks --watch`, Greptile via its MCP tool. Whichever finishes first gets processed while the other runs
3. Run a local review with `codex review --base main` for a second opinion
4. Triage everything into Fix / Dismiss / Escalate. Not all review comments are valid — the skill explicitly says to assess quality and dismiss bad suggestions with reasoning
5. Apply fixes, validate with the linter, commit, push
6. Loop — trigger a Greptile re-review, wait for new CI, triage new feedback. Exit when CI is green and no unaddressed issues remain. Max 3 iterations
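Boiled down, the control flow the steps above describe looks something like this sketch. It's illustrative, not the actual skill: `babysit_loop` is a hypothetical name, and the Greptile MCP polling, triage, and fixes are agent work that doesn't reduce to shell. Only the `gh` and `codex` invocations come from the skill itself.

```shell
#!/usr/bin/env bash
# Sketch of the babysit loop's control flow. The `gh` and `codex`
# invocations mirror the skill; everything else is illustrative.

babysit_loop() {
  local max_iterations="${1:-3}"  # the skill caps at 3 passes
  local i
  for ((i = 1; i <= max_iterations; i++)); do
    gh pr checks --watch          # block until all CI checks finish
    codex review --base main      # local second opinion from Codex
    # (Greptile polling via MCP, triage, fixes, lint, commit, push
    # all happen here; that's agent work, not shell.)
    if gh pr checks; then         # exits 0 when every check passes
      echo "PR clean after $i iteration(s)"
      return 0
    fi
  done
  echo "hit the iteration cap ($max_iterations)"
  return 1
}
```

The iteration cap matters: without it, a flaky check or a reviewer that keeps generating new nits could keep the loop spinning indefinitely.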
The key design choice: the agent assesses feedback rather than blindly fixing everything. Greptile sometimes flags false positives. Codex sometimes suggests unnecessary changes. The triage step forces the agent to make a judgment call for each finding and document why it dismissed something.
## A Real Example
Today I pushed a fix for a tour publication sync bug. The `/babysit-pr` flow ran like this:
Pass 1:
- CI: all green (build, lint, typecheck, tests)
- Greptile: 5/5 confidence, one observation about webhook delivery being best-effort. Dismissed — operational concern, not a code bug
- Codex: two findings, both valid:
  - Admin UI copy said unpublished tours "will still be accessible" — but my change made them 404. Copy needed updating
  - The webhook synced `isPublished=false` on DatoCMS unpublish, but didn't restore `isPublished=true` on republish. A real gap — an unpublish→republish cycle would leave the tour stuck
Both Codex findings got fixed, pushed, re-review triggered.
Pass 2:
- CI: green
- Greptile: still 5/5, no new issues
- Auto-merge queued
Two iterations, two real bugs caught by the local review that Greptile missed. Total wall-clock time from push to merge-ready was about 10 minutes, most of it CI.
## The Skill File
The interesting part is how little code this requires. It's a markdown file with a structured process. Claude follows it step by step. The triage table format forces structured output:
| Verdict | Criteria | Action |
|---------|----------|--------|
| Fix | Valid issue, clear fix, within scope | Apply fix |
| Dismiss | Invalid or based on misunderstanding | Skip with reason |
| Escalate | Needs human judgment | Flag for user |

The rules are opinionated: Codex findings default to Fix. Greptile comments get assessed individually. CI failures always get fixed. Low-severity items get dismissed unless trivially fixable.
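For context, a Claude Code skill is just a markdown file with YAML frontmatter. A stripped-down sketch of what the skill file's skeleton might look like (the headings and wording here are illustrative, not the actual 170 lines):

```markdown
---
name: babysit-pr
description: Watch a PR from push to merge. Poll CI and reviews, triage, fix, loop.
---

## Process

1. Identify the PR with `gh pr view`.
2. Poll CI (`gh pr checks --watch`) and Greptile (via MCP) in parallel.
3. Run `codex review --base main` locally.
4. Triage every finding as Fix / Dismiss / Escalate (see table below).
5. Apply fixes, run the linter, commit, push.
6. Trigger a re-review and repeat. Stop when green, or after 3 iterations.
```

The playbook is prose, but numbered steps and an explicit stop condition keep the agent from improvising the process.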
## Running Multiple Review Sources
I run three review sources — CI, Greptile, and Codex — because they catch different things:
- CI catches objective failures: type errors, lint violations, test regressions
- Greptile understands the codebase context and catches architectural issues, but can be noisy
- Codex does a thorough local review and is good at spotting logical gaps (like the missing republish sync)
No single source catches everything, and the overlap is low enough that running all three is worth it. Today, Greptile approved the PR while Codex found two legitimate bugs; on another PR it might be the reverse.