How I Automated Codex Code Reviews for Both Claude Code and Codex

May 12, 2026 · 31 min read

Why I built this

I use both Claude Code and Codex every day. Most of my coding is done by these agents now. The hard part is not making them write code. The hard part is trusting what they ship.

So I wanted a setup where the agent does not just write the code. After it finishes, it also calls a separate reviewer to check the work, and it keeps fixing issues until the reviewer is happy. And I wanted this to happen on its own, without me sitting in front of the computer.

The Codex reviewer is very strong. It is, at this point, better than me at finding bugs in code. It catches edge cases, missing error handling, wrong state transitions, and small mistakes that I would never see in a normal pull request review. So I gave the agents a way to call this reviewer themselves and made them run it again and again until the review comes back clean.

Now I can give the agent a task, walk away, and come back to code that the agent has already pushed through the Codex reviewers as many times as it needed. Sometimes that is 3 reviews. Sometimes it is 20 or more. The number does not matter — the agent keeps looping until the reviews come back clean. This is a big change. It lets the agent work for hours as long as I give it a clear spec and a clear architecture.

This post is about that setup. There are two parts:

  1. The Codex side — when I am working in Codex (the CLI or the Codex app), Codex itself runs the reviewer loop before every commit.
  2. The Claude Code side — when I am inside Claude Code, Claude calls the same Codex reviewer through a plugin.

Both sides do the same thing: run three Codex reviews in parallel, look at the results, fix what is real, then run three new reviews. They stop only when the reviews are clean.

The three pieces you need

The whole thing has three small pieces:

  1. A global instruction file that tells the agent the review is mandatory and how to run it.
  2. A wrapper script that launches three parallel codex review runs and saves their outputs.
  3. A small helper script that extracts only the final review answer from each raw output file.

For Claude Code you also need a fourth thing:

  4. The Codex plugin for Claude Code, plus a Skill that tells Claude how to run three reviews in parallel and read the results.

Let me go through each part.

Part 1 — The Codex loop

This is the simpler side. Every time I start a Codex session (CLI or the Codex app), it reads my global ~/.codex/AGENTS.md file. In that file I have a section called "Codex Review (BEFORE committing) — MANDATORY". It tells Codex what to do before any commit.

Step 1 of 3 — Add the review section to ~/.codex/AGENTS.md

What to do: open the file ~/.codex/AGENTS.md (create it if it does not exist) and append the section below. Codex reads this file at the start of every session in any repo, so you only have to set it up once for your whole machine.

Exact content to paste:

## Codex Review (BEFORE committing) — MANDATORY

After completing each task (all tests passing, implementation done),
but **before any commit**:

1. Launch 3 independent native Codex reviews of the current uncommitted changes:

       codex review --uncommitted

   Do not pass a prompt or extra instructions. The native command already
   knows how to review the uncommitted working tree.

   Prefer the local wrapper when running the required three-review round:

       ~/bin/codex-review-run --scope uncommitted --triple

   The wrapper only automates three independent `codex review --uncommitted`
   runs and stores the outputs under `~/.codex/review-runs/<timestamp>/`.

2. Collect all 3 final reviewer outputs from `~/.codex/review-runs/<timestamp>/`.
   Use the helper script to print only the final Codex output block:

       python3 scripts/reviews/extract-final-codex-review.py \
         ~/.codex/review-runs/<timestamp>/review-*.raw.txt

3. Validate each review is real:
   - A clean review must explicitly say there are no actionable in-scope findings.
   - A review with findings usually includes `Review comment:` or
     `Full review comments:` plus prioritized findings.
   - If the output is just a status stub or malformed, re-run it.

4. Every review round is a FULL FRESH review — never scope Codex to
   "just check my fixes." Each round reviews ALL uncommitted changes from scratch.

5. Merge and deduplicate findings from all 3 reviews into a single list.

6. Evaluate findings critically — Codex may lack full context, so assess each issue.
   - If two of the three reviews are clean and the remaining review only finds
     a low-severity issue (P3), treat the round as clean after documenting that.

7. Fix real issues identified by Codex.

8. Run three fresh `codex review --uncommitted` reviews again after fixing.
   A skip in the task change set is NOT a pass.

9. Repeat until the task change set has no actionable in-scope issues.

10. Only a clean Codex review for the task change set allows committing.

This file is the most important part. Once it is there, Codex follows it on every session. You do not have to tell it to review — it already knows.
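Step 3 of that section (validate each review is real) is the part agents most often get wrong, so here is the check as a small predicate. This is a sketch, not part of the actual setup; the marker strings are assumptions based on outputs I have seen, so adjust them for whatever your Codex version actually prints:

```python
def classify_review(text: str) -> str:
    """Rough classification of one extracted review output.

    The marker strings below are assumptions; tune them to the exact
    phrasings your Codex version emits.
    """
    lowered = text.lower()
    # A clean review must SAY it is clean; an empty file is not a pass.
    if "no actionable" in lowered or "no in-scope findings" in lowered:
        return "clean"
    # A review with findings carries one of the known headers.
    if "review comment" in lowered or "full review comments" in lowered:
        return "findings"
    # Anything else (status stub, truncated output): run the review again.
    return "rerun"

print(classify_review("codex\nNo actionable in-scope findings."))  # clean
print(classify_review("Full review comments:\n- [P2] ..."))        # findings
print(classify_review("[exited 0]"))                               # rerun
```

The point of the third branch is rule 3's "re-run it": a malformed output is never treated as either a pass or a failure.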

Step 2 of 3 — Add the wrapper script codex-review-run

What to do:

  1. Create the folder ~/bin if it does not exist: mkdir -p ~/bin
  2. Save the script below to ~/bin/codex-review-run (exactly that filename, no extension).
  3. Make it executable: chmod +x ~/bin/codex-review-run
  4. Make sure ~/bin is on your PATH. If it is not, add this line to your ~/.zshrc or ~/.bashrc: export PATH="$HOME/bin:$PATH"

This wrapper runs three codex review calls in parallel and saves the outputs under ~/.codex/review-runs/<timestamp>/. It supports three review scopes (uncommitted, base branch, commit), an optional prompt, an exit code per run, and an optional macOS notification when everything finishes.

Full script:

#!/usr/bin/env bash

set -euo pipefail

usage() {
  cat <<'EOF'
Usage:
  codex-review-run --scope uncommitted [--runs 1|3] [--prompt "text"]
  codex-review-run --scope base --base main [--runs 1|3] [--prompt "text"]
  codex-review-run --scope commit --commit <sha> [--runs 1|3] [--prompt "text"]

This is a plain wrapper around `codex review`. It is not a Codex skill.
It saves review artifacts under ~/.codex/review-runs/<timestamp>/.

Options:
  --triple   Shortcut for --runs 3
  --label    Free-form label recorded in the run's meta.env
  --notify   Send a macOS notification when all review runs finish
EOF
}

scope=""
runs="3"
base_ref=""
commit_sha=""
prompt=""
label=""
notify="0"

while [[ $# -gt 0 ]]; do
  case "$1" in
    --scope)
      scope="${2:-}"
      shift 2
      ;;
    --runs)
      runs="${2:-}"
      shift 2
      ;;
    --base)
      base_ref="${2:-}"
      shift 2
      ;;
    --commit)
      commit_sha="${2:-}"
      shift 2
      ;;
    --prompt)
      prompt="${2:-}"
      shift 2
      ;;
    --label)
      label="${2:-}"
      shift 2
      ;;
    --triple)
      runs="3"
      shift
      ;;
    --notify)
      notify="1"
      shift
      ;;
    --help|-h)
      usage
      exit 0
      ;;
    *)
      echo "Unknown argument: $1" >&2
      usage >&2
      exit 1
      ;;
  esac
done

if [[ -z "$scope" ]]; then
  echo "--scope is required" >&2
  usage >&2
  exit 1
fi

if [[ "$runs" != "1" && "$runs" != "3" ]]; then
  echo "--runs must be 1 or 3" >&2
  exit 1
fi

case "$scope" in
  uncommitted)
    ;;
  base)
    [[ -n "$base_ref" ]] || { echo "--base is required for --scope base" >&2; exit 1; }
    ;;
  commit)
    [[ -n "$commit_sha" ]] || { echo "--commit is required for --scope commit" >&2; exit 1; }
    ;;
  *)
    echo "--scope must be one of: uncommitted, base, commit" >&2
    exit 1
    ;;
esac

command -v codex >/dev/null 2>&1 || { echo "codex CLI not found on PATH" >&2; exit 1; }

timestamp="$(date +"%Y%m%d-%H%M%S")"
run_dir="${HOME}/.codex/review-runs/${timestamp}"
mkdir -p "$run_dir"

cat >"${run_dir}/meta.env" <<EOF
timestamp=${timestamp}
cwd=$(pwd)
scope=${scope}
runs=${runs}
base_ref=${base_ref}
commit_sha=${commit_sha}
label=${label}
notify=${notify}
EOF

run_one() {
  local idx="$1"
  local raw_file="${run_dir}/review-${idx}.raw.txt"
  local exit_file="${run_dir}/review-${idx}.exitcode"
  local -a cmd
  local rc=0

  cmd=(-c features.plugins=false review)
  case "$scope" in
    uncommitted) cmd+=(--uncommitted) ;;
    base) cmd+=(--base "$base_ref") ;;
    commit) cmd+=(--commit "$commit_sha") ;;
  esac

  {
    echo "# review ${idx}"
    echo "# cwd: $(pwd)"
    echo "# scope: ${scope}"
    [[ -n "$base_ref" ]] && echo "# base_ref: ${base_ref}"
    [[ -n "$commit_sha" ]] && echo "# commit_sha: ${commit_sha}"
    [[ -n "$label" ]] && echo "# label: ${label}"
    [[ -n "$prompt" ]] && echo "# prompt: ${prompt}"
    echo
  } >"$raw_file"

  # Capture the exit code with `|| rc=$?` — under `set -e`, a bare
  # failing command would kill this background subshell before the
  # exit code could be written out.
  if [[ -n "$prompt" ]]; then
    codex "${cmd[@]}" "$prompt" >>"$raw_file" 2>&1 || rc=$?
  else
    codex "${cmd[@]}" >>"$raw_file" 2>&1 || rc=$?
  fi

  echo "$rc" >"$exit_file"
  return "$rc"
}

declare -a pids=()
for idx in $(seq 1 "$runs"); do
  run_one "$idx" &
  pids+=("$!")
done

failed=0
for pid in "${pids[@]}"; do
  if ! wait "$pid"; then
    failed=1
  fi
done

{
  echo "run_dir=${run_dir}"
  for idx in $(seq 1 "$runs"); do
    echo "review_${idx}_raw=${run_dir}/review-${idx}.raw.txt"
    echo "review_${idx}_exitcode=$(cat "${run_dir}/review-${idx}.exitcode" 2>/dev/null || echo "missing")"
  done
} >"${run_dir}/manifest.txt"

echo "${run_dir}"

if [[ "$notify" == "1" ]] && command -v osascript >/dev/null 2>&1; then
  if [[ "$failed" -eq 0 ]]; then
    osascript -e 'display notification "Codex review runs finished" with title "Codex Review"' >/dev/null 2>&1 || true
  else
    osascript -e 'display notification "Codex review runs finished with failures" with title "Codex Review"' >/dev/null 2>&1 || true
  fi
fi

if [[ "$failed" -ne 0 ]]; then
  exit 1
fi

Step 3 of 3 — Add the extractor script extract-final-codex-review.py

What to do: inside your project repo, create the folder scripts/reviews/ and save the script below as extract-final-codex-review.py inside it. The full path inside your repo will be scripts/reviews/extract-final-codex-review.py. No chmod is needed because Codex calls it with python3. You only need Python 3 installed on your machine.

Why this script is needed: the raw output of codex review is long. It includes the version banner, all the git status and rg and sed commands that the reviewer ran, internal thinking, and finally the review itself at the end. If the agent reads the full file, the context gets polluted with all the tool chatter, and it can confuse the verdict.

This small Python script prints only the final review block. It looks for the last line that is exactly codex and prints everything from there to the end of the file. If there is no codex marker, it falls back to the last 120 lines.

Full script:

#!/usr/bin/env python3

from __future__ import annotations

import argparse
from pathlib import Path

DEFAULT_FALLBACK_LINES = 120


def extract_review_output(path: Path, fallback_lines: int) -> str:
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines:
        raise ValueError("empty review file")

    # Find the LAST line that is exactly "codex": the final review block
    # starts there. Earlier "codex" markers belong to intermediate turns.
    codex_index = None
    for index, line in enumerate(lines):
        if line.strip() == "codex":
            codex_index = index

    if codex_index is not None:
        return "\n".join(lines[codex_index:]) + "\n"

    # No marker found: fall back to the tail of the file.
    return "\n".join(lines[-fallback_lines:]) + "\n"


def main() -> int:
    parser = argparse.ArgumentParser(
        description="Print the final Codex review output from one or more raw review files."
    )
    parser.add_argument("paths", nargs="+", help="One or more review-*.raw.txt files")
    parser.add_argument("--fallback-lines", type=int, default=DEFAULT_FALLBACK_LINES)
    args = parser.parse_args()

    exit_code = 0
    for index, raw_path in enumerate(args.paths):
        path = Path(raw_path)
        if len(args.paths) > 1:
            if index > 0:
                print()
            print(f"=== {path}")
        try:
            print(extract_review_output(path, args.fallback_lines), end="")
        except Exception as error:
            exit_code = 1
            print(f"[extract-final-codex-review] {error}")

    return exit_code


if __name__ == "__main__":
    raise SystemExit(main())

That is it for the Codex side. Three files in total: the AGENTS.md section, the wrapper, and the extractor. After that, every Codex session in any repo will follow the loop on its own.

Part 2 — The Claude Code loop

For Claude Code I use the same Codex reviewer, but I call it through the official Codex plugin for Claude Code. There are two pieces to install on this side: the plugin itself, and a Skill that tells Claude how to run three reviews in parallel and how to read the results.

Step 1 of 2 — Install the Codex plugin for Claude Code

What to do: open ~/.claude/settings.json and merge the two keys below into the file. If the file already has enabledPlugins or extraKnownMarketplaces keys, add these entries inside them instead of overwriting.

{
  "enabledPlugins": {
    "codex@openai-codex": true
  },
  "extraKnownMarketplaces": {
    "openai-codex": {
      "source": {
        "source": "github",
        "repo": "openai/codex-plugin-cc"
      }
    }
  }
}

After you save the file, restart Claude Code. It will fetch the plugin from the openai/codex-plugin-cc GitHub repo and install it. Once installed, you get the plugin's slash commands (such as /codex:review) and the companion script that backs them.

One important point: the slash commands have disable-model-invocation: true. That means Claude itself cannot run them — only you, the human, can type them. To let Claude run a review automatically, we have to call the companion script directly. That is exactly what the Skill below does.

Step 2 of 2 — Add the Skill cc-3parallel-codex-reviews

What to do:

  1. Create the folder ~/.claude/skills/cc-3parallel-codex-reviews/ (the folder name matters — it has to match the name: field at the top of the Skill).
  2. Inside that folder, save the file below as SKILL.md (exactly that filename, all caps).
  3. The full path will be ~/.claude/skills/cc-3parallel-codex-reviews/SKILL.md.

Claude Code reads all skills under ~/.claude/skills/ automatically — no extra configuration. Once the file is saved, the next Claude Code session will see it. When you ask Claude to "run 3 parallel codex reviews" or "check my uncommitted changes with Codex", it will pick up this Skill and follow it.

Full Skill file:

---
name: cc-3parallel-codex-reviews
description: Run 3 parallel Codex code reviews on the uncommitted working tree via the Codex companion runtime, returning clean Markdown findings per reviewer. Use for pre-commit review gates, or when the user asks to "codex-review", "run 3 parallel reviews", or "check my uncommitted changes with Codex". Each reviewer's output is ~50 clean lines of findings (no raw-transcript extraction needed).
allowed-tools: Bash, Read
---

# 3 Parallel Codex Reviews

Launches three independent Codex reviews against the uncommitted working tree by invoking the same companion runtime that backs the `/codex:review` slash command. Each review runs as its own background Bash task, so each produces its own completion notification and its own clean-Markdown output_file.

## Preconditions

Before step 1, verify:

- `node` is on `PATH`.
- Codex companion script exists at `~/.claude/plugins/marketplaces/openai-codex/plugins/codex/scripts/codex-companion.mjs` (it's installed as part of the `openai-codex` plugin). If missing, stop and tell the user to install the Codex plugin.
- Working tree has uncommitted changes → `git status --short` must be non-empty. If empty, there is nothing to review — tell the user and stop.

## Procedure

### Step 1 — Launch 3 reviews in parallel

Three **separate** Bash tool calls, each with `run_in_background: true`, each running:

```bash
node "$HOME/.claude/plugins/marketplaces/openai-codex/plugins/codex/scripts/codex-companion.mjs" review --wait --scope working-tree
```


Why the companion directly (not the slash command): `/codex:review` has `disable-model-invocation: true` and cannot be invoked by agents via the Skill tool. The companion script is the exact thing the slash command wraps.

Why three separate Bash calls (not a loop inside one Bash): each call gives its own completion notification with its own `output_file`. A single-Bash fan-out (like the `--triple` wrapper) yields one combined notification instead.

### Step 2 — Wait for 3 completion notifications

Do not poll. Do not call `BashOutput`. Do not read files early. Just wait. The harness will send three `task-notification` messages — one per background Bash call — as they complete (typically ~60–120 s each).

### Step 3 — Read each output_file

For each of the three notifications, use the `Read` tool on the notification's `output_file` path. Each output_file contains the companion's clean Markdown output, which looks like:

```
[codex] Starting Codex review thread.
[codex] ...
[codex] Turn completed.
# Codex Review

Target: working tree diff

<summary paragraph>

Full review comments:

- [P2] <title> — <absolute file path>:<line-range>
<explanation>

Reasoning:
- ...

```


The `[codex]` progress log is a compact one-line-per-step trace (useful for verifying the review actually ran). The `# Codex Review` section is the structured output. The `Reasoning:` trailer is optional context from the reviewer's thoughts.

### Step 4 — Merge, dedupe, and surface

Surface to the user:

- **Per-review verdict**: zero-issues vs issues-found.
- **Deduplicated unique findings** across all 3 reviews, each with file:line citation and severity (`P0`/`P1`/`P2`/`P3`). Findings flagged by multiple reviewers are higher-confidence.
- **Your triage**: real vs false-positive, with reason. A finding is almost certainly a false positive if its cited file/line is **not** in `git status --short` — that means the reviewer explored an out-of-scope artifact (stale review output, unrelated committed code, etc.).
- **Recommended fixes** per real finding, including the test you'd add.

## What NOT to do

- **Do not** invoke `/codex:review`, `/codex:adversarial-review`, `/codex:status`, or `/codex:result` from the Skill tool — all are `disable-model-invocation: true` and the Skill tool will reject them. Those slash commands are user-typed only.
- **Do not** use `~/bin/codex-review-run --triple` — it fans out inside a single Bash call, so the agent gets one combined notification instead of three per-reviewer notifications.
- **Do not** spawn `Agent(subagent_type: ...)` subagents for the reviewer. Subagent sandboxes typically block both `/codex:review` (via `disable-model-invocation`) and Bash access to the `codex` binary, so the review never actually runs.
- **Do not** pass a custom prompt to the companion. The canonical workflow is no-prompt. Context the reviewer needs (project rules, library docs, design system) should live in persistent files the reviewer finds through its own diff exploration — not in per-run prompts.
- **Do not** act on findings during the review. This skill is read-only — surface findings to the user; let them decide. The triage step is read-only too (classifying findings, not fixing code).

## Why three reviews, not one

A single `codex review` pass has reasoning variance: different runs on the same diff surface different subsets of issues. Three independent runs, taken as a union, approach complete coverage. Empirically ~90% of findings are real; if only one of three reviewers flags an issue, still treat it as real and triage it on its merits rather than by vote count.

## Filesystem contamination

Codex reviewers run in `danger-full-access` and do exploratory `grep`/`find` across accessible paths. Old review artifacts under `~/.codex/review-runs/` (left behind by the human `codex-review-run --triple` wrapper) can contaminate findings — the reviewer tails them and may cite content from a *prior* session as a current issue.

Mitigations:

- If contamination starts polluting findings, move old artifacts *out* of `~/.codex/`: `mv ~/.codex/review-runs ~/codex-review-archive-$(date +%s)`. Renaming to `~/.codex/review-runs.archive/` does **not** work — the reviewer still follows the renamed path.
- Always verify that finding-cited file paths are in `git status --short`. If a finding cites a file that is not in the uncommitted diff, it is almost certainly out-of-scope contamination.

## Scope variants

Swap the `--scope` flag for different review scopes. Everything else in the procedure stays identical:

| Scope flag | Meaning | When to use |
|---|---|---|
| `--scope working-tree` | Uncommitted staged + unstaged + untracked | Per-task pre-commit loop (skill default) |
| `--scope branch` | Branch vs default base ref (`main`/`develop`) | PR-level review before merge |
| `--scope branch --base <ref>` | Branch vs a specific base | Reviewing against a non-default base |

If the user asks for a non-default scope, swap the flag in step 1 and keep everything else identical.

The key trick on this side is the three separate background Bash calls. If Claude Code makes one Bash call that runs all three reviews together, it gets one combined notification at the end. With three separate calls, it gets three separate notifications and three separate output files, which is much easier to read and merge.

The other key difference from the Codex side is that the companion script already emits clean Markdown. There is no need for the Python extractor here — the output is structured and short.
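To make the Skill's merge step concrete, here is a rough sketch of pulling the verdict and findings out of one output_file, and of deduplicating across the three reviews. It assumes finding lines follow the `- [P2] <title> — <path>:<lines>` shape shown in the Skill; the real companion format may differ between plugin versions:

```python
import re

# Matches finding lines such as:
#   - [P2] Missing null check — /abs/path/file.ts:10-14
FINDING_RE = re.compile(r"^- \[(P[0-3])\] (.+?) — (\S+):(\S+)$")

def parse_review(markdown: str):
    """Return ("zero-issues" | "issues-found", list of finding tuples)."""
    findings = []
    for line in markdown.splitlines():
        match = FINDING_RE.match(line.strip())
        if match:
            findings.append(match.groups())  # (severity, title, path, lines)
    verdict = "issues-found" if findings else "zero-issues"
    return verdict, findings

def merge_findings(reviews):
    """Merge findings across reviews, counting how many reviewers agree.

    Keyed on (path, severity): a deliberately loose heuristic, since the
    three reviewers tend to word the same finding differently.
    """
    merged = {}
    for _, findings in reviews:
        for severity, title, path, lines in findings:
            entry = merged.setdefault(
                (path, severity), {"title": title, "lines": lines, "votes": 0}
            )
            entry["votes"] += 1
    return merged
```

Findings with a higher vote count are the higher-confidence ones, matching the Skill's step 4.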

How the loop actually runs

The flow is the same on both sides. Once everything is installed:

  1. I give the agent a task. The agent writes code, writes tests, runs tests, and confirms they pass.
  2. Before committing, the agent runs three codex review calls in parallel. On the Codex side this is the wrapper script. On the Claude Code side it is the Skill.
  3. The agent reads the three review outputs. It merges findings, removes duplicates, and decides which findings are real.
  4. The agent fixes the real findings.
  5. The agent runs three fresh reviews again. Always on the full uncommitted diff. Never just on the fixes.
  6. The agent keeps repeating until the reviews come back clean.
  7. Only then is the agent allowed to commit.

There is no fixed number of rounds. Sometimes the agent gets a clean review on the first try. Sometimes it runs the loop 10 or 20 times before it converges. The agent keeps going on its own. I just wait.
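The loop above in shorthand. `run_three_reviews` and `fix` here are placeholders for the wrapper (Codex side) or the Skill (Claude Code side), not real APIs:

```python
def review_loop(run_three_reviews, fix, max_rounds=30):
    """Repeat full review rounds until all three come back clean.

    Returns the number of rounds it took. `max_rounds` is only a safety
    net; in practice the agent keeps going until the round converges.
    """
    for round_number in range(1, max_rounds + 1):
        # Every round reviews the FULL uncommitted diff, never just the fixes.
        verdicts = run_three_reviews()
        if all(v == "clean" for v in verdicts):
            return round_number  # only now is a commit allowed
        fix()  # address the real findings, then run a fresh round
    raise RuntimeError("review loop did not converge")

# Simulated run: this fake "reviewer" comes back clean on the third round.
state = {"round": 0}
def fake_reviews():
    state["round"] += 1
    return ["clean"] * 3 if state["round"] >= 3 else ["findings", "clean", "clean"]

print(review_loop(fake_reviews, fix=lambda: None))  # 3
```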

A few rules I had to add

I learned these the hard way:

Always review the full diff, not just the fix. Codex reviewers find slightly different issues on each run. If the agent reviews only the latest change, it misses problems in code it already wrote earlier in the task. Every round must review the entire uncommitted diff from scratch.

No self-approval. If the agent decides a finding is "fine" without fixing it, the next round will likely flag the same issue again, or something close to it. The rule is simple: fix the issue, or write a clear note explaining why it is out of scope.

A "two clean, one P3" exit clause. Otherwise the loop can go forever on small, low-priority issues like a one-line nit. If two of the three reviews come back clean and the third only finds a P3 issue or something that belongs to a later task, the round is clean.

Move old review outputs out of ~/.codex/. The Codex reviewer has wide filesystem access and explores around the repo. If there are old review files under ~/.codex/review-runs/, the reviewer can read them and report findings from a previous session as if they were current. When this happens, I move the old folder out completely: mv ~/.codex/review-runs ~/codex-review-archive-$(date +%s). Renaming inside ~/.codex/ does not help — the reviewer still finds them.
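The cross-check that catches this contamination can be mechanized. A sketch, assuming `git status --short` porcelain-v1 output (two status characters, a space, then the path):

```python
def parse_status_paths(status_output: str, repo_root: str) -> set:
    """Turn `git status --short` output into a set of absolute paths."""
    paths = set()
    for line in status_output.splitlines():
        if not line.strip():
            continue
        # Each line is "XY path"; renames show as "XY old -> new".
        path = line[3:].split(" -> ")[-1].strip()
        paths.add(f"{repo_root.rstrip('/')}/{path}")
    return paths

def is_in_scope(finding_path: str, changed: set) -> bool:
    """A finding that cites a file outside the uncommitted diff is
    almost certainly contamination, not a real in-scope issue."""
    return finding_path in changed

changed = parse_status_paths(" M src/app.py\n?? notes.md", "/repo")
print(is_in_scope("/repo/src/app.py", changed))             # True
print(is_in_scope("/repo/old-review-output.txt", changed))  # False
```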

Do not pass a custom review prompt. I tried this in the beginning. It always made the review worse. The reviewer is more useful when it picks its own focus from the diff. Anything I want the reviewer to know should live in AGENTS.md or CLAUDE.md, not in the per-run command.

What changed for me

Before this setup, I was still in the loop. The agent would finish a task, and then I had to manually call a Codex review, read the output, decide what to fix, and tell the agent to fix it. Then I had to call another review. Then another one. I was the one driving the review cycle.

Now both Codex and Claude Code call the Codex reviewers by themselves, in a loop, as many times as needed, until the reviews come back clean. The agent decides when to run a review, reads the findings, fixes them, and runs the next round. I am not in that cycle anymore.

This is the part I want to highlight: one more step in the build cycle is now closed without a human in the middle. The reviewer is still doing the hard work, but the agent is the one calling it.

If you are using AI agents for real work, this loop is the highest-leverage thing I have built this year. Three files on the Codex side, one Skill and one plugin on the Claude Code side, and the rest takes care of itself.