Skip to content

refactor(ce-code-review): make ce-code-review review-only and split apply to callers#881

Open
tmchow wants to merge 13 commits into
mainfrom
refactor/ce-code-review-review-only
Open

refactor(ce-code-review): make ce-code-review review-only and split apply to callers#881
tmchow wants to merge 13 commits into
mainfrom
refactor/ce-code-review-review-only

Conversation

@tmchow
Copy link
Copy Markdown
Collaborator

@tmchow tmchow commented May 30, 2026

Why this PR

ce-code-review had grown into two jobs at once: find problems and fix them (autofix, walkthrough, checkout, ticket filing). That made orchestrated workflows brittle — callers could not predict whether the review skill would mutate the tree, and Tier 2 shipping often ran review and fix in one opaque step.

This branch does two related things:

  1. Slim the reviewer roster — drop redundant stack/style personas and merge overlapping migration agents so each review dispatches fewer, sharper subagents.
  2. Split review from applyce-code-review is review-only; ce-work, lfg, and ce-work-beta own fix policy via explicit followup references.

Net effect: smaller review teams, predictable JSON for automation, and shipping workflows that review → apply → residual gate instead of hoping autofix did the right thing.

Reviewer roster changes (agents removed)

These agents are deleted from the plugin and registered in stale cleanup so flat installs upgrade cleanly:

Removed agent Why
ce-dhh-rails-reviewer Stack/style philosophy folded into always-on ce-maintainability-reviewer
ce-kieran-rails-reviewer Same — structural/quality signal, not a separate spawn
ce-kieran-python-reviewer Same
ce-kieran-typescript-reviewer Same
ce-schema-drift-detector Merged into conditional ce-data-migration-reviewer (Step 0 schema drift)
ce-data-migration-expert Consolidated
ce-data-migrations-reviewer Consolidated

Replacement: one conditional ce-data-migration-reviewer for migration artifacts + schema dumps (db/schema.rb, structure.sql, etc.). Schema drift surfaces as P1 findings in the main report — not a separate agent or report section.

Stack-specific reviewers retained (2): ce-julik-frontend-races-reviewer, ce-swift-ios-reviewer — runtime behavior the always-on personas do not cover. No more per-language Kieran/DHH spawns on file extension alone.

Current team shape: 4 always-on personas + 2 CE always-on agents, up to 7 cross-cutting conditionals, 2 stack-specific conditionals, and ce-deployment-verification-agent for risky migrations. Small diffs → ~6 reviewers; large auth/DB features → ~9–10.

Review-only contract (ce-code-review)

Before After
mode:autofix could edit the checkout Deprecated — clear error; callers apply fixes
safe_auto tier + in-skill autofix Removed — synthesis remaps legacy safe_autogated_auto; callers decide what to apply
Interactive walkthrough / bulk preview / in-skill tracker-defer Removed from skill (defer lives under ce-work / lfg)
Headless text envelope in agent mode mode:agent JSON — one parseable object + review.json artifact
Applied Fixes section in report Gone — Actionable Findings handoff only

Also: PR skip paths return {"status":"skipped",...} in agent mode; pr-remote scope fetches PR head ref and forbids workspace Read/Grep on changed files when reviewing a PR from another branch.

Orchestrator changes (ce-work, ce-work-beta, lfg)

Shipping Phase 3 is restructured:

  • ce-simplify-code — separate conditional step (≥30 changed lines), not bundled into review tiers
  • Tier 1 — harness-native review when available (/review, etc.)
  • Tier 2 — escalation only (sensitive surface, large/diffuse diff, explicit plan ask) — not because Tier 1 is missing
  • Tier 2 flow: (2a) ce-code-review mode:agent → (2b) review-findings-followup.md batches fixes by file → Residual Work Gate
  • Skip dedicated review when no Tier 1 and Tier 2 criteria not met (document in summary)

lfg mirrors the same review-then-apply split with a self-contained lfg/references/review-followup.md. ce-work-beta gets its own copy of review-findings-followup.md (skills install as isolated directories — no cross-skill references).

Files removed from ce-code-review

  • references/walkthrough.md, references/bulk-preview.md, references/tracker-defer.md
  • Added: references/action-class-rubric.md, caller-owned review-findings-followup.md under ce-work / ce-work-beta

Test plan

  • bun test tests/review-skill-contract.test.ts tests/pipeline-review-contract.test.ts (66 pass)
  • Legacy cleanup tests cover removed ce-dhh-*, ce-kieran-*, ce-schema-drift-detector agent sweeps
  • CI test check on PR

Compound Engineering

tmchow and others added 6 commits May 21, 2026 18:26
…wers

Remove DHH/Kieran stack reviewers, fold structural checks into maintainability, suppress P3 findings in synthesis, and merge migration/schema-drift agents into a single conditional data-migration reviewer.

Co-authored-by: Cursor <cursoragent@cursor.com>
Extend data-migration Step 0 to diff db/structure.sql when present, and route plan deepening migration risks to ce-data-integrity-guardian instead of the PR-review persona.

Co-authored-by: Cursor <cursoragent@cursor.com>
Re-enable P3 findings in reports and synthesis after the persona refactor
temporarily suppressed them. Register removed migration and stack reviewers
in LEGACY_ONLY_AGENT_DESCRIPTIONS and EXTRA_LEGACY ce-* agent paths so
upgrades sweep stale flat installs.

Co-authored-by: Cursor <cursoragent@cursor.com>
Register removed review personas in STALE_AGENT_NAMES and
LEGACY_ONLY_AGENT_DESCRIPTIONS, and resolve current-agent lookup for
names that already include the ce- prefix so cleanupStaleAgents removes
flat installs like ce-dhh-rails-reviewer.md.

Co-authored-by: Cursor <cursoragent@cursor.com>
…callers

Orchestrated workflows now invoke review for findings and own fix application separately, so ce-work and lfg can batch fixes without the review skill mutating the checkout.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve ce-code-review Stage 6 conflict by keeping review-only output
(no Applied Fixes section, JSON schema for mode:agent).

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0346beae3b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-work-beta/references/shipping-workflow.md Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5391c57851

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/lfg/references/review-followup.md Outdated
…te (#881)

- Duplicate review-findings-followup into ce-work-beta for isolated installs
- Inline LFG apply bar in lfg/references/review-followup.md
- Replace stale headless text envelope with JSON contract in review-output-template

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bee6ec5cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated
Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated
Return status skipped in mode:agent when PR pre-checks bail out.
In pr-remote scope, fetch PR head ref and forbid workspace Read/Grep
for changed files; validators use git show or diff hunks only.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2ec295fb5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md
Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated
Add branch-remote inspection rules for no-checkout branch reviews, use local tree diffs instead of appending gh pr diff when aligned on the PR branch, and sync ce-work-beta residual gate sentinels with ce-work.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f22822d9b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated
Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md Outdated
…#881)

Two remaining gaps in the review-scope model surfaced by Codex review of
3f22822, both causing findings to be validated against the wrong tree:

- PR scope classification trusted the branch name alone, so a fork PR (or
  a stale local branch) sharing a name with the PR head was treated as
  local-aligned and diffed/inspected against the local checkout. Now
  local-aligned also requires the PR to be same-repo (isCrossRepository
  false) and the PR head commit to be an ancestor of local HEAD; metadata
  fetch gains headRefOid + isCrossRepository.
- Stage 5b validators never received the scope mode or remote head ref, so
  in pr-remote/branch-remote they fell back to Read/Grep on the stale
  workspace. Now the scope mode and <pr-head-ref>/<branch-head-ref> are
  injected into validator prompts (mirroring Stage 4), with inspection
  access scoped by mode. The validator template already consumed these.

Note: pre-existing CLI install test failures (tests/cli.test.ts) are
unrelated to this change and present on HEAD before it.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc9d30260a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-code-review/SKILL.md
@tmchow tmchow changed the title refactor(review): make ce-code-review review-only and split apply to callers refactor(ce-code0review): make ce-code-review review-only and split apply to callers Jun 2, 2026
@tmchow tmchow changed the title refactor(ce-code0review): make ce-code-review review-only and split apply to callers refactor(ce-code-review): make ce-code-review review-only and split apply to callers Jun 2, 2026
…ew in apply followup

Address PR review feedback (#881):

- ce-code-review: Stage 2 branch-mode intent discovery now runs
  `git log ${BASE}..<branch-ref>` using the resolved ref instead of the raw
  `<branch>` argument, so remote-only branch reviews read the right commit
  intent instead of failing or reading a stale same-named local branch.
- ce-work / ce-work-beta apply followup: consume the review already produced
  by Tier 2 step 2a instead of re-invoking ce-code-review; re-invocation is
  now a documented cold-caller fallback. Avoids a second full review per
  Tier 2 run. Updated the ce-work SKILL.md section anchor and the contract
  test to lock in the corrected (no-double-review) behavior.

Note: pre-existing failures in CLI install/cleanup/target-detection tests
(47, unrelated to these skill-content changes) not addressed by this PR.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 633576a3f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Present post-completion options via the platform question tool:

1. **Run `/ce-code-review`** on the cumulative diff (baseline to final). Load the `ce-code-review` skill with `mode:autofix` on the optimization branch.
1. **Run `/ce-code-review`** on the cumulative diff (baseline to final). Load the `ce-code-review` skill on the optimization branch (interactive or `mode:agent`). Apply eligible mechanical fixes using `ce-work` `references/review-findings-followup.md` if you want fixes landed before the next option.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep ce-optimize follow-up self-contained

When this option is selected from an installed or converted ce-optimize skill, the instruction sends the agent to ce-work's references/review-findings-followup.md, but skill directories are treated as isolated units and the repo guidance in AGENTS.md says skill content must not reference files outside its own directory tree. In an optimize-only or converted install, the apply policy can therefore point at a missing sibling-skill file; inline the mechanical apply bar here or add a local ce-optimize/references/... copy.

Useful? React with 👍 / 👎.

…gs-table output

ce-code-review hardening prompted by a dogfood run that shipped findings in
the output template's own documented anti-pattern format and skipped
validation on the P1 findings:

- Stage 5b: resolve the "skip when >15 survivors" vs "validate top-15"
  contradiction. The stage no longer skips while any finding survives, and
  P0/P1 findings are always validated (raise the cap rather than drop a
  critical); only the P2/P3 tail is dropped when over budget.
- Stage 5b: sanction orchestrator direct verification of load-bearing facts
  (a pinned dependency's source, repo wiring) as a complement to the validator
  wave, using single-purpose native tools instead of chained shell.
- Stage 6: inline a literal findings-table skeleton in the always-loaded
  SKILL.md plus a concrete negative-signature tripwire (no Field: blocks, no
  box-drawing/middot separators, no lists) so the output contract survives a
  long session even when the template reference was not reloaded.
- Output format: state that mode:agent JSON is the deterministic cross-harness
  machine contract and default markdown is the human view; keep markdown
  ASCII-safe so it degrades gracefully across terminals and harnesses.
- Lock the P0/P1-always-validated invariant into the Stage 5b contract test.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b4baf792c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

1. Filter `actionable_findings` (or markdown Actionable Findings) with the bar above.
2. Apply eligible fixes in the working tree in severity order (`#` stable from the review).
3. Run targeted tests when `requires_verification: true` on any applied finding.
4. If `git status --short` shows changes, stage only review-driven files, commit `fix(review): apply review findings`, and push before step 5. If no eligible fixes were applied, note explicitly and skip commit.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle fresh branches before pushing review fixes

When LFG is running on a newly-created feature branch, step 7 has not yet invoked ce-commit-push-pr, so the branch commonly has no upstream. This new follow-up still requires a push immediately after committing review fixes, but it no longer includes the old git push --set-upstream <remote> HEAD fallback; in that context git push fails and blocks the autonomous pipeline before residual handoff/browser tests. Add the same upstream-resolution path here or defer the push to the later PR step.

Useful? React with 👍 / 👎.

### Emit actionable findings summary

After the user's choice executes (tickets filed, walk-through completed, or ignore recorded), emit the unified completion report. The report reflects the final state including any tickets filed or additional fixes applied during walk-through re-entry.
After Stage 6, emit a compact **Actionable Findings** summary for callers:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep agent-mode output to a single JSON object

This post-Stage-6 instruction is unqualified, so in mode:agent it can cause the skill to append a prose Actionable Findings summary after emitting the JSON object. Programmatic callers such as LFG parse the primary response as JSON, and any extra markdown/text after the object violates the new agent-mode contract; make this summary default-mode-only or require it to be represented solely by the actionable_findings field in agent mode.

Useful? React with 👍 / 👎.

The human-facing per-severity tables are now 5 columns
(# | File | Issue | Reviewer | Confidence). The synthesized route
(autofix_class -> owner) moves to the Actionable Findings table and the
mode:agent JSON, where it is actionable — keeping the scannable severity
tables narrow enough to render across terminals and non-Claude harnesses.

The finding set, routing, confidence gating, and JSON contract are
unchanged; this is presentation only. Updates the Stage 6 inline skeleton
and column description, the output-template examples and formatting rules,
the numbering fixture, and the primary-findings regex in the contract test.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 729b7fef99

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

### JSON output format (`mode:agent` only)

Gated-auto findings (concrete fix, changes behavior/contracts):
Emit **one JSON object** as the primary response (fenced ```json block or raw JSON — caller must be able to parse it). Also write `review.json` under `/tmp/compound-engineering/ce-code-review/<run-id>/` with the same payload.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Emit raw JSON for agent-mode callers

When mode:agent is invoked by LFG or another automated caller that parses the primary response with a normal JSON parser, permitting a fenced ```json block means the response can start with backticks instead of {, so `JSON.parse`/equivalent consumers fail even though the review completed. The contract above describes this as one machine-readable JSON object, so agent mode should require raw JSON only and leave markdown fencing to the human default mode.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant