fix(ce-resolve-pr-feedback): drop clustering, default to merit-based fixing by tmchow · Pull Request #893 · EveryInc/compound-engineering-plugin

tmchow · 2026-06-01T21:39:48Z

Summary

ce-resolve-pr-feedback was carrying two things that no longer earn their keep: a gated cross-invocation cluster analysis that fired rarely yet cost loaded tokens on every run, and a "fix everything valid, when in doubt fix it" policy calibrated for human reviewers — which churns the code when applied to today's bot-dominated feedback.

This reshapes the skill around two principles: judge each item on its merits (regardless of who raised it or in what form) and default to fixing, diverting only when reading the code trips a concrete signal. It's faster (no per-run clustering pass, leaner loaded context) and steadier — it stops over-fixing bot noise without under-fixing real nitpicks.

What changed

Clustering is gone, end to end. The cross-invocation analysis required multi-round review and spatial overlap to fire — a narrow path that taxed every run regardless. Its one real benefit, broader investigation of recurring patterns, folds into the resolver's own judgment when it reads the file. full-mode drops from 10 steps to 9; the resolver agent loses Cluster Mode; the fetch script drops the cross_invocation envelope.

The resolver is now a default-fix tripwire, not a per-item gate. Most feedback is correct and simply gets fixed. The agent has to read the code to make the fix anyway, so the validation checks are tripwires it notices during that read — not a separate analysis pass. It diverts only on a concrete signal:

Signal	Verdict
Finding doesn't hold (the code disproves it)	`not-addressing`
Fix would make the code worse	`declined`
Change buys nothing real (cosmetic / immaterial)	`replied`
Risk can't be bounded after de-risking	`needs-human`
It's a question, not a change	`replied` / `needs-human`

An explicit guard keeps this from sliding into over-thinking: "'I'm uneasy' is not a tripwire; 'I read the callers and this breaks X' is." Correct nitpicks still get fixed — the skip bar is "no benefit," not "minor."

Source and form no longer change the verdict. Human vs. bot, and inline thread vs. formal review body vs. top-level comment, are judged identically. Form changes only the reply/resolve mechanics (GraphQL resolve vs. a top-level reply), never whether a finding is correct — and there's deliberately no bot-classification heuristic, which would risk dismissing a real bot-caught bug.

Parallel dispatch, file-collision avoidance, combined validation, the two-pass verify loop, and outdated-line relocation are unchanged.

Test plan

bun run release:validate in sync; the pagination, frontmatter, skill-shell-safety, and review-skill-contract suites pass. Pre-existing CLI-suite failures are environmental (the sandbox blocks remote clones and global-config writes) and reproduce identically on a clean tree.

…fixing Simplify the skill around two ideas: judge feedback on its merits -- not its source or form -- and default to fixing unless a concrete signal trips. - Remove cross-invocation cluster analysis end to end. The gated analysis fired rarely (it required multi-round review plus spatial overlap), cost loaded tokens on every run, and its only real value folds into the resolver's own judgment. Drops the cross_invocation envelope from get-pr-comments, the cluster step from full-mode (now 9 steps, was 10), and Cluster Mode from the ce-pr-comment-resolver agent. - Reframe the resolver from a per-item validation gate to "default to fixing; divert only on a tripwire" -- finding doesn't hold, fix would harm, change buys nothing, risk can't be bounded, or it's a question. The checks are the read done to make the fix, not a separate analysis pass, with an explicit guard against manufacturing doubt to avoid work. - Judge every item the same regardless of source (human or bot) or form (inline thread, review body, top-level comment); form changes only the reply/resolve mechanics, never whether a finding is correct. - Keep the replied verdict but stop leading the rubric with the question/answer split, and drop the step-4 category buckets.

tmchow merged commit 3e77a7b into main Jun 2, 2026
2 checks passed

github-actions Bot mentioned this pull request Jun 2, 2026

chore: release main #894

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ce-resolve-pr-feedback): drop clustering, default to merit-based fixing#893

fix(ce-resolve-pr-feedback): drop clustering, default to merit-based fixing#893
tmchow merged 1 commit into
mainfrom
tmchow/resolve-pr-feedback-streamline

tmchow commented Jun 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tmchow commented Jun 1, 2026

Summary

What changed

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant