fix(ce-resolve-pr-feedback): drop clustering, default to merit-based fixing#893
Merged
Merged
Conversation
…fixing Simplify the skill around two ideas: judge feedback on its merits -- not its source or form -- and default to fixing unless a concrete signal trips. - Remove cross-invocation cluster analysis end to end. The gated analysis fired rarely (it required multi-round review plus spatial overlap), cost loaded tokens on every run, and its only real value folds into the resolver's own judgment. Drops the cross_invocation envelope from get-pr-comments, the cluster step from full-mode (now 9 steps, was 10), and Cluster Mode from the ce-pr-comment-resolver agent. - Reframe the resolver from a per-item validation gate to "default to fixing; divert only on a tripwire" -- finding doesn't hold, fix would harm, change buys nothing, risk can't be bounded, or it's a question. The checks are the read done to make the fix, not a separate analysis pass, with an explicit guard against manufacturing doubt to avoid work. - Judge every item the same regardless of source (human or bot) or form (inline thread, review body, top-level comment); form changes only the reply/resolve mechanics, never whether a finding is correct. - Keep the replied verdict but stop leading the rubric with the question/answer split, and drop the step-4 category buckets.
Merged
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ce-resolve-pr-feedbackwas carrying two things that no longer earn their keep: a gated cross-invocation cluster analysis that fired rarely yet cost loaded tokens on every run, and a "fix everything valid, when in doubt fix it" policy calibrated for human reviewers — which churns the code when applied to today's bot-dominated feedback.This reshapes the skill around two principles: judge each item on its merits (regardless of who raised it or in what form) and default to fixing, diverting only when reading the code trips a concrete signal. It's faster (no per-run clustering pass, leaner loaded context) and steadier — it stops over-fixing bot noise without under-fixing real nitpicks.
What changed
Clustering is gone, end to end. The cross-invocation analysis required multi-round review and spatial overlap to fire — a narrow path that taxed every run regardless. Its one real benefit, broader investigation of recurring patterns, folds into the resolver's own judgment when it reads the file.
full-modedrops from 10 steps to 9; the resolver agent loses Cluster Mode; the fetch script drops thecross_invocationenvelope.The resolver is now a default-fix tripwire, not a per-item gate. Most feedback is correct and simply gets fixed. The agent has to read the code to make the fix anyway, so the validation checks are tripwires it notices during that read — not a separate analysis pass. It diverts only on a concrete signal:
not-addressingdeclinedrepliedneeds-humanreplied/needs-humanAn explicit guard keeps this from sliding into over-thinking: "'I'm uneasy' is not a tripwire; 'I read the callers and this breaks X' is." Correct nitpicks still get fixed — the skip bar is "no benefit," not "minor."
Source and form no longer change the verdict. Human vs. bot, and inline thread vs. formal review body vs. top-level comment, are judged identically. Form changes only the reply/resolve mechanics (GraphQL resolve vs. a top-level reply), never whether a finding is correct — and there's deliberately no bot-classification heuristic, which would risk dismissing a real bot-caught bug.
Parallel dispatch, file-collision avoidance, combined validation, the two-pass verify loop, and outdated-line relocation are unchanged.
Test plan
bun run release:validatein sync; the pagination, frontmatter, skill-shell-safety, and review-skill-contract suites pass. Pre-existing CLI-suite failures are environmental (the sandbox blocks remote clones and global-config writes) and reproduce identically on a clean tree.