
fix(ce-resolve-pr-feedback): drop clustering, default to merit-based fixing #893

Merged: tmchow merged 1 commit into main from tmchow/resolve-pr-feedback-streamline on Jun 2, 2026

Collaborator

@tmchow tmchow commented Jun 1, 2026

Summary

ce-resolve-pr-feedback was carrying two things that no longer earn their keep: a gated cross-invocation cluster analysis that fired rarely yet cost loaded tokens on every run, and a "fix everything valid, when in doubt fix it" policy calibrated for human reviewers — which churns the code when applied to today's bot-dominated feedback.

This reshapes the skill around two principles: judge each item on its merits (regardless of who raised it or in what form) and default to fixing, diverting only when reading the code trips a concrete signal. It's faster (no per-run clustering pass, leaner loaded context) and steadier — it stops over-fixing bot noise without under-fixing real nitpicks.

What changed

Clustering is gone, end to end. The cross-invocation analysis required multi-round review and spatial overlap to fire — a narrow path that taxed every run regardless. Its one real benefit, broader investigation of recurring patterns, folds into the resolver's own judgment when it reads the file. full-mode drops from 10 steps to 9; the resolver agent loses Cluster Mode; the fetch script drops the cross_invocation envelope.

The resolver is now a default-fix tripwire, not a per-item gate. Most feedback is correct and simply gets fixed. The agent has to read the code to make the fix anyway, so the validation checks are tripwires it notices during that read — not a separate analysis pass. It diverts only on a concrete signal:

| Signal | Verdict |
| --- | --- |
| Finding doesn't hold (the code disproves it) | not-addressing |
| Fix would make the code worse | declined |
| Change buys nothing real (cosmetic / immaterial) | replied |
| Risk can't be bounded after de-risking | needs-human |
| It's a question, not a change | replied / needs-human |

An explicit guard keeps this from sliding into over-thinking: "'I'm uneasy' is not a tripwire; 'I read the callers and this breaks X' is." Correct nitpicks still get fixed — the skip bar is "no benefit," not "minor."
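As a rough illustration of the default-fix rule, the signal table can be read as a lookup with a fall-through to fixing. This is only a sketch: the skill itself is agent instructions rather than code, the verdict name `fixed`, the type and function names, and the `answerableFromCode` flag used to split the question case are all assumptions made for this example, not names from the PR.

```typescript
// Hypothetical model of the tripwire table. Only the four diverting verdict
// names come from the PR; "fixed" and every identifier here are assumptions.
type Tripwire =
  | "finding-does-not-hold"
  | "fix-makes-code-worse"
  | "no-real-benefit"
  | "unbounded-risk"
  | "question";

type Verdict = "fixed" | "not-addressing" | "declined" | "replied" | "needs-human";

interface ReadResult {
  // A concrete signal noticed while reading the code, if any.
  // Vague unease does not count, so it has no representation here.
  tripwire?: Tripwire;
  // Assumed tie-breaker for questions: can the code alone answer it?
  answerableFromCode?: boolean;
}

const DIVERT: Record<Exclude<Tripwire, "question">, Verdict> = {
  "finding-does-not-hold": "not-addressing",
  "fix-makes-code-worse": "declined",
  "no-real-benefit": "replied",
  "unbounded-risk": "needs-human",
};

function decideVerdict(read: ReadResult): Verdict {
  // Default path: no concrete tripwire, so the item simply gets fixed.
  if (read.tripwire === undefined) return "fixed";
  if (read.tripwire === "question") {
    return read.answerableFromCode ? "replied" : "needs-human";
  }
  return DIVERT[read.tripwire];
}
```

Note that a correct nitpick never reaches the lookup: with no tripwire set, `decideVerdict({})` falls straight through to the fix path, which matches the "no benefit," not "minor" skip bar.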

Source and form no longer change the verdict. Human vs. bot, and inline thread vs. formal review body vs. top-level comment, are judged identically. Form changes only the reply/resolve mechanics (GraphQL resolve vs. a top-level reply), never whether a finding is correct — and there's deliberately no bot-classification heuristic, which would risk dismissing a real bot-caught bug.
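One way to picture this separation is a function that picks the reply/resolve mechanics from the form alone and never looks at the source or the verdict. This is a sketch under assumptions: the type names, field names, and the exact mapping of forms to mechanics (inline threads resolved via GraphQL, everything else answered with a top-level reply) are inferred from the paragraph above, not taken from the skill.

```typescript
// Hypothetical types; the PR names the three forms but not these identifiers.
type Form = "inline-thread" | "review-body" | "top-level-comment";
type Source = "human" | "bot";

interface Mechanics {
  reply: "thread-reply" | "top-level-reply";
  // Assumed: only inline review threads can be resolved through GraphQL.
  resolveViaGraphQL: boolean;
}

interface FeedbackItem {
  source: Source;
  form: Form;
  body: string;
}

// Mechanics depend on form only. Source is deliberately ignored, and nothing
// here feeds back into whether the finding is judged correct.
function mechanicsFor(item: FeedbackItem): Mechanics {
  return item.form === "inline-thread"
    ? { reply: "thread-reply", resolveViaGraphQL: true }
    : { reply: "top-level-reply", resolveViaGraphQL: false };
}
```

Keeping `source` out of the decision entirely mirrors the choice to have no bot-classification heuristic: a bot and a human raising the same inline finding get the same handling.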

Parallel dispatch, file-collision avoidance, combined validation, the two-pass verify loop, and outdated-line relocation are unchanged.

Test plan

bun run release:validate reports in sync, and the pagination, frontmatter, skill-shell-safety, and review-skill-contract suites pass. The pre-existing CLI-suite failures are environmental (the sandbox blocks remote clones and global-config writes) and reproduce identically on a clean tree.


Compound Engineering
Claude Code

…fixing

Simplify the skill around two ideas: judge feedback on its merits -- not its
source or form -- and default to fixing unless a concrete signal trips.

- Remove cross-invocation cluster analysis end to end. The gated analysis
  fired rarely (it required multi-round review plus spatial overlap), cost
  loaded tokens on every run, and its only real value folds into the
  resolver's own judgment. Drops the cross_invocation envelope from
  get-pr-comments, the cluster step from full-mode (now 9 steps, was 10),
  and Cluster Mode from the ce-pr-comment-resolver agent.
- Reframe the resolver from a per-item validation gate to "default to
  fixing; divert only on a tripwire" -- finding doesn't hold, fix would
  harm, change buys nothing, risk can't be bounded, or it's a question.
  The checks are the read done to make the fix, not a separate analysis
  pass, with an explicit guard against manufacturing doubt to avoid work.
- Judge every item the same regardless of source (human or bot) or form
  (inline thread, review body, top-level comment); form changes only the
  reply/resolve mechanics, never whether a finding is correct.
- Keep the replied verdict but stop leading the rubric with the
  question/answer split, and drop the step-4 category buckets.
@tmchow tmchow merged commit 3e77a7b into main Jun 2, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Jun 2, 2026