docs(review-pr): Pass C v1.1 — fenced-body heuristic + doc refinements (4.5.0)#44
Merged
Conversation
Stream C of reviewer-rigor PR. Bundles three housekeeping items: - skills/release/SKILL.md: quote frontmatter description to fix PyYAML parse error (claude plugin validate now clean) - coding-best-practices added to skills frontmatter on 6 agents missing it (architect, claudius, qa, security, writer, ux) - orchestrator skills (grumpy-review, check-pr-comments) note BP preload requirement for spawned agents - tests/test_skill_frontmatter.py: regression guard parses every frontmatter via PyYAML - tests/test_bp_load_audit.py: asserts curated agent set loads BP - .claude-plugin/plugin.json: 4.1.7 -> 4.2.0 - CHANGELOG.md: [4.2.0] entry covering the full PR (Streams A + B + C) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…all-tree inspection Pipeline plumbing for the upcoming reviewer call-tree walk (Stream A of reviewer-rigor PR). - schemas/review-report.schema.json: CALL added to id pattern; call_tree added to category enum and matrix - scripts/consolidate_reports.py: CATEGORY_PREFIX maps call_tree -> CALL- - scripts/generate_review_report.py: CATEGORY_LABELS includes call_tree - skills/report-format/SKILL.md: ID prefix table + category list extended - tests: schema happy/reject paths, consolidator id assignment + matrix population, renderer column/chip coverage - fixture: tests/fixtures/reports/v3-call-tree.json No skill-body changes (Stream B), no agent BP edits (Stream C), no version bump (Stream C does final bump after all streams land). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… lint Stream B of reviewer-rigor PR. Two reviewer-rigor threads: Call-tree inspection: - skills/grumpy-review/references/call-tree-walk.md: NEW authoritative methodology — outcome contract, when-to-run trigger, rank-and-walk-top-10 algorithm, tool-probe order, walk caps (depth 5, callers 200, 60s/function), per-caller judgement rubric, finding emission shape with mandatory `Walked via: <tool>` line. - skills/grumpy-review/SKILL.md: new "Call-tree inspection" subsection delegating to the reference doc. - skills/review-pr/SKILL.md: inheritance note (delegated via grumpy-review). - skills/check-pr-comments/SKILL.md: trigger walk when a resolved comment references a function modified in resolution commits. Ephemeral-ID enforcement: - skills/coding-best-practices/SKILL.md: new Cross-Cutting Rule banning transient review-finding IDs in committed artifacts; documented allow-list (ADR/RFC/CWE/CVE/OWASP/GHSA/GH refs/TODO/test-spec IDs); rule-of-thumb (regenerated JSON IDs = forbidden, committed-doc IDs = fine). - scripts/lint_ephemeral_ids.py: NEW deterministic lint. Imports CATEGORY_PREFIX from consolidate_reports to prevent drift. JSON or text output, --diff mode for unified-diff + lines, always exit 0. - skills/grumpy-review/SKILL.md + skills/review-pr/SKILL.md: wire the lint step, instruct LLM to dismiss in-skill example matches (no markers per design decision). - allowed-tools updates for the walk + lint binaries on the three reviewer skills. Tests: - tests/test_lint_ephemeral_ids.py: positives across all 13 consolidator prefixes, negatives across allow-list (ADR/RFC/CWE/CVE/OWASP/GHSA/etc.), word-boundary edge cases, --diff mode parsing, prefix-source-of-truth assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- lint_ephemeral_ids: skip '\ No newline at end of file' diff marker so it no longer advances new_line (off-by-one on added matches after a no-newline final line); add regression test - grumpy-review SKILL §3: add call_tree category + CALL- prefix to the generic finding-format block for consistency with the call-tree section - call-tree-walk: fix ranking example to emit producer risk/impact/scope floats instead of coordinator-derived severity (producer-contract compliance) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR descriptions now open with a "Why this PR exists" section (problem, reproduction/threat scenario, blocking relationship) ahead of What/Testing/ Breaking/Checklist/Attribution. The skeleton lives in one place — git-and-github §Creating a PR — and push delegates to it (no duplication). Pinned by tests/test_pr_body_template.py. Bumps plugin to 4.4.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ive bits
Producers (esp. check-pr-comments) emit per-finding risk/impact/scope floats
but no integer severity and never route through the coordinator, so findings
rendered INFO and severity_counts/severity_category_matrix stayed zero. Fix it
where it bites — at render time — and tidy up the schema while we're in here.
Fix 1 (HIGH): new scripts/severity_util.py is the single source of truth for
the OWASP band table, per-finding severity derivation, and the stats/matrix
builder. consolidate_reports imports from it and keeps the legacy private
names (_derive_overall/_derive_severity_int) as re-export aliases. The renderer
gains a normalization pass (markdown/html/triage/pdf): it derives each finding's
integer severity from the floats when absent and recomputes severity_counts +
severity_category_matrix when the supplied counts are missing or all-zero —
never clobbering non-zero supplied counts. triage inherits the fix via
_build_html_context.
Fix 2 (MED): schema accepts optional finding author_type (bot|human) so
check-pr-comments output validates.
Fix 3 (MED): permalink commit now derives from `git rev-parse @{u}` with a
HEAD fallback (check-pr-comments, report-format, grumpy-review) so links resolve
on GitHub instead of 404-ing on an unpushed HEAD.
Fix 4: schema 3.1.0 additive bits — schema_version accepts 3.0.0 and 3.1.0,
metadata.report_type adds pr_audit, finding_section gains optional verdict
(PASS|FAIL|NEEDS_REVIEW). severity stays optional on findings.
Schema -> 3.1.0, plugin -> 4.3.0. New tests: test_severity_util.py, schema
additive cases, render regression (floats-only report renders HIGH + non-zero
counts), re-export parity, and an end-to-end producer->validate->render check
in the shell pipeline. 357 passed; black/ruff clean; plugin validates.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tests
Completes PR-A. The earlier commit shipped schema 3.1.0, severity_util, and the
consolidate_reports re-export; this lands the renderer-side fix plus the
remaining producer/test/doc changes.
- generate_review_report.py: add _normalize_report() — derive each finding's
integer severity from risk/impact/scope when absent, and recompute
severity_counts + severity_category_matrix only when supplied counts are
missing/all-zero (never clobbering non-zero ones). Invoked from
render_markdown, _build_html_context (html + triage), and render_pdf.
triage_server.py inherits via the triage format, no change needed.
- check-pr-comments / report-format / grumpy-review: permalink commit now uses
git rev-parse @{u} with a git rev-parse HEAD fallback so links resolve on
GitHub; check-pr-comments schema_version bumped to 3.1.0 (3.0.0 still accepted).
- consolidate_reports.py: drop unused build_severity_stats import.
- tests: test_severity_util.py (band mapping via derive_severity_int for exact
boundaries — the float mean rounds at edges), test_render_v3.py regression
(floats-only report renders HIGH + non-zero counts in md/html, supplied counts
preserved), test_report_pipeline.sh end-to-end producer->validate->render,
test_consolidate_reports asserts against cr.SCHEMA_VERSION not a frozen literal.
- CHANGELOG: 4.3.0 section.
384 tests pass; black/ruff clean; plugin validates; shell pipeline 26/0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The coordinator has its own compute_statistics; only the renderer uses the shared build_severity_stats. Removing the dead import clears the last ruff F401. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tighten review-pr Pass C (promise verification) per TODOs 8c17d019 and 4463333d (F-1..F-5, F-7). DOC-ONLY in skills/review-pr/SKILL.md plus tests; no schema changes (PR-A landed pr_audit + finding_section.verdict already). - Fenced-body unwrap: strip+dedent a wholly-fenced PR body before the column-0 Summary/Out-of-scope regexes; emit one INFO "PR body unparseable" finding instead of silently skipping when nothing parses. - Clean-pass shape: findings:[] plus one INFO "PR self-description verified". - Compound titles: split on commas/em-dashes, verify per-topic, majority-hits. - Undocumented-change: keep >=50 LOC trigger, define "mentioned" precisely (keyword overlap with a Summary bullet or field-ownership-table row). - Summary-heading precedence: ## Summary > ### Summary > ## What changed. - Optional finding_section.verdict (PASS/FAIL/NEEDS_REVIEW) + report_type. - code_snippets language: cross-ref report-format §code_snippets, not hard-coded. Tests: new tests/test_review_pr_passc.py (grep-assert the documented rules + a parser-mirror of the fenced-body dedent that exposes ## Summary) and tests/fixtures/pr-promises/synthetic-fenced.md (self-describing, carries the standard <!-- expected --> annotation). Full suite: 358 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Documents Pass C v1.1 of the review-pr skill: closes ambiguities reviewers were resolving inconsistently and adds a fenced-body unwrap heuristic so PR descriptions wrapped in a single code fence are no longer silently skipped. Pure documentation/test/version change — no schema or executable behavior is introduced. Builds on the schema/enum additions landed in PR-A (pr_audit, finding_section.verdict).
Changes:
- Pass C SKILL doc: fenced-body unwrap + "PR body unparseable" fallback; clean-pass shape with an explicit INFO marker; compound-title majority rule; precise "mentioned" definition for undocumented changes; Summary-heading precedence; optional
verdict/report_type: pr_auditwiring; cross-reference for snippetlanguage. - New doc-regression test (
tests/test_review_pr_passc.py) with grep-asserts and a parser-mirror of the fenced-body unwrap, plus asynthetic-fenced.mdfixture. - Bumps plugin manifest to
4.5.0and adds the corresponding CHANGELOG entry.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| skills/review-pr/SKILL.md | Adds Pass C v1.1 rules (fenced-body unwrap, clean-pass shape, compound titles, undocumented-change definition, heading precedence, verdict/report_type wiring, language cross-ref). |
| tests/test_review_pr_passc.py | Grep-asserts the documented Pass C v1.1 rules and mirrors the documented fenced-body unwrap against a synthetic fenced body. |
| tests/fixtures/pr-promises/synthetic-fenced.md | New Pass C fixture demonstrating a wholly fenced PR body with the standard <!-- expected --> annotation. |
| CHANGELOG.md | Adds a 4.5.0 — 2026-05-29 entry summarising the Pass C v1.1 changes. |
| .claude-plugin/plugin.json | Bumps the plugin version from 4.3.0 to 4.5.0. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Bring the squash-merged PR #41 (commit 7fcc279, final reviewer-rigor incl. 2nd+3rd Copilot passes) into the severity-pipeline branch. #42 was cut from #41 at b76bf72 (first pass only), so its branch carried a stale copy of #41. Resolution rule: pure-#41 content -> take main (final, thrice-reviewed); #42 severity-pipeline additions -> keep ours; mixed files -> combine. 8 conflicts resolved: - .claude-plugin/plugin.json: ours (4.3.0) - CHANGELOG.md: combined (ours 4.3.0 above main's final 4.2.0; NNN placeholders, no LSP, lint-clean) - schemas/review-report.schema.json: ours 3.1.0 superset + main's CALL-/ call_tree id-prefix description - skills/check-pr-comments,grumpy-review,review-pr SKILL.md: combined (main's gtags/tree-sitter allowed-tools + $BASE_BRANCH; ours @{u} permalink) - skills/grumpy-review/references/call-tree-walk.md: main (LSP tier removed) - tests/test_schema_v3_strict.py: ours (union, #42 TestV31AdditiveFields) consolidate_reports.py + generate_review_report.py merged byte-identical to ours (call_tree rendering and on-the-fly severity never overlapped). Also normalizes black formatting on 6 pre-existing not-black-clean test files (pure style, no semantic change). 379 tests pass; plugin validate clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three valid catches in the severity-pipeline code: - generate_review_report: _normalize_summary_statistics rebuilt severity_counts + severity_category_matrix but left total_findings stale, so the HTML KPI could disagree with the rebuilt counts. Realign total_findings from the rebuilt stats (same all-zero/missing branch, so a hand-supplied non-zero stats block is still never touched). - generate_review_report: corrected the _normalize_summary_statistics docstring — the real trigger is "counts missing/all-zero AND >=1 finding exists" (floatless findings count as INFO), not "a finding carries a derivable severity". - consolidate_reports: ACCEPTED_SCHEMA_VERSIONS now derives from the schema's schema_version enum (single read shared with SCHEMA_VERSION) instead of a hard-coded set, so it can't drift on the next schema bump. _read_schema_version() preserved as a thin newest-entry wrapper. Regression guards: anti-drift assert (ACCEPTED_SCHEMA_VERSIONS == set of schema enum) and a total_findings-realignment test that fails without the fix. 382 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three follow-on findings on the on-the-fly severity normalization: - generate_review_report: _normalize_finding_severities now computes and stores overall_severity (mean of risk/impact/scope) when missing/invalid, then derives the integer band from that value — instead of filling only the integer severity. Producer-shape reports now populate the Markdown (overall=...) suffix and the HTML data-overall "Sort: overall" key. A finding with a valid int severity but no overall_severity still gets overall_severity filled; both already-valid values are left untouched. - generate_review_report: import derive_overall/derive_severity_int from severity_util (derive_finding_severity stays in severity_util, still used by _effective_severity; the renderer no longer imports it). - consolidate_reports: remove the now-dead _read_schema_version() wrapper — SCHEMA_VERSION reads _SCHEMA_VERSIONS[-1] directly since ACCEPTED_SCHEMA_ VERSIONS became schema-enum-derived. Two regression tests pin the overall_severity fill (floats-only finding + int-severity-without-overall). 384 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # .claude-plugin/plugin.json # CHANGELOG.md
# Conflicts: # .claude-plugin/plugin.json # CHANGELOG.md
PR-C's own test file (from f10308d) failed black --check in this environment — assert line-wrapping only, no logic change. Surfaced during the reviewer-rigor/severity-pipeline merges; brings the git diff main...HEAD set fully clean under black. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # .claude-plugin/plugin.json # CHANGELOG.md
Rename ambiguous loop var l->ln and apply black to #44's new Pass C regression test, carried in with the feat/pr-why-template merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # .claude-plugin/plugin.json # CHANGELOG.md
…+ tighten fence parser Three valid Copilot findings on PR #44: - Pass C's "PR self-description verified" (clean pass) and "PR body unparseable" markers were labeled INFO but specified scope=1.0. With scope pinned at 1.0 the mean (risk+impact+scope)/3 floors at 1/3, so the band table can never reach INFO (needs <0.1) — both derived to MEDIUM. Fix per the check-pr-comments RESOLVED convention: informational Pass C findings use scope=0.0 (no actionable diff work). "verified" -> 0.1/0.1/0.0 -> INFO (band 1); "unparseable" relabeled LOW -> 0.2/0.2/0.0 -> LOW (band 2). Documented the Pass C scope=0.0 exception (scope=1.0 applies only to axis 1-3 mismatches). - Tightened the fenced-body unwrap's closing-fence check (SKILL spec + the test's parser-mirror): the closer must be the same fence character, length >= the opener, and contain only fence chars — previously a 3-char closer unwrapped a 4-char opener and "```python" trailing text was accepted. Guards added: a band-derivation test pinning the documented floats to the severity_util bands (INFO/LOW), and fence cases for the 4-vs-3 mismatch and trailing-text-after-fence. 408 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot follow-on on PR #44: the 4.5.0 CHANGELOG entry still described the "PR body unparseable" Pass C fallback as INFO, but the SKILL fix relabeled it LOW (scope=0.0, risk≈impact≈0.2 -> mean ≈0.133 -> band 2). Align the changelog. The "PR self-description verified" finding correctly stays INFO. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t comment Copilot's 4th pass on PR #44 flagged that the scope=1.0 rationale overstated "MEDIUM". Verified ground truth: at scope=1.0, {0.1,0.1,1.0} derives to LOW(2) (IEEE-754 puts (0.1+0.1+1.0)/3 = 0.39999... just under the 0.4 cutoff), and {0.2,0.2,1.0} to MEDIUM(3). The explanatory text in both the SKILL exception note and the test comment wrongly said both inflate to MEDIUM. - SKILL: reword the exception note to state the real means/bands (LOW + MEDIUM). - test: rename test_scope_one_would_inflate_to_medium -> test_scope_one_lifts_above_documented_bands; correct the comment and document the IEEE-754 boundary so the intentional ==2 isn't mistaken for a typo. Assertions unchanged (both verified correct). The Copilot complaint that the ==2 assertion "will fail" was a false positive — it's the genuine float result. 408 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why this PR exists
Problem: review-pr Pass C (promise verification) extracts
## Summary/## Out of scopewith column-0-anchored regexes and leaves several judgment calls undocumented — so it silently misses sections in some PR bodies and different reviewers resolve ambiguities inconsistently.What breaks without it: Author wraps the whole PR description in a ``` code fence (common when pasting from an editor that auto-wraps) → every section header sits inside the fence → Pass C's regex matches nothing → the promise check silently no-ops, reporting a clean pass it never actually performed. Plus: compound titles, the "undocumented change" threshold, and multi-heading precedence were all left to per-run judgment.
Blocking relationship: Originally the top of the stack (
feat/passc-v1.1→feat/report-pipeline-severity→feat/reviewer-rigor), depending on thepr_auditreport_type andfinding_section.verdictenum from PR-A (#42). #41, #42, and #43 are now all merged, so this PR has been retargeted tomain(which carries those schema additions) andmainis merged in — it stands alone and isMERGEABLE. The body itself follows the "Why this PR exists" template (#43, dogfood).What was done
All in
skills/review-pr/SKILL.md(Pass C), doc-only — no schema changes:findings: []plus one INFO "PR self-description verified" so a clean pass leaves visible evidence it ran.report-format§code_snippets for allowedlanguagevalues instead of hard-codingdiff.## Summary>### Summary>## What changed, first match wins.verdict(PASS/FAIL/NEEDS_REVIEW) andmetadata.report_type: pr_audit(both from PR-A, now inmain).Resolves TODOs
8c17d019+ the4463333ddoc gaps (F-1…F-5, F-7 wiring).Testing
pytest tests/→ 401 passed (whole suite, post-merge ofmain)black scripts/ tests/+ruff check scripts/ tests/cleanclaude plugin validate→ exit 0tests/test_review_pr_passc.py(grep-asserts all six SKILL additions + a parser-mirror proving dedent exposes## Summary); newtests/fixtures/pr-promises/synthetic-fenced.md.Breaking changes
None. Documentation change only.
🤖 Co-authored by Claudius the Magnificent AI Agent