Pre-release hardening: shared-package extraction, canonical verdict contract, process supervision + DB integrity#49
Conversation
…ible vendor ids (issue #43) Round-1 SF4 added a log-and-drop gate in recordSessionId for syntactically implausible vendor session ids, but only the valid-id / drift / idempotency paths were tested. Pin the reject branch: an implausible id is not persisted (column stays NULL), warns at most once per execution, and a later valid id still lands. Co-Authored-By: claude-flow <ruv@ruv.net>
…y packages Reverse the inverted dashboard→cli dependency by moving the lower layers out of the cli app into two source-only shared packages, so apps depend on shared libraries and never on each other. - @open-code-review/persistence: db + state + vendor-resume + test-support + runtime-checks, kept in one package because db and state are mutually recursive and the db connection-cache singleton must be a single module instance. - @open-code-review/config: models + runtime-config + team-config. Both mirror the @open-code-review/platform model — private, version 0.0.0, every exports condition points at ./src/*.ts (no build.mjs, no dist), declared as devDependency workspace:* and inlined by esbuild — so they are excluded from the release set and never published. cli and dashboard import sites are repointed to the new package subpaths; build, tsconfig, vitest, and project.json wiring updated to match. Co-Authored-By: claude-flow <ruv@ruv.net>
Model the review verdict as a closed 3-state merge gate (APPROVE / REQUEST CHANGES / NEEDS DISCUSSION), keeping residual work (follow-ups / suggestions) in finding categories where it is already normalized, so the headline verdict and the finding list can never contradict each other again. - platform: canonical verdict vocabulary (verdict.ts) and a single pure round-count derivation (counts.ts) on Node-free ./verdict and ./counts subpaths, re-exported from the barrel. - cli: fail-fast validation at complete-round (verdict enum, min title length, directional synthesis_counts cross-check) — lands in the moved round-meta.ts. - dashboard: read-time normalizeVerdict at the ingestion boundary, a 3-state badge with a subordinate residual-work chip, findings-table loading/empty/degraded states, and the D1 guard so final.md presence maps to synthesis (never fabricated terminal completion). - agents: verdict vocabulary unified across skill references and the final template; synced to .ocr. Fixes the accept_with_followups off-vocabulary bug. Fix-forward, no destructive migration. Co-Authored-By: claude-flow <ruv@ruv.net>
A dashboard-spawned review could finish its work yet wedge alive for 44+ minutes while the database grew to ~298 MB. Finalization hinged on stdio EOF that a leaked grandchild held open forever, the parent row was never heart-beaten, nothing reaped the escaped tree, and the markdown writer appended duplicates against a NULL-defeated unique index. - Supervision: detached spawns are unref'd and write to per-execution log files (FileTailer streams them) instead of OS pipes; finalization is driven by the vendor result event plus a per-execution watchdog, never by stdio EOF. A cross-platform reapTree kills the whole descendant tree on cancel, watchdog, and singleton takeover. finishExecution is first-wins idempotent (finalizer.ts) with cancel-wins applied centrally. - Decomposition: the command-runner god class split into process-registry, spawn-markers, prompt-builder, watchdog, and finalizer leaf modules. - Liveness: parent-row heartbeat on output activity + supervisor tick. - DB integrity: explicit UPDATE-or-INSERT markdown writer; operator maintenance surfaced through notice events and api-types. - Singleton: a live prior dashboard is reaped and taken over rather than coexisting on an incremented port; startup reaps orphan tmp/exec-log files. Co-Authored-By: claude-flow <ruv@ruv.net>
Off-CI the Nx Playwright preset runs fullyParallel with workers=undefined (one per core) and retries=0, pointing several browser contexts at the one shared Vite dev server. Concurrent cold loads race Vite's dependency optimizer (which forces full-page reloads and emits transient console errors), and the auth spec waited on networkidle, which never settles because the dashboard holds a persistent socket.io connection. Any one transient event failed a worker, and retries=0 turned that into a hard task failure. Serialize against the single shared server (fullyParallel:false, workers:1) — the correct model for a single stateful resource, not a retry-masked band-aid (retries stay 0 so real flakes surface) — and replace the networkidle wait with the deterministic auth-resolved signal. Co-Authored-By: claude-flow <ruv@ruv.net>
…odels Trailing import-path update from the persistence/config extraction. Co-Authored-By: claude-flow <ruv@ruv.net>
…oposals OpenSpec proposals, tasks, design, and spec deltas for the three changes landed in this batch (add-canonical-verdict-contract, add-process-supervision-and-db-integrity, refactor-extract-shared-packages), plus the reconciled cross-references in spec.md, the canonical sqlite-state spec, and CLAUDE.md updated for the new package layout and release process. Co-Authored-By: claude-flow <ruv@ruv.net>
Apply the spec deltas to the canonical specs (new package-architecture capability; cli, dashboard, review-orchestration, session-management, and sqlite-state updates) and move the three completed changes to openspec/changes/archive/. Co-Authored-By: claude-flow <ruv@ruv.net>
Extends the shipped round-metadata validation with a verdict↔blocker-count direction check bound to resolveRoundCounts().blockerCount (deduplicated): APPROVE⟹0 blockers, REQUEST CHANGES⟹≥1, NEEDS DISCUSSION unconstrained. Synthesizer produces a consistent pair; dashboard adds a render-time mismatch hint for legacy contradictory rows. Fix-forward, no migration. Co-Authored-By: claude-flow <ruv@ruv.net>
Adds the missing recovery owner for an incomplete, abandoned mid-pipeline run (the Review #146 class): a stranded-run predicate, forward-resume from the event-sourced current_phase (never regress), a single-writer session_resumed lease (metadata-discriminated, no taxonomy change), a cap→non-success close via session_auto_closed_stale, and two tiers (baseline skill re-invoke on all hosts; dashboard auto-resume adapter-gated). Spans session-management, sqlite-state, cli, review-orchestration, dashboard, config. Design hardened by an adversarial red-team pass. Co-Authored-By: claude-flow <ruv@ruv.net>
complete-round now cross-checks the recorded verdict against the deduplicated blocker count (resolveRoundCounts().blockerCount): APPROVE requires 0 blockers, REQUEST CHANGES requires >=1, NEEDS DISCUSSION unconstrained. Rejects a contradictory pair with SCHEMA_INVALID (exit 7), writing nothing. Uses the deduplicated count so it never contradicts the synthesis_counts dedup rule. - round-meta.ts: directional check + tests (incl. dedup-to-zero accept) - verdict-banner.tsx: non-destructive legacy verdict/finding mismatch hint + test - agents/* final-template.md + workflow.md: synthesizer directional guidance, synced to .ocr - update two existing fixtures to the tightened contract (REQUEST CHANGES w/ only should_fix → APPROVE) Implements openspec/changes/enforce-verdict-count-direction. Co-Authored-By: claude-flow <ruv@ruv.net>
Adds the persistence + config foundation for recovering an incomplete, abandoned mid-pipeline run (the Review #146 class): - forward-resume.ts: event-sourced derivation (currentPhase-based remaining phases, next_action forward_resume|abort_or_fresh), single-writer session_resumed lease (metadata-discriminated, no phase/round column, append-before-spawn cap counting, TTL+phase-transition renewal), owning-turn liveness, and the cap-exhaustion guarded close via session_auto_closed_stale (no taxonomy change) - projection.ts: lifecycle fold ignores forward_resume leases (no phase regression) - index.ts stateInit: refuse re-opening an active, incomplete session (begin would reset it to context); stateStatus: optional resume config → forward_resume/ abort_or_fresh + remaining_phases + attempts_remaining - config: forward_resume_max_attempts (2) + forward_resume_lease_seconds (1800) - forward-resume.test.ts: lease single-writer/cap/no-regress, stranded derivation, cap-close, begin refusal, status integration Part of openspec/changes/add-stranded-run-forward-resume. Co-Authored-By: claude-flow <ruv@ruv.net>
…rd-resume - 'ocr review --resume' classifies via stateStatus(forward-resume config) and only resumes a stranded mid-pipeline run forward from current_phase; acquires the single-writer lease (cap-bounded), performs the non-success close on exhaustion, and hands off to the baseline skill path when no vendor id is captured (work preserved). Adapter path resumes the captured vendor session. - 'ocr state status' passes the forward-resume config so a stranded run reports next_action forward_resume/abort_or_fresh + remaining phases + attempts left. Part of openspec/changes/add-stranded-run-forward-resume. Co-Authored-By: claude-flow <ruv@ruv.net>
… guidance workflow.md (synced to .ocr): Phase 0 gains a forward-resume control loop (act on 'ocr state status --json' next_action; continue forward from current_phase; never regress; don't call 'begin' on an active incomplete session) and a top-level host-neutral nudge to drive phases 4→7 to complete-round within one turn. Part of openspec/changes/add-stranded-run-forward-resume. Co-Authored-By: claude-flow <ruv@ruv.net>
Server: forward-resume-sweep detects stranded mid-pipeline runs (active, no
terminal artifact, POSITIVE death evidence — PID-confirmed dead or ended, never
a stale heartbeat) and, reusing the same CLI primitive a human would
('ocr review --resume'), triggers recovery; cap-exhausted runs are driven to the
non-success close. Wired into the startup + periodic sweeps. PID-confirmed death
is the sweep's liveness authority, so it uses the cap-only stranded action
(strandedActionByCap) rather than the heartbeat-gated derive.
Client: ResumeCard gains an 'exhausted' variant (Start fresh / Mark abandoned →
guarded 'state finish --abort' via the socket command runner) and a tested
next_action→variant mapping. forward_resume reuses the existing Continue-here +
terminal-handoff affordances.
Part of openspec/changes/add-stranded-run-forward-resume.
Co-Authored-By: claude-flow <ruv@ruv.net>
…ot errors Updates the e2e to the intentional behavior change: a stranded run with no captured vendor session is forward-resumed via the baseline skill handoff (exit 0) rather than rejected. Ends the agent instance so the run is a dead, forward-resumable mid-pipeline run with no resumable vendor conversation. Co-Authored-By: claude-flow <ruv@ruv.net>
Mark implemented tasks complete and reconcile descriptions with the as-built implementation (derivation in persistence not platform; config warn-and-fallback; no-vendor-id skill handoff; sweep death-evidence gate). Remaining unchecked items are honest external-dependency proof obligations (4-host live run, #146 live recovery) and one unattended-adapter-drive hardening follow-up. Co-Authored-By: claude-flow <ruv@ruv.net>
Folds both deployed changes into the live specs and moves the proposals to openspec/changes/archive/2026-06-15-*: - enforce-verdict-count-direction: cli/dashboard/review-orchestration deltas - add-stranded-run-forward-resume: cli/config/dashboard/review-orchestration/ session-management/sqlite-state deltas All 12 live specs validate --strict. Co-Authored-By: claude-flow <ruv@ruv.net>
Resolves the spec-only review of hotfix/pre-release-review (REQUEST CHANGES, all wording fixes — no code change; the implementation already matched). Blockers: - package-architecture: real Purpose (was TBD); also fix config + review-map TBDs - Node-free subpath: defined as a normative package-architecture requirement, referenced from sqlite-state - event-log count vocabulary: retire stale critical/major/nitpick_count, use the canonical category counts the code actually writes + defer to the shared helper - 'owning turn ended': clarified as caller-relative (human re-invocation is the takeover signal) so baseline forward_resume fires - cap-exhaustion close: CLI writes it on the no-daemon path (matches code); writer responsibility assigned per tier Should-fixes / suggestions: lease write-side/read-side split; wedged-vs-stranded glossary + terminology alignment; discriminated-union (kind/reason) doc; canonical 'positive death evidence' + 'canonical round blocker count' + CONTROL-prompt definitions; SHALL NEVER→SHALL NOT; OpenCode exemption reframed as a capability; complete-round self-heal source + at-commit-time qualifier; single-writer safety scenario; graduation edge cases + shared-package membership list; lease TTL default named (1800s). Deferred the explicit requirement-family refactors (review marked them non-blocking). All 12 specs pass openspec validate --strict. Co-Authored-By: claude-flow <ruv@ruv.net>
Update: forward-resume + verdict-direction implemented, archived, and round-1 review addressedThis adds two OpenSpec changes (now archived into the live specs) on top of the existing pre-release hardening, then resolves the round-1 spec review. What was built1. Forward-resume of a stranded mid-pipeline review (the Review #146 class:
2. Directional verdict↔blocker-count consistency at Tests: persistence/config/cli/dashboard unit suites + cli-e2e green; Round-1 review feedback — addressedThe review was spec-only (REQUEST CHANGES, 5 blockers). Corroboration confirmed every blocker was a spec-wording defect; the implementation already matched — so the fixes are spec text only, no code change. Blockers (all 5):
Should-fixes / suggestions: lease write-side/read-side split; wedged-vs-stranded glossary + terminology alignment; discriminated-union Deferred (review marked non-blocking): the three requirement-family refactors (split Resume Flag / Forward-Resume family / group validation scenarios) — prose is correct; deferred to avoid churning cross-references before the release tag. Honest carve-outs (tracked in the change's tasks.md): the 4-host live forward-resume proof and live #146 recovery are manual obligations (need all four vendor CLIs); fully-unattended non-interactive adapter-prompt drive is a hardening follow-up. The guaranteed-unattended behaviors (detection, cap-close, lease, surfacing) are implemented and tested. |
Genuinely worked the should-fix backlog from the approved code review. All verified: persistence/platform/cli/dashboard typecheck + unit suites green. - SF#2: pin Playwright retries:0 explicitly (Nx preset defaulted to 2 on CI, hiding real flakes exactly where they matter) - SF#3: first-wins CAS (AND finished_at IS NULL) on execution-tracker.finish so a second finish() (close + error paths) can't clobber the recorded exit/output - SF#4: contract test asserting the platform /verdict + /counts subpaths stay Node-free transitively (guards the primed browser-bundle trap) - SF#5: delete final-parser's duplicate KNOWN_VERDICTS alias table; delegate alias→canonical to the shared normalizeVerdict, keep prefix-extraction off the shared CANONICAL_VERDICTS (one source of truth; behavior preserved) - SF#6: RoundMeta.verdict: string → CanonicalVerdict (write-boundary type safety; read DTOs stay string) - SF#10: exhaustive default on the watchdog-tick switch - SF#11: drop non-null assertions in prompt-builder reviewer parse - Sug#4: verdict rejection echoes the raw value, not the sanitized form - Sug#11/#12: CLAUDE.md — lead db+state co-location with the type cycle (cache singleton is the consequence); distinguish consumer devDependency vs a shared package's own runtime dependencies Deferred (review marked non-blocking backlog): SF#1 module-boundary lint (needs a net-new eslint/@nx/eslint toolchain + CI wiring — a dedicated infra change, not a hotfix code fix); SF#7/#8 large refactors (spawnAiCommand / filesystem-sync split, explicitly multi-PR); SF#9 comment-rot strip (opportunistic). Co-Authored-By: claude-flow <ruv@ruv.net>
Addressed round-1 code-review feedback (the APPROVE'd review)Although this review was APPROVE / 0 blockers, I worked the should-fix backlog genuinely — corroborating each against the actual code first. Implemented the safe, high-value items now; deferred the large refactors the review itself marked as backlog. Pushed in Implemented
Verified: Deferred (review marked non-blocking backlog — agree)
Re: clarifying questions
|
Summary
Pre-release hardening that lands three completed OpenSpec changes plus a test-infra fix on top of
main. The work reverses an inverted package dependency, makes the review verdict a single enforceable contract, and fixes a class of process/DB-integrity wedges observed in the dashboard. Each concern is a separate atomic commit; the OpenSpec proposals are archived into the canonical specs in the final commit.Scope: 186 files, +5878/−1186. No GitHub issues are closed by this PR (#42 cross-repo reviews is intentionally out of scope).
evolve-phase4-host-aware-spawningis intentionally untouched.What's included
1.
refact(shared)— extractpersistence+configshared packagesReverses the dishonest
dashboard → clidependency (dashboard imported 7 of cli's 8 library subpaths,cli/dbat 38 sites, while declaring cli only as a devDependency). The shared lower layers move out of thecliapp into two source-only packages so both apps depend on shared libraries and never on each other:@open-code-review/persistence—db/+state/+vendor-resume+test-support+runtime-checks, kept in one package becausedbandstateform a mutually-recursive type cycle and the connection-cache singleton must be a single module instance.@open-code-review/config—models+runtime-config+team-config.Both mirror the
@open-code-review/platformprecedent exactly —private, version0.0.0, everyexportscondition points at./src/*.ts(nobuild.mjs, nodist, nobuildtarget), declared asdevDependency: workspace:*, inlined by esbuild — so they are excluded from the release set and never published (no npm trusted-publisher changes needed). Direct cutover: no re-export shims, and the oldtest-support → ./index.jsexternal trick (issue #41) is deleted, not preserved. TheCLAUDE.md"S27 / 9th-subpath" graduation rule is replaced with a cause-based rule (a slice graduates when consumed across a package boundary).2.
feat(verdict)— canonical 3-state verdict contract enforced end to endFixes the
accept_with_followupsoff-vocabulary bug that rendered a meaningless "?" badge. Models the verdict as a closed 3-state merge gate (APPROVE / REQUEST CHANGES / NEEDS DISCUSSION), keeping residual work (follow-ups/suggestions) where it is already normalized — in finding categories — so the headline verdict and the finding list can never contradict each other../verdictand./countssubpaths.complete-round(verdict enum, min title length, directionalsynthesis_countscross-check).normalizeVerdictat ingestion, a 3-state badge with a subordinate residual-work chip, findings-table loading/empty/degraded states.final.mdalone — it maps to thesynthesisphase), D2 (complete-roundguaranteesround-meta.jsonregardless of--file/--stdin), D3 (one shared count derivation)..ocr.Fix-forward; no destructive migration.
3.
feat(dashboard)— process supervision + DB integrity hardeningAddresses a dashboard-spawned review that finished its work yet wedged alive for 44+ minutes while the DB grew to ~298 MB (FK-orphan rows + a NULL-defeated unique index that let one artifact accumulate 775 duplicates).
unref'd and write to per-execution log files (aFileTailerstreams them) instead of OS pipes; finalization is driven by the vendorresultevent + a per-execution watchdog, never stdio EOF. A cross-platformreapTreekills the whole descendant tree on cancel, watchdog, and singleton takeover.finishExecutionis first-wins idempotent with cancel-wins applied centrally.command-runnergod class split intoprocess-registry/spawn-markers/prompt-builder/watchdog/finalizerleaf modules.ocr db doctor/vacuum/prune) with snapshot-before-mutate.Incorporates the round-1 and round-2 multi-agent review remediations from PR #36 (watchdog liveness gating, invisible-char
\p{Cf}strip, NaN-bypass closure, shutdown reap grace, finalizer/sweep ownership boundary, etc.).4.
test(dashboard-ui-e2e)— eliminate a flaky e2eOff-CI the Nx Playwright preset ran
fullyParallelwithworkers=undefined(one per core) andretries=0, pointing several browser contexts at one shared Vite dev server; concurrent cold loads raced Vite's dependency optimizer (forcing full-page reloads + transient console errors) and the auth spec waited onnetworkidle, which never settles under the dashboard's persistent socket.io connection. Serialized against the single shared server (fullyParallel:false,workers:1) — the correct model, not a retry mask (retries stay 0 so real flakes surface) — and replacednetworkidlewith the deterministic auth-resolved signal.Also included
test(dashboard)Harden model/config strings that reach Windows shell:true spawns #43 capture-boundary reject-branch coverage (pre-existing on the branch).refact(config)trailing import-path repoint for the team-models e2e.Verification
All run green on the reconciled tree:
nx run-many -t typecheck(9 projects)nx run-many -t test(5 projects)nx run-many -t e2e(cli-e2e, dashboard-api-e2e, dashboard-ui-e2e) — incl. the de-flaked suite run 5× + the full cross-project matrixcli:build+dashboard:buildnx release --skip-publish --dry-run(computes v2.3.0 cleanly)openspec validate --strictfor all three changes before archivingRelease note
This PR does not cut a release. The v2.3.0 release remains paused pending explicit authorization; publishing happens only via the tag-triggered CI workflow after merge.
🤖 Generated with claude-flow