feat(ffi): Design-Z dispatch gate exports (prmi_present, prmi_l_pac) + equivalence harnesses#54
Conversation
|
Warning Review limit reached
More reviews will be available in 56 minutes and 3 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds two C-ABI exports in ChangesFFI API and tooling
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
3c3e8f5 to
256b24f
Compare
7ef4814 to
cce75fe
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 11
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@prmi-sys/src/lib.rs`:
- Around line 196-203: Add an unwind fence around the body of prmi_l_pac so any
panic from Handle::as_ref(handle).idx.l_pac() is caught instead of crossing the
FFI boundary, while keeping the explicit NULL check that returns 0 as the
sentinel behavior. Update the prmi_l_pac implementation in prmi-sys/src/lib.rs
to preserve the existing contract for valid handles and ensure the non-NULL path
is wrapped in catch_unwind with a safe fallback return value.
- Around line 2283-2324: The PAC slice is constructed too early in prmi_present,
which can still create undefined behavior when pac_num_bases does not match the
index’s expected PAC length. Update prmi_present to validate pac_num_bases
against h.idx.l_pac() before calling from_raw_parts, and only build pac_slice
after that check passes. Also handle the zero-length case explicitly so
pac_num_bases == 0 returns an empty slice and allows NULL pac when no PAC data
is expected.
In `@prmi-sys/tests/ffi_present.rs`:
- Line 8: The new FFI export for prmi_l_pac is not covered by the present
boundary test, so extend the ffi_present test to import and exercise prmi_l_pac
alongside prmi_open, prmi_close, and prmi_present. Add assertions that a valid
handle passes the l_pac == n check and that the NULL handle case uses 0, so the
FFI boundary behavior is pinned for both success and null inputs.
- Around line 104-122: The invalid-args coverage in `ffi_present` is missing a
`pac_num_bases` mismatch case, so it does not verify the ABI length/range
contract for `prmi_present`. Add a memory-safe assertion using a
smaller-than-`n` `pac_num_bases` value while keeping the buffers valid, and
assert that `prmi_present` returns `-2`; use the existing `prmi_present` test
block as the place to extend this validation.
In `@prmi/examples/z_accept_gate.rs`:
- Around line 67-86: The per-read SMEM loop in the z_accept_gate harness is
using `collect_smems`, which creates fresh scratch state on every call and
causes avoidable allocator churn. Refactor the `smem_mns` closure to hold a
reusable `CollectScratch` per `LearnedIndex` and call `collect_smems_into`
instead, following the intended hot-path usage from `collect_smems_into` in
`collect.rs`. Update the repeated read-processing sites in this example to reuse
that scratch across calls rather than rebuilding it each time.
- Around line 134-137: The FASTQ parsing loop in z_accept_gate.rs is ignoring
missing `+` and quality lines, which lets truncated records be counted as valid.
Update the read-processing logic in the loop using `lines.next()` so that all
four FASTQ lines must be present before scoring a record, and stop/fail fast
when the `+` or quality entry is missing instead of discarding the `next()`
results. Keep the fix localized to the `while let Some(Ok(_h))` loop and its
handling of `seq` and subsequent line reads.
- Around line 46-49: The PRMI_CAP parsing in z_accept_gate.rs currently accepts
0 and negative values, which can make the accept-gate logic degenerate; update
the cap initialization logic to validate the parsed value in the code path
around the env var parse and reject any non-positive result by falling back to a
safe default or failing fast with a clear message. Use the existing cap variable
and the surrounding max_s < cap checks in the harness setup to keep the fix
localized.
In `@prmi/examples/z_present_dump.rs`:
- Around line 44-47: The FASTQ parsing loop in z_present_dump.rs is accepting
truncated records because it silently ignores missing or failed `+` and quality
lines after reading the header and sequence. Update the record handling in the
loop around the lines.next() reads so that all four FASTQ lines must be present
and valid before incrementing total or emitting any TSV row; if any of the four
reads fails, stop processing that record immediately instead of classifying it.
In `@scripts/z_aln_concordance.py`:
- Around line 42-66: The parse_sam function currently keys primaries only by
qname, so paired-end SAMs with two primary mates overwrite each other and
corrupt concordance. Update parse_sam to either reject paired-end reads
immediately when flag & 0x1 is set, or change the Primary storage and downstream
concordance logic to disambiguate mates in the key; use the parse_sam symbol and
the primaries/secondary_supp handling to locate the fix.
- Around line 159-167: The present TSV lookup in run() is treating missing
qnames as absent via present.get(q, False), which can hide contract drift.
Update run() to validate that the present map’s keys exactly match the SAM
cohort from full before building present_q and absent_q, and abort with a clear
error if there is any mismatch. Use the existing parse_present(), parse_sam(),
and run() flow to place the check right after loading the inputs.
- Around line 113-132: Move the secondary/supplementary count comparison in the
read-processing logic so it runs before any early exits in the branch that
handles unmapped and mapped-status differences. In the function that updates the
report counters, compare fast_ss.get(q, 0) and full_ss.get(q, 0) and increment
rep.secsupp_diff before the both_unmapped and mapped_status_diff continue paths,
so all reads are included regardless of mapping status.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b835a145-9820-4da9-8e23-22fc139a138e
📒 Files selected for processing (5)
prmi-sys/src/lib.rsprmi-sys/tests/ffi_present.rsprmi/examples/z_accept_gate.rsprmi/examples/z_present_dump.rsscripts/z_aln_concordance.py
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@prmi-sys/src/lib.rs`:
- Around line 2315-2323: Move the h.idx.l_pac() validation into the existing
catch_unwind boundary in the extern "C" entrypoint so any panic in the metadata
path is converted to the documented -3 instead of crossing the C ABI. Keep the
check in the same validated flow around the slice construction, but fence either
the l_pac() lookup itself or the entire pre-slice validation block inside the
catch_unwind used in this function.
In `@prmi/examples/z_accept_gate.rs`:
- Around line 43-60: The harness currently accepts any PRMI_PAC payload even
when it does not match the genome length, which can produce misleading
concordance results. In z_accept_gate.rs, after computing l_pac and confirming
the full and fast LearnedIndex values agree, validate the read pac buffer length
against the expected packed size derived from l_pac before constructing
PacEncoding::Packed. If the length is wrong, fail fast with a clear assertion so
mismatched or truncated sidecars are rejected before the harness runs.
In `@prmi/examples/z_present_dump.rs`:
- Around line 32-36: The example in z_present_dump.rs trusts PRMI_PAC blindly,
so add a preflight length check in the main flow before constructing
PacEncoding::Packed: compare the loaded pac buffer length against
fast.l_pac().div_ceil(4) and fail immediately with a clear error if they differ.
Use the existing LearnedIndex::open, std::fs::read, and PacEncoding::Packed
setup to locate the fix, and keep the validation alongside the
PRMI_FAST/PRMI_PAC loading logic so present_anchor never runs on mismatched
sidecar bytes.
In `@scripts/z_aln_concordance.py`:
- Around line 77-82: The present.tsv parsing in the row-loading logic should
reject bad input instead of silently coercing it. In the loop that populates
present from the TSV, validate that flag is only "0" or "1" before assigning,
and raise an error for any other token rather than treating it as absent. Also
detect duplicate qname entries and fail fast instead of allowing last-row-wins
behavior, so the parser in scripts/z_aln_concordance.py surfaces invalid or
edited rows immediately.
- Around line 170-187: `run()` currently only validates `present` against
`full`, so `compare()` can silently drop qnames when `fast` and `full` diverge
and inflate concordance. Add a cohort-drift guard in `run()` by checking
`set(fast) == set(full)` immediately after `parse_sam()` and before constructing
`present_q`, `absent_q`, and `all_q`, and raise a clear `ValueError` using the
`fast`, `full`, and `compare()` symbols when the qname sets differ.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5478b5fb-3fdc-4c3f-afde-0a7d02090150
📒 Files selected for processing (5)
prmi-sys/src/lib.rsprmi-sys/tests/ffi_present.rsprmi/examples/z_accept_gate.rsprmi/examples/z_present_dump.rsscripts/z_aln_concordance.py
Export prmi_present(handle, read, read_len, pac, pac_num_bases) -> c_int, wrapping LearnedIndex::present_anchor: a one-probe locate of the read's leading 32-mer (present iff match_len==32), used by a tiered (Design Z) consumer to route off-target reads to the whole-genome fallback before paying a full prmi_collect_smems. Returns 1 present / 0 absent / negative error (-1 null, -2 invalid len, -3 panic); same (pac, pac_num_bases) ABI as prmi_collect_smems. cbindgen regenerates the header automatically. Adds tests/ffi_present.rs (in-ref present, absent, short/empty/all-N, null/invalid args). clippy -D warnings clean; workspace tests green.
z_accept_gate opens a tiered fast index + the whole-genome truth index and, per read, runs present_anchor then compares fast collect_smems to full (m,n,s)+positions to establish ground-truth identity, evaluating candidate accept predicates (present; +occ<cap; +full-coverage) and tallying false-accepts + served fraction. Also classifies divergent spans by (m,n): occ-reduced vs wholly-missing vs over-emit, and reseed-like(contained) vs read-boundary vs primary-interior. Finding (chr22, +halo keep-set): no zh-only predicate reaches zero false-accepts; divergence is purely under-emission (subset holds) of low-occ contained/reseed-like seeds. Establishes that byte-identical served seeding is not achievable with heuristic accept — motivating an alignment-equivalence evaluation. Requires --features spectrum-probe-count.
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@prmi-sys/src/lib.rs`:
- Around line 196-207: Update the FFI safety docs on prmi_l_pac so the
return-value contract is exhaustive: it should document both the NULL-handle
case and the panic-fallback case that also returns 0. Keep the note aligned with
the implementation in prmi_l_pac and its catch_unwind wrapper, and make the
public C ABI contract explicit that 0 is returned for both invalid NULL input
and any fenced panic during the l_pac metadata read.
In `@prmi/examples/z_accept_gate.rs`:
- Around line 84-104: The SMEM collection path in smem_mns still creates and
zero-fills a fresh Vec on every call, so reuse the output buffers instead of
only reusing CollectScratch. Hoist separate fast/full buffers for
collect_smems_into, keep them outside the closure, and make smem_mns return the
count or a slice/view from the reused buffer rather than allocating/truncating a
new Vec each time. Use the existing smem_mns closure and collect_smems_into call
sites as the anchor for the change.
- Line 136: The FASTQ harness currently accepts an empty input list in
z_accept_gate via the std::env::args().skip(1).collect() flow, which leads to
misleading all-zero metrics. Add an explicit validation in the argument-handling
path to reject empty inputs before proceeding, and return a clear error or
non-zero exit from the main gate logic when no file paths are provided.
- Around line 152-177: The FASTQ record loop in z_accept_gate should also
validate the header line before scoring, not just the `+` and quality lines. In
the `while let Some(h) = lines.next()` block, check the first line read into
`_h`/header is a valid FASTQ header (for example, starts with `@`) and fail fast
with a clear panic if it is not. Keep this check alongside the existing
`plus.starts_with('+')` and `qual.len() == seq.len()` validations so malformed
records are rejected consistently.
- Around line 289-296: The served-correct summary in the z_accept_gate output is
underreporting predicates that have false accepts because it prints only
p_acc_ok from the reporting loop. Update the formatting in the loop over
names/p_acc_ok/p_acc_bad to report served fraction as ok + false_accept, using
both values when computing the displayed total, while keeping the rest of the
summary output unchanged.
In `@prmi/examples/z_present_dump.rs`:
- Around line 57-85: The FASTQ parsing in z_present_dump.rs currently accepts
headers without a leading @ and can produce invalid or empty qnames. In the
record-reading logic where h is read and qname is derived, add an explicit
validation that the header starts with @ before splitting it, and fail fast with
a clear panic message if it does not. Keep the check near the existing
header/sequence/plus/qual validations so malformed records are rejected before
TSV join key generation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: aa648825-6717-42a0-9d11-cf682daeefa0
📒 Files selected for processing (5)
prmi-sys/src/lib.rsprmi-sys/tests/ffi_present.rsprmi/examples/z_accept_gate.rsprmi/examples/z_present_dump.rsscripts/z_aln_concordance.py
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@prmi/examples/z_accept_gate.rs`:
- Around line 46-49: The PRMI_CAP parsing in z_accept_gate.rs currently hides
misconfiguration by falling back to 10 on any parse failure. Update the cap
initialization logic so that it only uses 10 when std::env::var("PRMI_CAP") is
absent, and explicitly fails or panics when PRMI_CAP is present but cannot be
parsed. Use the existing cap variable setup in the harness to keep the behavior
localized and make malformed configuration surface immediately.
In `@prmi/examples/z_present_dump.rs`:
- Around line 47-54: The header is emitted in z_present_dump before the
input-argument validation, so a failed no-input run can still leave partial TSV
output. Move the println! for qname/present below the files.is_empty() assert in
main so the usage check runs first and only valid invocations write the header.
- Around line 48-54: In z_present_dump, add validation in the qname
derivation/output path before writing present.tsv so the join key is never empty
or duplicated across inputs. Update the main processing logic that reads the
FASTQs and emits rows to reject headers that produce an empty qname (for
example, a bare @) and to track seen qnames, failing fast on repeats instead of
continuing. Keep the existing CLI usage assertion, but make the per-record check
part of the same z_present_dump flow that prepares the join key for
scripts/z_aln_concordance.py.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bea85e59-836a-4022-a526-9120287465b8
📒 Files selected for processing (5)
prmi-sys/src/lib.rsprmi-sys/tests/ffi_present.rsprmi/examples/z_accept_gate.rsprmi/examples/z_present_dump.rsscripts/z_aln_concordance.py
Add the harness that rescued Design Z under an alignment-identity bar: the seed-identity de-risk found that no fast-index accept predicate yields byte-identical seeding, so this stage compares final alignments instead. - prmi/examples/z_present_dump.rs: dump the present/absent partition for a FASTQ via present_anchor (the join key: only present reads are served by the fast path; absent reads route to the whole-genome fallback). - scripts/z_aln_concordance.py: diff two SAMs (fast vs full prmi seeding run through the same chain+extend), restricted to present reads. Concordance := primary POS+CIGAR; MAPQ deltas and secondary/supplementary differences are reported, not failed. Includes a --selftest.
A consumer pairing a prmi sidecar with a reference can only sanity-check the pairing via sa_num == 2*l_pac+1, which a tiered (filtered-SA, Design Z) sidecar deliberately violates (its sa_num is smaller). Expose the sidecar's recorded l_pac so the consumer can validate genome identity (sidecar l_pac == FM-index l_pac) independently of sa_num, and thus accept both full and tiered sidecars while still rejecting a sidecar built for a different reference.
A newer nightly rustfmt (the unpinned CI fmt toolchain) reformats this example; apply it so the fmt check passes. Formatting only.
|
@coderabbitai review |
✅ Action performedReview finished.
|
Design-Z stack 3/6 — base:
feat/z-tiered-sa(#53).Design-Z dispatch gate FFI:
prmi_present(first N-free 32-mer window pre-reject — one cheapmem_searchlocate routing off-target reads to the fallback) andprmi_l_pac(sidecar/reference validation). Adds thez_accept_gate/z_present_dumpequivalence harnesses and az_aln_concordance.pyscript.Dual-reviewed (coderabbit --agent + local CR) — no findings. green.
Summary by CodeRabbit
prmi_present, covering present/absent scenarios and validation/error handling.