Add RSSI protocol regression coverage and RTL fixes #1427

Draft
bengineerd wants to merge 96 commits into pre-release from rssi-tests

bengineerd commented May 28, 2026

Contributor

Description

Adds a comprehensive RSSI v1 cocotb regression suite for the SURF RSSI RTL, plus the narrow RTL fixes needed to make the implementation match the documented SURF/Rogue RSSI hardware profile.

Details

This RSSI-focused slice adds protocol references, checked-in cocotb wrappers, reusable RSSI Python helpers, directed leaf-FSM tests, integrated RssiCore tests, wrapper tests, PyRogue register-map alignment, and RSSI user documentation.

Test Suite

The new tests under tests/protocols/rssi/ cover the RSSI stack from leaf blocks through integrated endpoint pairs:

  • RssiChksum: checksum generation and validation against a Python one's-complement oracle, including reset/enable restart behavior.
  • RssiHeaderReg: ACK, DATA, NULL, RST, and SYN header byte layout, flags, sequence/ack fields, checksum placeholders, and SYN parameter packing.
  • RssiRxFsm: receive-side DATA/SYN/ACK/NULL/RST screening, checksum behavior, illegal flag rejection, unsupported EACK rejection, out-of-order and duplicate DATA behavior, SYN parameter staging, and checksum-disabled behavior.
  • RssiTxFsm: ACK/DATA/NULL/RST/SYN emission, sequence consumption, checksum insertion, checksum fault injection, multi-word DATA buffering/resend, cumulative ACK window release, and NULL suppression while data is outstanding.
  • RssiMonitor: retransmit timeout behavior, remote/local BUSY handling, periodic BUSY ACK cadence, and server null-timeout liveness rules.
  • RssiConnFsm: client/server connection setup, peer parameter validation, retry behavior, timeout closure, parameter mismatch rejection, and client RST behavior.
  • RssiAxiLiteRegItf: reset defaults, writable parameter readback, local parameter clamping, negotiated/status/counter/state readback, sequence/ack readback, and AXI-Lite DECERR behavior.
  • RssiCore: integrated client/server active open, negotiated segment-size readback, bidirectional payload delivery, DATA loss/corruption recovery, duplicate-free ACK/NULL perturbation, out-of-order recovery, sequence wrap, NULL keepalive acknowledgment, missing keepalive close, max retransmit close/RST, explicit close, and BUSY reporting from application backpressure.
  • RssiCoreWrapper: one-stream wrapper coverage in bypass-chunker and packetizer modes across multiple window and segment-size configurations.
  • RssiCoreWrapperMultiStream: packetizer2/depacketizer2 multi-stream routing, bidirectional two-stream payload delivery, and routed-payload recovery after a dropped RSSI DATA frame.
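
The RssiChksum bullet above mentions a Python one's-complement oracle. A minimal sketch of what such an oracle could look like, assuming a 16-bit one's-complement sum over big-endian 16-bit words with end-around carry; the function names, word order, and zero-padding rule are illustrative assumptions and are not taken from `rssi_test_utils.py`:

```python
def ones_complement_sum16(data: bytes) -> int:
    """Sum big-endian 16-bit words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # assumption: odd-length input is zero-padded
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def checksum16(data: bytes) -> int:
    """Complement of the folded sum, computed with the checksum field zeroed."""
    return (~ones_complement_sum16(data)) & 0xFFFF


def verify16(frame: bytes) -> bool:
    """A frame that carries a correct checksum sums to 0xFFFF."""
    return ones_complement_sum16(frame) == 0xFFFF
```

A test would typically build a header with a zero checksum placeholder, append `checksum16(...)`, and check that `verify16` passes on the result and fails after any single-bit corruption.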

The checked-in wrappers under protocols/rssi/v1/wrappers/ flatten record-heavy RSSI interfaces for cocotb while keeping protocol stimulus and scoreboards in Python. The shared helper layer in tests/protocols/rssi/rssi_test_utils.py owns RSSI header builders/parsers, SYN parameter packing, checksum helpers, transport-frame capture, and reusable VHDL source lists.

The branch also adds protocols/rssi/README.md, local RSSI/RUDP/Rogue reference material under docs/plans/rssi-regression/references/, and task notes documenting the test plan, progress, RTL review input, and implemented RTL changes.

RTL Changes And Rationale

The production RTL changes are intentionally narrow and tied to failures or contract gaps exposed by the new regressions:

  • RssiMonitor.vhd: server null-timeout liveness now refreshes only on received DATA or NULL, not standalone ACK/BUSY traffic. The RSSI protocol describes the server null timeout as detecting absence of DATA/NULL keepalive traffic; ACK/BUSY-only traffic should not keep a dead peer alive indefinitely.
  • RssiMonitor.vhd: local BUSY now generates an immediate ACK on assertion and periodic BUSY ACKs at Retransmission Timeout/2 while busy. This matches the RSSI flow-control guidance and keeps the peer transmitter from entering retransmission/RST while the local receiver is intentionally backpressured.
  • RssiCore.vhd: local BUSY now includes application-output pause and direct downstream TREADY backpressure in addition to the existing FIFO write-count threshold. This lets integrated cores advertise BUSY when the application sink is stalled before the peer transmitter advances into retransmit behavior.
  • RssiCore.vhd: output FIFO pause thresholds are clamped to at least 1 for small segment-size configurations. RssiCoreWrapper derives SEGMENT_ADDR_SIZE_G from MAX_SEG_SIZE_G; without the clamp, 64-byte and 128-byte wrapper configurations could elaborate with an illegal zero/negative AxiStreamFifoV2 pause threshold.
  • RssiConnFsm.vhd: peer SYN/SYN+ACK parameters are validated before negotiation converts them into integer window/buffer state. Zero outstanding-segment counts, zero timeouts, and sub-8-byte segment sizes are rejected or reproposed instead of causing illegal state or simulator range errors.
  • RssiConnFsm.vhd: retry wait counters now saturate at the retransmission timeout threshold. This preserves the existing retry/close decisions while preventing constrained-counter overflow at the timeout boundary in simulation.
  • RssiAxiLiteRegItf.vhd: software-written local parameters now clamp maxOutsSeg and timeout fields to legal nonzero ranges. This keeps AXI-Lite-programmed local parameters inside the same validity boundary enforced during peer negotiation.
  • RssiTxFsm.vhd: checksum fault injection now corrupts only the checksum field and applies consistently to ACK, NULL, DATA, and resend headers. The documented debug feature is a one-shot header-checksum corruption, not a full 64-bit header inversion that changes flags, lengths, sequence, or ACK fields.
  • RssiTxFsm.vhd: NULL generation is suppressed while the transmit buffer contains unacknowledged DATA/NULL/RST. NULL is idle keepalive traffic; allowing it to consume the next sequence while earlier DATA is outstanding can let a receiver advance past the lost DATA and then reject the DATA retransmit as old.
  • RssiRxFsm.vhd: DATA legality checks now use the current decoded header flags and reject DATA without ACK, DATA with BUSY/NULL/RST/EACK semantics, and unsupported non-SYN EACK segments. This avoids accepting invalid DATA because of stale registered flag state and makes the unsupported EACK path drop explicitly.
  • RssiRxFsm.vhd: SYN parameters are staged and committed only after the full SYN passes checksum, length, flag, and frame-boundary checks. Malformed SYN frames can no longer partially update rxParam_o before being dropped.
  • RssiRxFsm.vhd: DATA EOF length calculation, payload write-data staging, and application-output read timing were corrected for the registered payload RAM path used by integrated RssiCore. This fixes one-word payload length/read timing errors observed only in the integrated core path.
  • RssiRxFsm.vhd: duplicate DATA is dropped before entering the payload-buffering state. Duplicate or out-of-order DATA is not queued in the SURF RSSI hardware profile; recovery comes from retransmission of the missing in-order segment.
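
The duplicate-DATA rule in the last bullet can be pictured with a small behavioural model. This is only a sketch, assuming 8-bit sequence numbers that wrap modulo 256; the class name and return strings are hypothetical and do not mirror the RTL state machine:

```python
SEQ_MOD = 256  # assumption: 8-bit sequence numbers


class InOrderReceiver:
    """Accept only the next expected DATA segment; never queue anything else."""

    def __init__(self, first_seq: int) -> None:
        self.expected = first_seq % SEQ_MOD
        self.delivered = []

    def on_data(self, seq: int, payload: bytes) -> str:
        if seq % SEQ_MOD != self.expected:
            # Duplicate or out-of-order: drop and wait for the retransmit.
            return "drop"
        self.delivered.append(payload)
        self.expected = (self.expected + 1) % SEQ_MOD
        return "accept"
```

With this model a lost segment is recovered only when the sender retransmits it; anything that arrives ahead of it is dropped rather than buffered.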

The PyRogue RSSI model in python/surf/protocols/rssi/_RssiCore.py is updated to align descriptions and writable parameter ranges with the RTL-visible register behavior.
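
The validity boundary described for peer negotiation and AXI-Lite writes could be expressed roughly as follows. This is a sketch under stated assumptions: the dataclass, its field names, and the `max_outs_limit` argument are hypothetical; only the rules themselves (nonzero outstanding-segment count, nonzero timeouts, segment size of at least 8 bytes) come from the description above.

```python
from dataclasses import dataclass, replace

MIN_SEG_SIZE_BYTES = 8  # from the PR text: sub-8-byte segment sizes are illegal


@dataclass(frozen=True)
class RssiParams:  # hypothetical container, not the PyRogue model
    max_outs_seg: int
    max_seg_size: int
    retrans_tout: int
    null_tout: int


def peer_params_valid(p: RssiParams) -> bool:
    """Peer SYN parameters are checked, not silently repaired."""
    return (p.max_outs_seg > 0
            and p.retrans_tout > 0
            and p.null_tout > 0
            and p.max_seg_size >= MIN_SEG_SIZE_BYTES)


def clamp_local(p: RssiParams, max_outs_limit: int) -> RssiParams:
    """Software-written local values are clamped into the legal range."""
    return replace(
        p,
        max_outs_seg=min(max(p.max_outs_seg, 1), max_outs_limit),
        retrans_tout=max(p.retrans_tout, 1),
        null_tout=max(p.null_tout, 1),
    )
```

The split is deliberate in this sketch: local writes are clamped so a register write can never leave the core in an illegal state, while peer values are only validated, leaving the reject-or-repropose decision to the connection logic.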

Validation

The branch records focused validation in the RSSI task notes, including local runs of the RSSI cocotb tests, syntax checks, VHDL style checks for edited RTL/wrappers, and focused integrated RssiCore/wrapper tests for the bugs and contract points above.

Related

Depends on #1426, which depends on #1425.

bengineerd and others added 30 commits May 2, 2026 23:12
- Introduced tests for reset behavior in FIFO and RAM modules, ensuring proper handling of pending entries and state clearing.
- Added tests for simultaneous read/write operations in FIFO and RAM, verifying correct data handling during collisions and near-full conditions.
- Implemented tests for starvation resistance in arbiters, ensuring fair request handling under contention.
- Enhanced watchdog tests to cover chattered keepalive sequences, ensuring timeout behavior is correctly implemented.
- Added cross-port collision tests in dual-port RAM, verifying correct data visibility and handling during simultaneous writes.
- Introduced burst read gap tests in synchronizer FIFO, ensuring proper data transfer during paused reads and reset conditions.
… tests

- Added a simple validVec active-lane request mask.
- Kept packet atomicity: selected lane is held through the packet and advances on TLAST.
- Added bounded idle-lane skipping: if the selected lane is empty but some active lane has data, the mux advances one lane per clock instead of parking indefinitely.
- Tightened the dual-lane RX test to assert no FSM error, the full six-word output, and both frame TLASTs.
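
A cycle-level Python model may help picture the lane-selection rules listed above. It is a sketch only: the class, its queues, and the one-call-per-clock convention are assumptions made for illustration, not a transcription of the VHDL.

```python
from collections import deque


class LaneMux:
    """Round-robin lane mux with packet atomicity and idle-lane skipping."""

    def __init__(self, num_lanes: int) -> None:
        self.lanes = [deque() for _ in range(num_lanes)]  # (word, tlast) items
        self.sel = 0
        self.in_packet = False

    def clock(self):
        """Advance one cycle; return (word, tlast) or None if nothing moved."""
        valid_vec = [bool(q) for q in self.lanes]  # active-lane request mask
        if valid_vec[self.sel]:
            word, tlast = self.lanes[self.sel].popleft()
            self.in_packet = not tlast
            if tlast:
                # The selected lane is released only at the end of the packet.
                self.sel = (self.sel + 1) % len(self.lanes)
            return word, tlast
        if not self.in_packet and any(valid_vec):
            # Selected lane is idle but another lane has data: skip one lane.
            self.sel = (self.sel + 1) % len(self.lanes)
        return None
```

In this model the skip costs one clock per idle lane, so the worst-case wait before a pending lane is served is bounded by the lane count.
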
…s, not just FSM/mux state. Previously the reset could restart parsing while stale overflow-era payload still drained ahead of recovery traffic.
test_CoaXPressRx.py: promoted four-lane short-frame/boundary tests and repeated single-line frame coverage; reclassified heavy overflow recovery checks under RUN_STRESS_TESTS=1.

test_CoaXPressRxHsFsm.py: promoted repeated single-line frame bench.

test_CoaXPressCore.py: made the RX backpressure counter test part of normal coverage with a workload that actually overflows data FIFO and asserts RxFsmErrorCnt stays zero.

Updated CoaXPress README and _meta docs to remove stale known-issue guidance.
… and require EOP before pulsing cfgMaster.

CoaXPressRxLane.vhd (line 468): heartbeat packets now validate repeated-byte payload words, CRC, and EOP before pulsing heatbeatMaster.
…real CRC.

Updated the CoaXPress README and _meta docs to remove the stale ACK limitation and document the remaining real gaps.
… states after the declared DSize payload words, instead of returning to idle immediately and accepting a new SOP early. The CRC covers stream ID, packet tag, size, and payload words. Payload still forwards as it arrives, so this enforces packet framing before the next packet rather than buffering and dropping bad payloads.
…ser errors into the existing rxFsmError output alongside the high-speed FSM error. That means malformed packet trailers, including bad stream CRC, now reach the existing core RxFsmErrorCnt software counter.

Updated tests cover:

leaf rxError pulse on bad stream trailer
top-level rxFsmError pulse and recovery after bad stream CRC
core RxFsmErrorCnt increment on bad stream CRC, with later clean-frame recovery
CoaXPressRxLane now emits event payload words on eventMaster.
CoaXPressRx crosses that event stream into cfgClk.
Existing eventAck/eventTag remains trailer-gated after CRC/EOP validation.
CXPoF bridge status:

/Q/ status via seqValid/seqData.
/E/ status via rxError/rxAbort.
HKP status via hkpValid/hkpData/hkpEop.
Existing reconstructed CXP word-stream behavior is preserved.
CoaXPressRxLane now uses an explicit distributed surf.SimpleDualPortRam as a bounded event payload store in CoaXPressRxLane.vhd. Event payload words are written into that RAM, then released on eventMaster only after CRC and EOP pass. Bad CRC events and oversized events do not leak payload.

Also updated test_CoaXPressRxLane.py to assert:

valid multi-word event payload is released after validation
bad-CRC event does not release payload
oversized event is rejected
parser recovers for a later clean event
…VHDL description header:

maxCount is intended to be programmed during init or reset.

If changed at runtime, assert rst afterward.

Discard output history for at least one newly configured delay interval before relying on dout.

Added a fuller module description covering RAM mode, address wrapping, en, DO_REG_G, and delay formula.

Also aligned test_SlvDelayRam.py with that contract: it now reprograms maxCount, resets, discards the post-reset history interval, then verifies stable traffic at the new delay.
Changed CoaXPressOverFiberBridgeRx.vhd (line 42) to add:

/Q/ sequence tracking with seqExpected, seqError, and seqErrorExpected
classified rxErrorCode causes for sequence mismatch, idle /E/, payload abort, bad control, overwrite, and malformed HKP
HKP structural parsing via hkpSof, hkpWordCount, hkpError, plus existing raw hkpData/hkpEop
Propagated the new status ports through CoaXPressOverFiberBridge.vhd (line 54) and tied them off in the GT wrapper instances.

Updated bridge tests in test_CoaXPressOverFiberBridgeRx.py (line 160) and test_CoaXPressOverFiberBridge.py (line 184),
…ing:

CXP reconstructed word K masks
CXPoF XGMII control masks
CXPoF SOP control bit positions/values
CXPoF low-speed payload control codes
CXPoF terminate suffix pattern
CXPoF RX error-code constants
Then replaced the inline literals in CoaXPressOverFiberBridgeRx.vhd (line 140) and CoaXPressOverFiberBridgeTx.vhd (line 114). I also mirrored the status constants into coaxpress_test_utils.py (line 56) and updated the bridge tests to use those names.
…de-up command decoder.

Changes:

Added package helpers/constants in CoaXPressPkg.vhd (line 92): cxpIsKCode, cxpKCodeMask, cxpHkpType, HKP type constants, and CXPOF_RX_ERR_HKP_BAD_K_CODE_C.
Extended CoaXPressOverFiberBridgeRx.vhd (line 48) with hkpKCodeMask, hkpKCodeValid, and hkpType.
Tightened HKP semantics: HKP now requires all-data XGMII control flags, validates each byte as a legal 8b/10b K-code value, classifies known CXP K-code words, and reports invalid K-code bytes separately from malformed control masks.
Propagated new ports through the bridge and GT wrappers.
Updated tests/docs to remove the “higher-level HKP command decoding” gap. HKP is now documented as High-Speed K-Code Payload validation/classification per the CXPoF spec reference.
- Introduced CoaXPressOverFiberBridgeRxStatusWrapper and CoaXPressOverFiberBridgeStatusWrapper VHDL files to provide cocotb-facing status interfaces for the CoaXPressOverFiberBridge components.
- Updated CoaXPressOverFiberGthUsIpWrapper and CoaXPressOverFiberGtyUsIpWrapper to integrate the new bridge RX status signals.
- Implemented CoaXPressOverFiberBridgeAxiL Python class to expose bridge RX status via AXI-Lite interface, including sticky status bits, last observed sequence and HKP fields, and event counters.
- Added tests for the new AXI-Lite interface to validate status register functionality and behavior under various conditions.
- Updated README documentation to reflect the new status contract and register map.
Bug: The CTRL_ACK_S state in CoaXPressRxLane.vhd unconditionally expects 3 payload words before CRC. Write acknowledgments from the camera only have 2 words (ack_code + size=0, no data). After commit 10d5de3 added CRC/EOP validation, the state machine consumes the CRC as "data", then fails to find the real CRC → silently drops the response → CoaXPressConfig times out → SRP status 0x1.

Why reads work: Read responses include all 3 words (ack_code + size=4 + read_data) so the state machine parses them correctly.

Why ConnectionReset() works: It uses cmd.post() (fire-and-forget) which doesn't check the transaction response.

Fix (in CoaXPressRxLane.vhd:371-378): At ackCnt=1, check if rxData(31 downto 8) = 0 (DSize=0). If so, transition directly to CTRL_ACK_CRC_S without waiting for a data word that will never arrive.
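
The word-count decision behind this fix can be sketched in Python. Only the bit range (`rxData(31 downto 8)`) and the two-word versus three-word shapes come from the description above; the function names and the example word values are hypothetical.

```python
def ack_payload_word_count(size_word: int) -> int:
    """Words before the CRC: ack code + size word, plus data if DSize != 0."""
    dsize_bits = (size_word >> 8) & 0xFFFFFF  # mirrors rxData(31 downto 8)
    return 2 if dsize_bits == 0 else 3


def split_ack(words):
    """Split [ack_code, size, (data), crc, ...] into (payload_words, crc)."""
    n = ack_payload_word_count(words[1])
    return list(words[:n]), words[n]
```

A parser that always takes three words before the CRC would, for a write acknowledgment, consume the CRC as data, which is the failure described above.
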
Key changes:

CoaXPressRxLane now emits an in-order trailer verdict marker with SSI EOFE on malformed stream CRC/EOP.
CoaXPressRxWordPacker now has proper AXI-stream handshaking and preserves EOFE on packed tLast.
CoaXPressRxHsFsm now holds only the final packed EOF beat until the trailer verdict arrives, then applies EOFE before releasing it.
packOut and the named trailer/hold state are now RegType members in CoaXPressRxHsFsm.vhd (line 103). I left the remaining locals as short-lived arithmetic/loop temporaries.
Fixed duplicate error accounting: the lane reports malformed trailer errors, while the FSM uses the trailer verdict only to annotate EOFE.
The issue points at the stricter coaxpress-tests-2 control-ACK parser: VersionUsedCmd writes can get a camera ACK shape that either carries success in P0 only (0x00000001/0x00000004) or omits the explicit zero-size word before CRC/EOP. The current parser could treat that as nonzero status or fail to complete the ACK, which surfaces in Rogue as the hardware register-bus transaction error.

Changed:

CoaXPressRxLane.vhd (line 187): normalize repeated-byte and P0-only success ACK codes to zero status.

CoaXPressRxLane.vhd (line 392): accept write ACKs that go code + CRC + EOP without an explicit zero-size word.

test_CoaXPressRxLane.py (line 462): added regression coverage for both compatibility ACK shapes.
bengineerd added 15 commits May 23, 2026 01:31
- Introduced `RssiCoreWrapperMultiStreamIntegrationWrapper` to handle two application streams with interleaved traffic.
- Updated `handoff.md` and `progress.md` to reflect new multi-stream wrapper and its integration tests.
- Created `README.md` for the `rssi` directory detailing usage and configuration of `RssiCoreWrapper`.
- Added `test_RssiCoreWrapperMultiStream.py` to validate multi-stream functionality and connection status.
- Updated `README.md` in the protocols directory to link to the new `rssi` documentation.
…ransport interfaces and enhance multi-stream loss coverage in tests
- Added RSSI conformance pass for parameter range validation, BUSY cadence, cumulative ACK window release, max-retransmit RST/close behavior, and duplicate DATA suppression coverage.
- Updated `RssiAxiLiteRegItf` to clamp writable runtime parameters, ensuring illegal values are not accepted.
- Enhanced `RssiConnFsm` to reject invalid SYN/SYN+ACK parameters, ensuring only valid peer parameters are negotiated.
- Modified `RssiMonitor` to implement periodic local-BUSY ACK requests based on the RSSI page's recommended Retransmission Timeout/2.
- Introduced default coverage for cumulative ACK release of multiple TX segments and runtime register clamps.
- Added opt-in direct-core probes for integrated BUSY advertisement and strict no-extra-output checks after DATA retransmission recovery.
- Updated tests to validate new behaviors, including writable parameter clamps, rejection of out-of-range SYN parameters, and handling of duplicate DATA drops.
- Enhanced `RssiCore` register model to expose writable RSSI parameter ranges, preventing software-side verify mismatches.
- Improved test coverage for retransmission and recovery scenarios, ensuring compliance with updated protocol specifications.
…obes to default coverage; enhance application output backpressure handling in RssiCore and update related tests.
…tegrate new protocol functions across test files
…rences to reflect EACK as reserved/unsupported, and add standalone ACK+EACK rejection test coverage.
…terfaces in RssiCoreIntegrationWrapper, enhance cocotb loopback handling, and add checksum-disabled RX test coverage.
…urpose, DUT shape, stimulus, checks, and timing for improved readability and understanding.
bengineerd marked this pull request as draft May 28, 2026 17:54
- Updated README.md to reflect current status and testing coverage for RSSI regression suite, including details on recent expansions and test strategies.
- Expanded handoff.md with details on the 2026-05-28 follow-up, including new coverage areas and validation results.
- Added a new section in progress.md to document the 2026-05-28 test-suite expansion follow-up, detailing implemented regression items and validation results.
- Enhanced protocols/rssi/README.md to clarify test coverage and current EOFE behavior in the regression suite.
- Modified RssiCoreIntegrationWrapper.vhd to expose a flattened client AXI-Lite bus for improved integration testing.
- Updated test_RssiCore.py with new tests for multi-beat partial `TKEEP`, BUSY recovery, close/reopen lifecycle, AXI-Lite control path, checksum-disabled integration, and transport-output ready stalls.
- Added new tests in test_RssiCoreWrapper.py and test_RssiCoreWrapperMultiStream.py for partial-`TKEEP` coverage and EOFE preservation.
- Introduced utility function data_mask_from_keep in ssi_test_utils.py to facilitate valid byte preservation checks in tests.
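
For readers unfamiliar with this kind of helper, here is a guess at how a keep-to-mask conversion could work. The signature and behaviour below are assumptions for illustration and may differ from the actual `data_mask_from_keep` in `ssi_test_utils.py`.

```python
def data_mask_from_keep(keep: int, num_bytes: int) -> int:
    """Expand a TKEEP bit vector into a byte-granular data mask.

    Bit i of `keep` set means byte lane i is valid, so bits
    [8*i+7 : 8*i] of the returned mask are set.
    """
    mask = 0
    for lane in range(num_bytes):
        if (keep >> lane) & 1:
            mask |= 0xFF << (8 * lane)
    return mask
```

A test could then compare `observed & mask` against `expected & mask`, so that byte lanes without TKEEP set do not cause false mismatches.
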
bengineerd changed the title from "Add RSSI protocol regression coverage" to "Add RSSI protocol regression coverage and RTL fixes" May 28, 2026
… VHDL-93

Vivado synthesis defaults to VHDL-93 for files loaded without an explicit
fileType in ruckus.tcl. The conditional signal assignment introduced in
e6151de is only legal in VHDL-2008. Refactor to a sequential if/else so
the file remains synthesizable under the legacy LRM. Behavior unchanged.
ruck314 and others added 5 commits May 29, 2026 04:23
…AXIL

Map the build-time generics into the previously unused upper byte of the
existing RSSI AXI-Lite registers:

  0x0C[31:24] -> MAX_NUM_OUTS_SEG_G
  0x28[31:24] -> SEGMENT_ADDR_SIZE_G

Older firmware reads back 0 in those bits, so software can detect a
legacy bitfile and fall back to compile-time defaults instead of forcing
every application to thread per-instance generics through to the rogue
device tree. No new register space is consumed and the [7:0] / [23:16]
fields keep their original meaning, so existing software is unaffected.
…sters

Drop the maxNumOutsSeg and segmentAddrSize constructor arguments from
RssiCore. Applications routinely forgot to update them when changing the
firmware generics, leaving the rogue tree out of sync with what the FPGA
actually supported. Instead, read the build-time capability advertised
in 0x0C[31:24] and 0x28[31:24] from _start() once a link is reachable
and update the maximum hints on locMaxOutsSeg and locMaxSegSize in
place. Older firmware reads zero in those bits and the driver falls
back to the legacy defaults (8 outstanding segments, 1024-byte segment).

BREAKING CHANGE: callers passing maxNumOutsSeg= or segmentAddrSize= must
remove those keyword arguments; the values are now discovered at link
bring-up.
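
The capability discovery described in these two commits might look roughly like this on the software side. The register offsets, bit ranges, and legacy defaults come from the commit messages; the function name, the per-field fallback, and the conversion from address size to bytes (8-byte words, so size = 2^N × 8) are assumptions.

```python
LEGACY_MAX_OUTS_SEG = 8      # legacy default from the commit message
LEGACY_MAX_SEG_SIZE = 1024   # bytes, legacy default from the commit message
WORD_BYTES = 8               # assumption: 64-bit RSSI data words


def decode_capabilities(reg_0x0c: int, reg_0x28: int):
    """Return (max_outs_seg, max_seg_size_bytes) from the upper register bytes."""
    max_outs = (reg_0x0c >> 24) & 0xFF   # 0x0C[31:24] -> MAX_NUM_OUTS_SEG_G
    addr_size = (reg_0x28 >> 24) & 0xFF  # 0x28[31:24] -> SEGMENT_ADDR_SIZE_G
    # Older firmware reads back zero here, so zero selects the legacy default.
    max_outs_seg = max_outs if max_outs else LEGACY_MAX_OUTS_SEG
    max_seg_size = (1 << addr_size) * WORD_BYTES if addr_size else LEGACY_MAX_SEG_SIZE
    return max_outs_seg, max_seg_size
```

Under these assumptions an address size of 7 yields 1024 bytes, which matches the legacy default quoted above.
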
…eives and reducing fixed capture cycles in multi-stream tests
…enhancing multi-stream tests and adding conditional execution for extended cases
ruck314 force-pushed the batcher-tests branch 2 times, most recently from e8ae637 to f637a21 June 14, 2026 22:56
Base automatically changed from batcher-tests to pre-release June 15, 2026 15:01