Fix DCEP panics and SCTP stream ID reuse race condition #883
Replace .expect() on DCEP open/ack writes with match blocks that log a warning and close the stream on failure, preventing panics from ErrPayloadDataStateNotExist.
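For illustration, a minimal sketch of the pattern described above. `MockStream`, `WriteError`, and `send_dcep` are hypothetical stand-ins invented for this example, not the str0m or sctp-proto types; only the `write_with_ppi` name and the "log a warning, then close the stream" behavior come from the description. The PPID value 50 is the one registered for WebRTC DCEP.

```rust
/// Hypothetical stand-in for the error a failed SCTP write returns.
#[derive(Debug)]
enum WriteError {
    PayloadDataStateNotExist,
}

/// Hypothetical stand-in for an SCTP stream (not the sctp-proto type).
struct MockStream {
    id: u16,
    writable: bool,
    closed: bool,
}

impl MockStream {
    /// Mimics a write that fails when the stream is in an unexpected state.
    fn write_with_ppi(&mut self, payload: &[u8], _ppi: u32) -> Result<usize, WriteError> {
        if self.writable {
            Ok(payload.len())
        } else {
            Err(WriteError::PayloadDataStateNotExist)
        }
    }

    fn close(&mut self) {
        self.closed = true;
    }
}

/// Sends a DCEP message. Returns true on success; on failure it logs a
/// warning and closes the stream instead of panicking via `.expect()`.
fn send_dcep(stream: &mut MockStream, msg: &[u8]) -> bool {
    const PPI_DCEP: u32 = 50; // PPID registered for WebRTC DCEP
    match stream.write_with_ppi(msg, PPI_DCEP) {
        Ok(_) => true,
        Err(e) => {
            eprintln!("warning: DCEP write failed on stream {}: {:?}", stream.id, e);
            stream.close();
            false
        }
    }
}
```

With this shape, a failed write leaves the caller with a closed stream and a log line rather than an aborted process.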
@timwu20 did you do a deeper investigation on this? Or is this more a shooting from the hip for something you observe? Can we prove the problem and the fix with fuzz testing?
@algesten Yes, this came from a real investigation while testing with a Polkadot node, not shooting from the hip. Here's the full context:
Background
We're using str0m in litep2p to implement libp2p protocols over WebRTC. In the libp2p world, request/response protocols routinely open a data channel, send a request, receive a response, and immediately close the stream. This pattern of rapidly closing and reopening data channels is what triggers the stream ID reuse problem.
What I observed
On the browser side, I could see DCEP opens being rejected: the remote peer was receiving a DCEP open for a stream ID that it hadn't finished processing the RE-CONFIG close for yet. This led to
Investigation into sctp-proto
I looked into whether we could poll sctp-proto to know when a stream's RE-CONFIG has been fully acknowledged, so we'd know exactly when it's safe to reuse that stream ID. However, that would require:
Since both of those are more invasive changes to sctp-proto's API, the cooldown approach in str0m is a naive but pragmatic fix that solves the problem at the str0m level. The 2-second cooldown is conservative enough to cover the RE-CONFIG round-trip in practice.
I'm open to adding fuzz testing for this. The scenario to fuzz would be rapid open/close/reopen cycles of data channels, since that's what reliably triggers the issue in our libp2p usage. The unit test
Thanks!
Summary
- write_with_ppi errors gracefully instead of panicking: Replace .expect() calls on DCEP open/ack writes with match blocks that log a warning and close the stream on failure, preventing panics from ErrPayloadDataStateNotExist when the SCTP stream is in an unexpected state.
- ErrStreamAlreadyExist errors.
Problem
When data channels are rapidly closed and reopened, two issues can occur:
1. Panic on DCEP write: If the SCTP stream enters an unexpected state (e.g., due to a concurrent close), write_with_ppi returns an error that was previously unwrapped with .expect(), causing a panic.
2. Stream ID reuse race: When a data channel is closed, str0m sends an SCTP RE-CONFIG to reset the stream. However, the stream ID was immediately available for reuse. If a new channel was allocated the same stream ID before the remote peer processed the RE-CONFIG, sctp-proto would return ErrStreamAlreadyExist, leading to the error in (1).
Solution
- bfbce6f): Replaces panicking .expect() calls with proper error handling that logs the failure and cleanly closes the stream.
- 1a762ec): Introduces a closed_stream_ids cooldown list in ChannelHandler. Closed stream IDs are excluded from allocation for STREAM_ID_COOLDOWN (2 seconds), giving the remote peer time to process the RE-CONFIG before the ID is reused.
Test plan
- stream_id_not_reused_during_cooldown verifies cooldown behavior
- cargo test passes (all except data_channel_flood which is unrelated)
- cargo fmt and cargo clippy clean
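To make the cooldown idea concrete, here is a rough sketch under simplifying assumptions. `StreamIdAllocator` and its methods are hypothetical and much simpler than the real `ChannelHandler`; it ignores details such as stream ID parity and takes the current time as an argument so the behavior can be checked without sleeping. Only the `closed_stream_ids` and `STREAM_ID_COOLDOWN` names and the 2-second value come from the description above.

```rust
use std::time::{Duration, Instant};

/// Cooldown length named in the PR description (2 seconds).
const STREAM_ID_COOLDOWN: Duration = Duration::from_secs(2);

/// Hypothetical, simplified allocator (not str0m's ChannelHandler).
struct StreamIdAllocator {
    /// IDs currently assigned to open channels.
    in_use: Vec<u16>,
    /// Recently closed IDs paired with the time they were closed.
    closed_stream_ids: Vec<(u16, Instant)>,
}

impl StreamIdAllocator {
    fn new() -> Self {
        Self { in_use: Vec::new(), closed_stream_ids: Vec::new() }
    }

    /// Releases an ID and starts its cooldown.
    fn close(&mut self, id: u16, now: Instant) {
        self.in_use.retain(|&i| i != id);
        self.closed_stream_ids.push((id, now));
    }

    /// Returns the lowest ID that is neither in use nor cooling down.
    fn allocate(&mut self, now: Instant) -> Option<u16> {
        // Forget closed IDs whose cooldown has fully elapsed.
        self.closed_stream_ids.retain(|&(_, closed_at)| {
            now.saturating_duration_since(closed_at) < STREAM_ID_COOLDOWN
        });
        let id = (0..u16::MAX).find(|id| {
            !self.in_use.contains(id)
                && !self.closed_stream_ids.iter().any(|(c, _)| c == id)
        })?;
        self.in_use.push(id);
        Some(id)
    }
}
```

In this sketch, an ID closed at time t is skipped by `allocate` until t + 2 s, which mirrors the intent of giving the remote peer time to process the RE-CONFIG before the ID is handed out again.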