Fix DCEP panics and SCTP stream ID reuse race condition by timwu20 · Pull Request #883 · algesten/str0m

timwu20 · 2026-03-04T16:02:55Z

Summary

Handle write_with_ppi errors gracefully instead of panicking — Replace .expect() calls on DCEP open/ack writes with match blocks that log a warning and close the stream on failure, preventing panics from ErrPayloadDataStateNotExist when the SCTP stream is in an unexpected state.
Prevent SCTP stream ID reuse before RE-CONFIG completion — Add a cooldown period (2s) for recently closed stream IDs so they are not immediately reallocated. This avoids a race condition where a new data channel is assigned a stream ID that is still pending RE-CONFIG acknowledgment from the remote peer, which causes ErrStreamAlreadyExist errors.

Problem

When data channels are rapidly closed and reopened, two issues can occur:

Panic on DCEP write: If the SCTP stream enters an unexpected state (e.g., due to a concurrent close), write_with_ppi returns an error that was previously unwrapped with .expect(), causing a panic.
Stream ID reuse race: When a data channel is closed, str0m sends an SCTP RE-CONFIG to reset the stream. However, the stream ID was immediately available for reuse. If a new channel was allocated the same stream ID before the remote peer processed the RE-CONFIG, sctp-proto would return ErrStreamAlreadyExist, leading to the error in (1).

Solution

Commit 1 (bfbce6f): Replaces panicking .expect() calls with proper error handling that logs the failure and cleanly closes the stream.
Commit 2 (1a762ec): Introduces a closed_stream_ids cooldown list in ChannelHandler. Closed stream IDs are excluded from allocation for STREAM_ID_COOLDOWN (2 seconds), giving the remote peer time to process the RE-CONFIG before the ID is reused.

Test plan

Existing unit tests updated and passing
New test stream_id_not_reused_during_cooldown verifies cooldown behavior
cargo test passes (all except data_channel_flood which is unrelated)
cargo fmt and cargo clippy clean

Replace .expect() on DCEP open/ack writes with match blocks that log a warning and close the stream on failure, preventing panics from ErrPayloadDataStateNotExist.

algesten · 2026-03-04T21:23:34Z

@timwu20 did you do a deeper investigation on this? Or is this more a shooting from the hip for something you observe?

Can we prove the problem and the fix with fuzz testing?

timwu20 · 2026-03-05T01:41:01Z

@timwu20 did you do a deeper investigation on this? Or is this more a shooting from the hip for something you observe?

@algesten Yes, this came from a real investigation while testing with a Polkadot node, not shooting from the hip. Here's the full context:

Background

We're using str0m in litep2p to implement libp2p protocols over WebRTC. In the libp2p world, request/response protocols routinely open a data channel, send a request, receive a response, and immediately close the stream. This pattern of rapidly closing and reopening data channels is what triggers the stream ID reuse problem.

What I observed

On the browser side, I could see DCEP opens being rejected — the remote peer was receiving a DCEP open for a stream ID that it hadn't finished processing the RE-CONFIG close for yet. This led to ErrStreamAlreadyExist in sctp-proto, which then caused the panic via .expect() on the subsequent write_with_ppi call.

Investigation into sctp-proto

I looked into whether we could poll sctp-proto to know when a stream's RE-CONFIG has been fully acknowledged, so we'd know exactly when it's safe to reuse that stream ID. However, that would require:

Changing the public API of sctp-proto to expose when streams have been internally removed (i.e., when the RE-CONFIG acknowledgment has been received from the remote peer).
Implementing a cooldown within sctp-proto itself, because even after we send the RE-CONFIG, we need to give the remote peer time to receive and process our RE-CONFIG ack before we can reuse the stream ID.

Since both of those are more invasive changes to sctp-proto's API, the cooldown approach in str0m is a naive but pragmatic fix that solves the problem at the str0m level. The 2-second cooldown is conservative enough to cover the RE-CONFIG round-trip in practice.

Can we prove the problem and the fix with fuzz testing?

I'm open to adding fuzz testing for this. The scenario to fuzz would be rapid open/close/reopen cycles of data channels — that's what reliably triggers the issue in our libp2p usage. The unit test stream_id_not_reused_during_cooldown validates the cooldown mechanism itself, but a fuzz test exercising the full SCTP lifecycle could add more confidence.

algesten · 2026-03-07T14:00:50Z

Thanks!

timwu20 added 2 commits March 3, 2026 11:16

Handle write_with_ppi errors gracefully instead of panicking

bfbce6f

Replace .expect() on DCEP open/ack writes with match blocks that log a warning and close the stream on failure, preventing panics from ErrPayloadDataStateNotExist.

Prevent SCTP stream ID reuse before RE-CONFIG completion

1a762ec

algesten approved these changes Mar 7, 2026

View reviewed changes

algesten merged commit 3ddc3fc into algesten:main Mar 7, 2026
55 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix DCEP panics and SCTP stream ID reuse race condition#883

Fix DCEP panics and SCTP stream ID reuse race condition#883
algesten merged 2 commits into
algesten:mainfrom
ChainSafe:tim/fix-dcep-panics-stream-reuse

timwu20 commented Mar 4, 2026

Uh oh!

algesten commented Mar 4, 2026

Uh oh!

timwu20 commented Mar 5, 2026

Uh oh!

Uh oh!

algesten commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

timwu20 commented Mar 4, 2026

Summary

Problem

Solution

Test plan

Uh oh!

algesten commented Mar 4, 2026

Uh oh!

timwu20 commented Mar 5, 2026

Background

What I observed

Investigation into sctp-proto

Uh oh!

Uh oh!

algesten commented Mar 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants