Skip to content

feat(webrtc): multistream-select protocol implementation#573

Merged
gab8i merged 12 commits into
masterfrom
gab_webrtc_multistream_select
May 20, 2026
Merged

feat(webrtc): multistream-select protocol implementation#573
gab8i merged 12 commits into
masterfrom
gab_webrtc_multistream_select

Conversation

@gab8i
Copy link
Copy Markdown
Contributor

@gab8i gab8i commented May 13, 2026

The following PR just wraps the work done within #554

The only reason for this PR and not the original one is a matter of simplicity: WebRTC work has been continued over pending PRs to reach a stable point, and now it is easier to split the final work into multiple PRs from the main branch where WebRTC has been e2e tested.

Now include also #576:
After the multistream-select header has been exchanged, only protocol and na messages should flow between peers. propose_next_fallback was incorrectly transitioning state back to WaitingResponse (expecting another header) and wrapping the outgoing protocol message with the header. Keep the state at WaitingProtocol and emit the protocol message without the header prefix.

Comment thread src/multistream_select/dialer_select.rs Outdated
Comment thread src/multistream_select/dialer_select.rs Outdated
Comment thread src/multistream_select/listener_select.rs Outdated
Comment thread src/multistream_select/protocol.rs Outdated
Comment thread src/transport/webrtc/connection.rs
is_connected = self.rtc.is_connected(),
"rejecting substream open: connection not healthy",
);
continue;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When will the protocol that asked for a new substream be informed that the connection is tearing down?

If rtc remains stuck in !is_alive or !is_connected would we be leaking the connection and stalling it indefinetely?

Copy link
Copy Markdown
Contributor Author

@gab8i gab8i May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a real issue: leaking connections! I already started working on, making sure that connection closes and errors are properly propagated without crashing anything is not straight forward. For sick of simplicity I will address this in a new PR on top of the stack to avoid having to deal with too many rebase conflicts!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have created a followup for this: #582

Base automatically changed from gab_webrtc_dep_update to master May 14, 2026 07:46
@gab8i gab8i force-pushed the gab_webrtc_multistream_select branch from ab61686 to 2f518be Compare May 14, 2026 09:21
gab8i added 3 commits May 14, 2026 11:58
After the multistream-select header has been exchanged, only protocol
and na messages should flow between peers. propose_next_fallback
was incorrectly transitioning state back to WaitingResponse (expecting
another header) and wrapping the outgoing protocol message with the
header. Keep the state at WaitingProtocol and emit the protocol
message without the header prefix.

Also the receiving side has been fixed, no header is expected if it
was already received.
@gab8i gab8i force-pushed the gab_webrtc_multistream_select branch from 2f518be to bf1204e Compare May 14, 2026 09:59
Comment thread src/multistream_select/dialer_select.rs Outdated
Comment thread src/multistream_select/dialer_select.rs Outdated
Comment thread src/multistream_select/dialer_select.rs
Comment thread src/multistream_select/dialer_select.rs Outdated
}
(HandshakeState::WaitingProtocol, Some(protocol)) => {
(HandshakeState::WaitingProtocol, Ok(Message::Protocol(protocol))) => {
if protocol == PROTO_MULTISTREAM_1_0 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could improve a bit the robustness of this function wrt the state machine management.

Currently, we accept only [HEADER ++ protocol].

If the buffer contains an extra message [HEADER ++ protocol ++ first_message/handshake] because smoldot/other impl decides to send us optimistically an extra packet to avoid excessive flushing, then we'd lose the trailing bytes.

We should have at least an warning to easily detect if the remaining bytes are non empty

Copy link
Copy Markdown
Contributor Author

@gab8i gab8i May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this function more reliable would imply to change quite a lot the code, but what you describe could totally be implemented by a peer which can send those 3 things out at once. Right now it is only expected to interact with smoldot which seems to not do so.

I've pushed a change which warn if trailing bytes are found after having decoded valid multistream-select messages. Is this ok for now?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep warning should be sufficient for now since the fix is quite intrusive. Could you create an issue to not forget about it 🙏 ?

Comment thread src/multistream_select/protocol.rs Outdated
Comment thread src/transport/webrtc/connection.rs Outdated
Comment thread src/transport/webrtc/connection.rs Outdated
Comment thread src/multistream_select/listener_select.rs
Comment thread src/multistream_select/listener_select.rs Outdated
Comment thread src/multistream_select/dialer_select.rs Outdated
NegotiationError::Failed,
));
}
(_, Ok(Message::NotAvailable)) => {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a protocol violation if the header was not received.

The spec says that for the first message the header should be echoed back:

> /multistream/1.0.0\n/ipfs/kad/1.0.0\n
< /multistream/1.0.0\nna\n

However, we currently support missing the header:

> /multistream/1.0.0\n/ipfs/kad/1.0.0\n
< na\n

Which will make us propose the next fallback protocol that will most definetely not be accepted since we are still in the HandshakeState::WaitingResponse state.

In case the our state is waiting protocol this is fine. However, if we are still waiting the header we should return a state missmatch / error

// This is not oki, we ahve not received the headr
(HandshakeState::WaitingResponse, Ok(Message::NotAvailable)) => return Ok(HandshakeResult::StateMismatch),
(HandshakeState::WaitingProtocol, Ok(Message::NotAvailable)) => return Ok(HandshakeResult::Rejected),

Lets also add a bit more trace logs / debugs here especially for the HandshakeResult::Failed cases

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! This needs to be addressed to avoid any protocol issue! Solution should be pretty basic:

                (HandshakeState::WaitingResponse, Ok(msg)) => {
                   tracing::trace!(
                     target: LOG_TARGET,
                     ?msg,
                     "Expected header response from peer, got different message"
                   );
                   return Err(crate::error::NegotiationError::MultistreamSelectError(
                       NegotiationError::Failed,
                   ));
               }

Comment thread src/multistream_select/dialer_select.rs
Comment thread src/multistream_select/dialer_select.rs Outdated
@lexnv
Copy link
Copy Markdown
Collaborator

lexnv commented May 14, 2026

We are getting closer here! Thanks again @gab8i for tackling this! 🙏 Will approve the PR as soon as we tackle #573 (comment) and above nits to improve debuggability

Have created two followups not to bloat the scope of this PR:

Have added the issues to our webrtc milestone as well for better visibility:

Comment thread src/multistream_select/dialer_select.rs Outdated
Copy link
Copy Markdown
Collaborator

@lexnv lexnv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job here! 🚀

… handling

Only expect the header as first response of the multistream-select
protocol.

Warn if the remote peer has pipelined something after the protocol.
Comment thread src/multistream_select/dialer_select.rs Outdated
return Ok(HandshakeResult::Succeeded(self.protocol.clone()));
}

for fallback in &self.fallback_names {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct to accept fallback protocols here? If we offered only self.protocol but a peer would answer with one of our fallback protocols, we would accept. But indeed we should first offer the fallback right?

test that fails:

    #[test]
    fn reject_unproposed_fallback_confirmation() {
        let (mut dialer_state, _message) = WebRtcDialerState::propose(
            ProtocolName::from("/13371338/proto/1"),
            vec![ProtocolName::from("/sup/proto/1")],
        )
        .unwrap();

        // The dialer has only proposed the main protocol. The fallback is stored for a
        // later round and must not be accepted until `propose_next_fallback()` sends it.
        let mut response = BytesMut::with_capacity(64);
        response.put_u8(MSG_MULTISTREAM_1_0.len() as u8);
        Message::Header(HeaderLine::V1).encode(&mut response).unwrap();

        let fallback = Protocol::try_from(&b"/sup/proto/1"[..]).expect("valid protocol name");
        response.put_u8((fallback.as_ref().len() + 1) as u8);
        Message::Protocol(fallback).encode(&mut response).unwrap();

        match dialer_state.register_response(response.freeze().to_vec()) {
            Err(error::NegotiationError::MultistreamSelectError(NegotiationError::Failed)) => {}
            event => panic!("expected unproposed fallback to be rejected, got: {event:?}"),
        }
    }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are totally right! The fix is to remember which one is the last proposed one and only succeed if the response contains it!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at the code, it is just a matter of removing fallback_protocols from being used to accept or not the registered response!

);

self.pending_outbound.remove(&channel_id);
self.channels.remove(&channel_id);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine to just remove here? If the channel state is ChannelState::OutboundOpening, we would need to call report_substream_open_failure.

Copy link
Copy Markdown
Contributor Author

@gab8i gab8i May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This was already addressed within this commit, part of #586!

@gab8i gab8i force-pushed the gab_webrtc_multistream_select branch from 77901f4 to f1a3be5 Compare May 18, 2026 16:56
Comment on lines +383 to +384
)?
.freeze()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This and above ? operator alters the state of fallback_names which we have just popped. I think this is ok since we terminate the substream immediately and return negotiation-error?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it works because if propose_next_fallback fails then also on_outbound_opening_channel_data fails and the data channel is getting closed!

);

self.channels.insert(channel_id, ChannelState::InboundOpening);
self.channels.insert(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could get into a situation where the remote (smoldot) contains a bug and never sends more data after the header in the negotiation phase.

We could probably add a SUBSTREAM_OPEN_TIMEOUT similar to TCP to ensure that the negotiation phase is bounded and cannot leak channels until the connection is closed. In this case if we reach the timeout, we could simply propagate back the substream failed to installed protocols

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's definitely a bug where both inbound and outbound opened channels can hang forever waiting for the protocol handshake. I will open an issue to address this in a follow up PR!

Copy link
Copy Markdown

@skunert skunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@gab8i gab8i merged commit 734e9e8 into master May 20, 2026
10 checks passed
@gab8i gab8i deleted the gab_webrtc_multistream_select branch May 20, 2026 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

webrtc: multistream-select implementation used by WebRTC doesn't conform to the spec

4 participants