feat(webrtc): multistream-select protocol implementation#573
Conversation
| is_connected = self.rtc.is_connected(), | ||
| "rejecting substream open: connection not healthy", | ||
| ); | ||
| continue; |
There was a problem hiding this comment.
When will the protocol that asked for a new substream be informed that the connection is tearing down?
If rtc remains stuck in !is_alive or !is_connected would we be leaking the connection and stalling it indefinetely?
There was a problem hiding this comment.
Yes, this is a real issue: leaking connections! I already started working on, making sure that connection closes and errors are properly propagated without crashing anything is not straight forward. For sick of simplicity I will address this in a new PR on top of the stack to avoid having to deal with too many rebase conflicts!
ab61686 to
2f518be
Compare
After the multistream-select header has been exchanged, only protocol and na messages should flow between peers. propose_next_fallback was incorrectly transitioning state back to WaitingResponse (expecting another header) and wrapping the outgoing protocol message with the header. Keep the state at WaitingProtocol and emit the protocol message without the header prefix. Also the receiving side has been fixed, no header is expected if it was already received.
2f518be to
bf1204e
Compare
| } | ||
| (HandshakeState::WaitingProtocol, Some(protocol)) => { | ||
| (HandshakeState::WaitingProtocol, Ok(Message::Protocol(protocol))) => { | ||
| if protocol == PROTO_MULTISTREAM_1_0 { |
There was a problem hiding this comment.
We could improve a bit the robustness of this function wrt the state machine management.
Currently, we accept only [HEADER ++ protocol].
If the buffer contains an extra message [HEADER ++ protocol ++ first_message/handshake] because smoldot/other impl decides to send us optimistically an extra packet to avoid excessive flushing, then we'd lose the trailing bytes.
We should have at least an warning to easily detect if the remaining bytes are non empty
There was a problem hiding this comment.
Making this function more reliable would imply to change quite a lot the code, but what you describe could totally be implemented by a peer which can send those 3 things out at once. Right now it is only expected to interact with smoldot which seems to not do so.
I've pushed a change which warn if trailing bytes are found after having decoded valid multistream-select messages. Is this ok for now?
There was a problem hiding this comment.
Yep warning should be sufficient for now since the fix is quite intrusive. Could you create an issue to not forget about it 🙏 ?
| NegotiationError::Failed, | ||
| )); | ||
| } | ||
| (_, Ok(Message::NotAvailable)) => { |
There was a problem hiding this comment.
This could be a protocol violation if the header was not received.
The spec says that for the first message the header should be echoed back:
> /multistream/1.0.0\n/ipfs/kad/1.0.0\n
< /multistream/1.0.0\nna\n
However, we currently support missing the header:
> /multistream/1.0.0\n/ipfs/kad/1.0.0\n
< na\n
Which will make us propose the next fallback protocol that will most definetely not be accepted since we are still in the HandshakeState::WaitingResponse state.
In case the our state is waiting protocol this is fine. However, if we are still waiting the header we should return a state missmatch / error
// This is not oki, we ahve not received the headr
(HandshakeState::WaitingResponse, Ok(Message::NotAvailable)) => return Ok(HandshakeResult::StateMismatch),
(HandshakeState::WaitingProtocol, Ok(Message::NotAvailable)) => return Ok(HandshakeResult::Rejected),Lets also add a bit more trace logs / debugs here especially for the HandshakeResult::Failed cases
There was a problem hiding this comment.
Nice catch! This needs to be addressed to avoid any protocol issue! Solution should be pretty basic:
(HandshakeState::WaitingResponse, Ok(msg)) => {
tracing::trace!(
target: LOG_TARGET,
?msg,
"Expected header response from peer, got different message"
);
return Err(crate::error::NegotiationError::MultistreamSelectError(
NegotiationError::Failed,
));
}|
We are getting closer here! Thanks again @gab8i for tackling this! 🙏 Will approve the PR as soon as we tackle #573 (comment) and above nits to improve debuggability Have created two followups not to bloat the scope of this PR:
Have added the issues to our webrtc milestone as well for better visibility: |
… handling Only expect the header as first response of the multistream-select protocol. Warn if the remote peer has pipelined something after the protocol.
01c5e59 to
c6bdbae
Compare
| return Ok(HandshakeResult::Succeeded(self.protocol.clone())); | ||
| } | ||
|
|
||
| for fallback in &self.fallback_names { |
There was a problem hiding this comment.
Is it correct to accept fallback protocols here? If we offered only self.protocol but a peer would answer with one of our fallback protocols, we would accept. But indeed we should first offer the fallback right?
test that fails:
#[test]
fn reject_unproposed_fallback_confirmation() {
let (mut dialer_state, _message) = WebRtcDialerState::propose(
ProtocolName::from("/13371338/proto/1"),
vec![ProtocolName::from("/sup/proto/1")],
)
.unwrap();
// The dialer has only proposed the main protocol. The fallback is stored for a
// later round and must not be accepted until `propose_next_fallback()` sends it.
let mut response = BytesMut::with_capacity(64);
response.put_u8(MSG_MULTISTREAM_1_0.len() as u8);
Message::Header(HeaderLine::V1).encode(&mut response).unwrap();
let fallback = Protocol::try_from(&b"/sup/proto/1"[..]).expect("valid protocol name");
response.put_u8((fallback.as_ref().len() + 1) as u8);
Message::Protocol(fallback).encode(&mut response).unwrap();
match dialer_state.register_response(response.freeze().to_vec()) {
Err(error::NegotiationError::MultistreamSelectError(NegotiationError::Failed)) => {}
event => panic!("expected unproposed fallback to be rejected, got: {event:?}"),
}
}There was a problem hiding this comment.
You are totally right! The fix is to remember which one is the last proposed one and only succeed if the response contains it!
There was a problem hiding this comment.
After looking at the code, it is just a matter of removing fallback_protocols from being used to accept or not the registered response!
| ); | ||
|
|
||
| self.pending_outbound.remove(&channel_id); | ||
| self.channels.remove(&channel_id); |
There was a problem hiding this comment.
Is it fine to just remove here? If the channel state is ChannelState::OutboundOpening, we would need to call report_substream_open_failure.
77901f4 to
f1a3be5
Compare
| )? | ||
| .freeze() |
There was a problem hiding this comment.
nit: This and above ? operator alters the state of fallback_names which we have just popped. I think this is ok since we terminate the substream immediately and return negotiation-error?
There was a problem hiding this comment.
Yes, it works because if propose_next_fallback fails then also on_outbound_opening_channel_data fails and the data channel is getting closed!
| ); | ||
|
|
||
| self.channels.insert(channel_id, ChannelState::InboundOpening); | ||
| self.channels.insert( |
There was a problem hiding this comment.
We could get into a situation where the remote (smoldot) contains a bug and never sends more data after the header in the negotiation phase.
We could probably add a SUBSTREAM_OPEN_TIMEOUT similar to TCP to ensure that the negotiation phase is bounded and cannot leak channels until the connection is closed. In this case if we reach the timeout, we could simply propagate back the substream failed to installed protocols
There was a problem hiding this comment.
Yes, that's definitely a bug where both inbound and outbound opened channels can hang forever waiting for the protocol handshake. I will open an issue to address this in a follow up PR!
The following PR just wraps the work done within #554
The only reason for this PR and not the original one is a matter of simplicity: WebRTC work has been continued over pending PRs to reach a stable point, and now it is easier to split the final work into multiple PRs from the main branch where WebRTC has been e2e tested.
Now include also #576:
After the multistream-select header has been exchanged, only protocol and na messages should flow between peers. propose_next_fallback was incorrectly transitioning state back to WaitingResponse (expecting another header) and wrapping the outgoing protocol message with the header. Keep the state at WaitingProtocol and emit the protocol message without the header prefix.