feat(webrtc): introduce WebRtcListener to support multiple listen addresses#579
feat(webrtc): introduce WebRtcListener to support multiple listen addresses#579gab8i wants to merge 18 commits into
WebRtcListener to support multiple listen addresses#579Conversation
| return Err(Error::AddressError(AddressError::InvalidProtocol)); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Should we also check that the address has a certificate attached?
There was a problem hiding this comment.
No, this is a listen address, so a certhash isn't required.
I've added a comment noting that there is a check to ensure the address is a webrtc-direct multiaddr.
The DTLS certificate is generated once in WebRtcTransport::new() and presented to
dialing peers during the DTLS handshake, the certhash is the fingerprint of that certificate,
append when building the advertised multiaddr.
| target: LOG_TARGET, | ||
| ?local, | ||
| ?e, | ||
| "failed to receive a datagram", |
There was a problem hiding this comment.
nit: When a Poll::Ready(Err()) is returned the waker will not be registered. If this happens for all sockets we'll effectively be stuck in pending
| let listen_address = Self::get_socket_address(&listen_address)?; | ||
| let socket = if listen_address.is_ipv4() { | ||
| let socket = Socket::new(Domain::IPV4, Type::DGRAM, Some(socket2::Protocol::UDP))?; | ||
| socket.set_reuse_address(true)?; |
There was a problem hiding this comment.
set_reuse_address was called after the bind() previously, any reason for changing this? Should we deduplicate the listen addresses now that we suport multiple ones?
There was a problem hiding this comment.
set_reuse_address needs to be called before binding the socket, it sets SO_REUSEADDR flag. The same applies to set_reuse_port, which I just noticed is currently being called after the bind. This ordering is a Linux requirement.
The real question, though: do we actually want multiple WebRTC instances bound to the same address
and rely on the kernel's load balancer? I don't think we should allow two listeners on the same
(ip, port). So I'd keep set_reuse_address, multiple listeners can coexist on the same host, but drop set_reuse_port, ensuring each listener has a unique local address.
There was a problem hiding this comment.
We shouldn't set any of these. SO_REUSEADDR only makes sense when binding a TCP socket to a previously bound ip:port pair that was not cleanly shut down by previous process to not get EADDRINUSE. It doesn't make sense for UDP and was likely blindly copied from TCP/WS.
Agree that we shouldn't set SO_REUSEPORT either.
127d2f2 to
5db8bdc
Compare
ffe02f8 to
6d74870
Compare
5db8bdc to
4a18f93
Compare
…ddresses webrtc module has been slightly refactored with the introduction of `AddressPair` and the `WebRtcLisner` structs tu support multiple, or zero, listen addresses
6d74870 to
549e934
Compare
| if config.listen_addresses.is_empty() { | ||
| return Err(Error::Other( | ||
| "WebRTC transport requires at least one listen address but none were configured" | ||
| .to_string(), | ||
| )); | ||
| } |
There was a problem hiding this comment.
It should be possible to enable WebRTC transport and not set any WebRTC listen addresses. Can you double check this won't result in an error — i.e., if there are no WebRTC listen addresses, we don't instantiate WebRtcTransport at all?
There was a problem hiding this comment.
I've also talked with @lexnv about this. There is one main thing, which is the difference between the transport layers, TCP or WS can be created with 0 listen addresses because they still need to be able to dial another peer. WebRTC is different. Dialing in WebRTC is a no-op, thus it doesn't make sense to create a WebRTC instance without any listen addresses, not only because of the dialing, but also because it will be inserted in Substrate as 'experimental', and so if there is no listen address there shouldn't be any code path that touches WebRTC code, to avoid any problems.
This is to say that the same pattern of accepting 0 listen addresses cannot be used in WebRTC as it is in the other transports, and thus another approach should be taken. Something basic I can think of:
- Instead of exposing
webrtc::Configas public, we force the usage of a constructor that fails on empty listen_addresses. - We keep the failure within the WebRTC initialization and handle it gracefully within Substrate.
I think the first one is the easiest, also taking into consideration a future change, the implementation of the dialing side of WebRTC.
There was a problem hiding this comment.
Forcing a listen address different from 0.0.0.0 would guarantee that the sending UDP packets arrive on the same IP:port that it was received on.
For TCP this is not an issue, we'll bind to 0.0.0.0 then during the accept we ll receive a per connection socket. In UDP this becomes problematic on binding 0.0.0.0:
- peer A sent out a message on A-ip:port
- we provide the response, but since we listen on 0.0.0.0 the kernel can choose any interface to facilitate the transport, and might pick B-ip:port
- this will then cause issues with ICE / STUN:
The ICE agent MUST check that the source and destination transport
addresses in the Binding request and response are symmetric. That
is, the source IP address and port of the response MUST be equal to
the destination IP address and port to which the Binding request was
sent
https://datatracker.ietf.org/doc/html/rfc8445#section-7.2.5.2.1
I think we have a few options to make this work:
- https://man7.org/linux/man-pages/man2/IP_PKTINFO.2const.html
- If the address is unspecified, before binding the UDP socket translate it similar to TCP:
litep2p/src/transport/common/listener.rs
Lines 273 to 282 in 4cb002e
Probably the most straight forward one since we are still experimental would be to:
- return error if no addresses are provided
- return error/warn if an address tries to bind to 0.0.0.0
@dmitry-markin @gab8i would love to get your thoughts on this 🙏
There was a problem hiding this comment.
I've also talked with @lexnv about this. There is one main thing, which is the difference between the transport layers, TCP or WS can be created with 0 listen addresses because they still need to be able to dial another peer.
You are right, I was mostly thinking about the case when we enable the WebRTC feature, but don't pass any addresses. I guess in this case no WebRtcTransport should be initialized at all, so there is no issue.
There was a problem hiding this comment.
- return error/warn if an address tries to bind to 0.0.0.0
It should be possible to bind to 0.0.0.0, and it typically is not an issue, because the source IP address of the outgoing packet is selected according to the routing rules. If the machine is part of multiple networks, the interface will be selected correctly if routing is set up correctly. The only exception is if the machine has multiple public interfaces / interfaces connected to the same network, but this situation is so niche that the administrators are expected to understand the consequences and configure the networking accordingly.
There was a problem hiding this comment.
Wow, this adds a lot more information and constraints into the picture. Yes I think that given the current phase and the uncertainty of how the substrate integration will look like (given also the work that needs to be done to address persistent certificates #605) it would be easier to do what you described, keeping thus litep2p kind of clean and then just filter things on the higher layer within substrate!
| let fingerprint = crypto_provider.sha256_provider.sha256(&dtls_cert.certificate); | ||
|
|
||
| const MULTIHASH_SHA256_CODE: u64 = 0x12; | ||
| let certificate = Multihash::wrap(MULTIHASH_SHA256_CODE, &fingerprint) | ||
| .expect("fingerprint's len to be 32 bytes"); | ||
| const MULTIHASH_SHA256_CODE: u64 = 0x12; | ||
| let certificate = Multihash::wrap(MULTIHASH_SHA256_CODE, &fingerprint) | ||
| .expect("fingerprint's len to be 32 bytes"); |
There was a problem hiding this comment.
We can simplify this to:
let cert_hash = multihash_codetable::Code::Sha2_256::digest(&dtls_cert.certificate);
Also, certificate is strictly speaking either hash of fingerprint.
There was a problem hiding this comment.
oh, nice, yes this simplify it by a lot and remove the needs for that ugly constant!
I'm not sure though how this relates with the usage of str0m::crypto::from_feature_flags which uses the specified feature flag within cargo to construct the correct crypto provider which should be aligned with the multihash_codetable::Code used
There was a problem hiding this comment.
I think that something like the following is not as clean as what you proposed but definitely better than before (no constant used and panic handled properly):
let fingerprint = crypto_provider.sha256_provider.sha256(&dtls_cert.certificate);
let cert_hash = multihash_codetable::Code::Sha2_256.wrap(&fingerprint).map_err(|err| {
tracing::warn!(target: LOG_TARGET, ?err, "failed to wrap WebRTC certificate");
Error::Other("could not compute WebRTC certificate".to_string())
})?;But still re-using the same crypto provider
There was a problem hiding this comment.
There is no need to reuse a crypto provider for sha256, multihash_codetable uses Rust sha2 implementation.
| let nread = read_buf.filled().len(); | ||
| buf.truncate(nread); |
There was a problem hiding this comment.
We should avoid allocating/truncating a buffer on every loop iteration. Instead we should reuse the same buffer and pass a slice/cheap reference to on_socket_input. This should be enough, because DatagramRecv::try_from in on_socket_input accepts a borrowed slice anyway.
| match this.on_socket_input(addrs, buf) { | ||
| Ok(false) => {} | ||
| Ok(true) => loop { | ||
| match this.poll_connection(&addrs) { |
There was a problem hiding this comment.
I have a nagging feeling that regular/opening connection polling can be refactored. Currently, we handle opening connection polling as a special case by on_socket_input indicating we should manually poll it, and manually keeping track and polling the opening connections' timeouts. Instead, it should be possible to encapsulate this logic in the opening connection object, poll just it, and as a result of polling it can upgrade to a fully established connection. Definitely not for this PR, but worth considering as a follow up.
The name poll_connection also caused some confusion during the review, so may be let's at least rename it into poll_opening_connection, and also this.timeouts -> this.opening_timeouts. Actually, even a better name would be opening_timers, because these timeouts mostly indicate work to do, not that something has timed out (i.e., connection was unsuccessful).
There was a problem hiding this comment.
One more thing: if we put opening connection objects into FuturesUnordered, it will automatically skip polling objects that didn't generate wakeup events, contrary to polling all timeouts in a loop as we do now.
There was a problem hiding this comment.
What you described would make the implementation a lot cleaner and makes perfectly sense, I opened an issue to make sure this will be addressed in a follow-up
| /// Address of the local listening socket that received the datagram. | ||
| local: SocketAddr, | ||
| /// Address of the remote peer that sent the datagram. | ||
| source: SocketAddr, |
There was a problem hiding this comment.
dq: Could we replacethis by connection ID?
There was a problem hiding this comment.
I don't think this is possible, ConnectionID are used as keys within the connections map which has AddressPair part of the value.
Drop the hard-coded multihash code constant in favour of `multihash_codetable::Code`, and handle the wrap error instead of panicking.
| ConnectionEvent::ConnectionClosed => { | ||
| this.opening.remove(&source); | ||
| return None; | ||
| this.opening.remove(&addrs); |
There was a problem hiding this comment.
Should we also remove at this point all entries from connections that reference this addr?
this.connections.retain(a != addr)There was a problem hiding this comment.
Great, nice catch! connections clean up was also missing from another place!!
lexnv
left a comment
There was a problem hiding this comment.
LGTM!
There's an open question about tackling the listen address mode (#579 (comment)), we could either integrate the changes here of followu 🙏
webrtc module has been slightly refactored with the introduction of
AddressPairand theWebRtcLisnerstructs to support multiple, or zero, listen addresses