Skip to content

feat(webrtc): introduce WebRtcListener to support multiple listen addresses#579

Open
gab8i wants to merge 18 commits into
masterfrom
gab_webrtc_listener
Open

feat(webrtc): introduce WebRtcListener to support multiple listen addresses#579
gab8i wants to merge 18 commits into
masterfrom
gab_webrtc_listener

Conversation

@gab8i
Copy link
Copy Markdown
Contributor

@gab8i gab8i commented May 13, 2026

webrtc module has been slightly refactored with the introduction of AddressPair and the WebRtcLisner structs to support multiple, or zero, listen addresses

Comment thread src/transport/webrtc/listener.rs Outdated
Comment thread src/transport/webrtc/listener.rs
return Err(Error::AddressError(AddressError::InvalidProtocol));
}
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also check that the address has a certificate attached?

Copy link
Copy Markdown
Contributor Author

@gab8i gab8i May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is a listen address, so a certhash isn't required.
I've added a comment noting that there is a check to ensure the address is a webrtc-direct multiaddr.

The DTLS certificate is generated once in WebRtcTransport::new() and presented to
dialing peers during the DTLS handshake, the certhash is the fingerprint of that certificate,
append when building the advertised multiaddr.

target: LOG_TARGET,
?local,
?e,
"failed to receive a datagram",
Copy link
Copy Markdown
Collaborator

@lexnv lexnv May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: When a Poll::Ready(Err()) is returned the waker will not be registered. If this happens for all sockets we'll effectively be stuck in pending

Comment thread src/transport/webrtc/listener.rs Outdated
let listen_address = Self::get_socket_address(&listen_address)?;
let socket = if listen_address.is_ipv4() {
let socket = Socket::new(Domain::IPV4, Type::DGRAM, Some(socket2::Protocol::UDP))?;
socket.set_reuse_address(true)?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_reuse_address was called after the bind() previously, any reason for changing this? Should we deduplicate the listen addresses now that we suport multiple ones?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_reuse_address needs to be called before binding the socket, it sets SO_REUSEADDR flag. The same applies to set_reuse_port, which I just noticed is currently being called after the bind. This ordering is a Linux requirement.

The real question, though: do we actually want multiple WebRTC instances bound to the same address
and rely on the kernel's load balancer? I don't think we should allow two listeners on the same
(ip, port). So I'd keep set_reuse_address, multiple listeners can coexist on the same host, but drop set_reuse_port, ensuring each listener has a unique local address.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't set any of these. SO_REUSEADDR only makes sense when binding a TCP socket to a previously bound ip:port pair that was not cleanly shut down by previous process to not get EADDRINUSE. It doesn't make sense for UDP and was likely blindly copied from TCP/WS.

Agree that we shouldn't set SO_REUSEPORT either.

@gab8i gab8i force-pushed the gab_webrtc_multiaddr_bump branch from 127d2f2 to 5db8bdc Compare May 18, 2026 14:16
@gab8i gab8i force-pushed the gab_webrtc_listener branch from ffe02f8 to 6d74870 Compare May 18, 2026 14:17
@gab8i gab8i force-pushed the gab_webrtc_multiaddr_bump branch from 5db8bdc to 4a18f93 Compare May 20, 2026 10:41
…ddresses

webrtc module has been slightly refactored with the introduction
of `AddressPair` and the `WebRtcLisner` structs tu support
multiple, or zero, listen addresses
@gab8i gab8i force-pushed the gab_webrtc_listener branch from 6d74870 to 549e934 Compare May 21, 2026 11:32
@gab8i gab8i changed the base branch from gab_webrtc_multiaddr_bump to master May 21, 2026 11:32
Comment on lines +410 to +415
if config.listen_addresses.is_empty() {
return Err(Error::Other(
"WebRTC transport requires at least one listen address but none were configured"
.to_string(),
));
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to enable WebRTC transport and not set any WebRTC listen addresses. Can you double check this won't result in an error — i.e., if there are no WebRTC listen addresses, we don't instantiate WebRtcTransport at all?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also talked with @lexnv about this. There is one main thing, which is the difference between the transport layers, TCP or WS can be created with 0 listen addresses because they still need to be able to dial another peer. WebRTC is different. Dialing in WebRTC is a no-op, thus it doesn't make sense to create a WebRTC instance without any listen addresses, not only because of the dialing, but also because it will be inserted in Substrate as 'experimental', and so if there is no listen address there shouldn't be any code path that touches WebRTC code, to avoid any problems.

This is to say that the same pattern of accepting 0 listen addresses cannot be used in WebRTC as it is in the other transports, and thus another approach should be taken. Something basic I can think of:

  1. Instead of exposing webrtc::Config as public, we force the usage of a constructor that fails on empty listen_addresses.
  2. We keep the failure within the WebRTC initialization and handle it gracefully within Substrate.

I think the first one is the easiest, also taking into consideration a future change, the implementation of the dialing side of WebRTC.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forcing a listen address different from 0.0.0.0 would guarantee that the sending UDP packets arrive on the same IP:port that it was received on.

For TCP this is not an issue, we'll bind to 0.0.0.0 then during the accept we ll receive a per connection socket. In UDP this becomes problematic on binding 0.0.0.0:

  • peer A sent out a message on A-ip:port
  • we provide the response, but since we listen on 0.0.0.0 the kernel can choose any interface to facilitate the transport, and might pick B-ip:port
  • this will then cause issues with ICE / STUN:

The ICE agent MUST check that the source and destination transport
addresses in the Binding request and response are symmetric. That
is, the source IP address and port of the response MUST be equal to
the destination IP address and port to which the Binding request was
sent

https://datatracker.ietf.org/doc/html/rfc8445#section-7.2.5.2.1

I think we have a few options to make this work:

match NetworkInterface::show() {
Ok(ifaces) => ifaces
.into_iter()
.flat_map(|record| {
record.addr.into_iter().filter_map(|iface_address| {
match (iface_address, address.is_ipv4()) {
(Addr::V4(inner), true) => Some(SocketAddr::new(
IpAddr::V4(inner.ip),
local_address.port(),
)),

Probably the most straight forward one since we are still experimental would be to:

  • return error if no addresses are provided
  • return error/warn if an address tries to bind to 0.0.0.0

@dmitry-markin @gab8i would love to get your thoughts on this 🙏

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also talked with @lexnv about this. There is one main thing, which is the difference between the transport layers, TCP or WS can be created with 0 listen addresses because they still need to be able to dial another peer.

You are right, I was mostly thinking about the case when we enable the WebRTC feature, but don't pass any addresses. I guess in this case no WebRtcTransport should be initialized at all, so there is no issue.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • return error/warn if an address tries to bind to 0.0.0.0

It should be possible to bind to 0.0.0.0, and it typically is not an issue, because the source IP address of the outgoing packet is selected according to the routing rules. If the machine is part of multiple networks, the interface will be selected correctly if routing is set up correctly. The only exception is if the machine has multiple public interfaces / interfaces connected to the same network, but this situation is so niche that the administrators are expected to understand the consequences and configure the networking accordingly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this adds a lot more information and constraints into the picture. Yes I think that given the current phase and the uncertainty of how the substrate integration will look like (given also the work that needs to be done to address persistent certificates #605) it would be easier to do what you described, keeping thus litep2p kind of clean and then just filter things on the higher layer within substrate!

Comment thread src/transport/webrtc/listener.rs Outdated
Comment thread src/transport/webrtc/listener.rs
Comment thread src/transport/webrtc/mod.rs Outdated
Comment thread src/transport/webrtc/mod.rs Outdated
Comment on lines +431 to +435
let fingerprint = crypto_provider.sha256_provider.sha256(&dtls_cert.certificate);

const MULTIHASH_SHA256_CODE: u64 = 0x12;
let certificate = Multihash::wrap(MULTIHASH_SHA256_CODE, &fingerprint)
.expect("fingerprint's len to be 32 bytes");
const MULTIHASH_SHA256_CODE: u64 = 0x12;
let certificate = Multihash::wrap(MULTIHASH_SHA256_CODE, &fingerprint)
.expect("fingerprint's len to be 32 bytes");
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can simplify this to:

let cert_hash = multihash_codetable::Code::Sha2_256::digest(&dtls_cert.certificate);

Also, certificate is strictly speaking either hash of fingerprint.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, nice, yes this simplify it by a lot and remove the needs for that ugly constant!

I'm not sure though how this relates with the usage of str0m::crypto::from_feature_flags which uses the specified feature flag within cargo to construct the correct crypto provider which should be aligned with the multihash_codetable::Code used

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that something like the following is not as clean as what you proposed but definitely better than before (no constant used and panic handled properly):

        let fingerprint = crypto_provider.sha256_provider.sha256(&dtls_cert.certificate);
        let cert_hash = multihash_codetable::Code::Sha2_256.wrap(&fingerprint).map_err(|err| {
            tracing::warn!(target: LOG_TARGET, ?err, "failed to wrap WebRTC certificate");
            Error::Other("could not compute WebRTC certificate".to_string())
        })?;

But still re-using the same crypto provider

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to reuse a crypto provider for sha256, multihash_codetable uses Rust sha2 implementation.

Comment thread src/transport/webrtc/mod.rs Outdated
Comment on lines +642 to +643
let nread = read_buf.filled().len();
buf.truncate(nread);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid allocating/truncating a buffer on every loop iteration. Instead we should reuse the same buffer and pass a slice/cheap reference to on_socket_input. This should be enough, because DatagramRecv::try_from in on_socket_input accepts a borrowed slice anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already been noticed by @lexnv and there is a live issue #588. As you said the solution is pretty easy, just a matter of sharing a single buffer within the WebRtcTransport struct, I already have a branch with the fix, I will create a pr soon!

match this.on_socket_input(addrs, buf) {
Ok(false) => {}
Ok(true) => loop {
match this.poll_connection(&addrs) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a nagging feeling that regular/opening connection polling can be refactored. Currently, we handle opening connection polling as a special case by on_socket_input indicating we should manually poll it, and manually keeping track and polling the opening connections' timeouts. Instead, it should be possible to encapsulate this logic in the opening connection object, poll just it, and as a result of polling it can upgrade to a fully established connection. Definitely not for this PR, but worth considering as a follow up.

The name poll_connection also caused some confusion during the review, so may be let's at least rename it into poll_opening_connection, and also this.timeouts -> this.opening_timeouts. Actually, even a better name would be opening_timers, because these timeouts mostly indicate work to do, not that something has timed out (i.e., connection was unsuccessful).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing: if we put opening connection objects into FuturesUnordered, it will automatically skip polling objects that didn't generate wakeup events, contrary to polling all timeouts in a loop as we do now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you described would make the implementation a lot cleaner and makes perfectly sense, I opened an issue to make sure this will be addressed in a follow-up

#609

/// Address of the local listening socket that received the datagram.
local: SocketAddr,
/// Address of the remote peer that sent the datagram.
source: SocketAddr,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dq: Could we replacethis by connection ID?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is possible, ConnectionID are used as keys within the connections map which has AddressPair part of the value.

Comment thread src/transport/webrtc/listener.rs Outdated
Drop the hard-coded multihash code constant in favour of
`multihash_codetable::Code`, and handle the wrap error instead of
panicking.
Comment thread src/transport/webrtc/mod.rs Outdated
ConnectionEvent::ConnectionClosed => {
this.opening.remove(&source);
return None;
this.opening.remove(&addrs);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also remove at this point all entries from connections that reference this addr?

this.connections.retain(a != addr)

Copy link
Copy Markdown
Contributor Author

@gab8i gab8i Jun 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, nice catch! connections clean up was also missing from another place!!

Copy link
Copy Markdown
Collaborator

@lexnv lexnv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

There's an open question about tackling the listen address mode (#579 (comment)), we could either integrate the changes here of followu 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants