SpecMsg: Design Parachain Communication #12226

Open
lexnv wants to merge 1 commit into rk-design-speculative-messaging from lexnv/spec-msg-disc

Contributor

@lexnv lexnv commented May 28, 2026

This PR extends the design of speculative messaging to clarify parachain communication.

In summary:

  • create a new Speculative Messaging Network for the collators of all participating parachains
  • extend the bootnodes-on-DHT feature to include entry bootnodes for this network

Inside the SpecMsg network:

  • collators register as providers under their para ID in the DHT, so that 20 collators can be found for any para ID
  • collators publish authority records in the DHT (full discoverability using authority discovery keys)
  • Parachain A makes a light-client request to the 20 collators closest to the para ID it wants to communicate with. Using the returned proof and the state root from the relay chain, Parachain A verifies the full list of authority discovery keys and can then find all collators

Markdown Rendering:

Parachain Communication

Parachain collators operating on different peer-to-peer (P2P) networks need a way to exchange messages off-chain.
The relay chain only processes message commitments, not the messages themselves. Direct communication between
collators of different parachains is not possible due to different genesis hashes and sync protocols.

To enable off-chain communication between collators, a dedicated P2P network is created.
This Speculative Messaging Network includes collators from all parachains that opt into
speculative messaging.

Alternative architectures were considered:

  • Routing through relay chain peers: Adds unnecessary load and stress on the relay chain,
    as well as new protocols for message exchange between collators.
  • Spawning a dedicated network backend for each parachain: Highly resource-intensive and doesn't scale
    well with the number of parachains.

By deploying a single network backend for the entire speculative messaging work, we keep the relay chain side
changes to a minimum (needed for JAM compatibility) and we can leverage the existing bootnodes on DHT
mechanism for collator discovery.

The Speculative Messaging Network exposes the following protocols:

  • Kademlia DHT: /spec-msg/kad for peer discovery.
  • Identify and Ping: /spec-msg/identify and /spec-msg/ping for obtaining peer addresses and keeping connections alive.
  • Speculative Messaging Protocol: /spec-msg/exchange for exchanging messages between collators.
  • Light Client Request-Response: /spec-msg/light/2 for fetching authority discovery keys of other collators.
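
As a minimal sketch, the protocol names listed above could be kept in one place so that handlers and tests refer to the same strings. The enum and its method names are hypothetical and only illustrate the list; only the protocol strings themselves come from the design.

```rust
/// Protocols exposed on the Speculative Messaging Network, as listed above.
/// The enum itself is a hypothetical helper, not part of the design.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpecMsgProtocol {
    Kademlia,
    Identify,
    Ping,
    Exchange,
    LightClient,
}

impl SpecMsgProtocol {
    /// Every protocol, in the order used by the list above.
    pub const ALL: [SpecMsgProtocol; 5] = [
        SpecMsgProtocol::Kademlia,
        SpecMsgProtocol::Identify,
        SpecMsgProtocol::Ping,
        SpecMsgProtocol::Exchange,
        SpecMsgProtocol::LightClient,
    ];

    /// Wire name of the protocol.
    pub const fn name(self) -> &'static str {
        match self {
            SpecMsgProtocol::Kademlia => "/spec-msg/kad",
            SpecMsgProtocol::Identify => "/spec-msg/identify",
            SpecMsgProtocol::Ping => "/spec-msg/ping",
            SpecMsgProtocol::Exchange => "/spec-msg/exchange",
            SpecMsgProtocol::LightClient => "/spec-msg/light/2",
        }
    }

    /// Reverse lookup; returns `None` for names outside the spec-msg network.
    pub fn from_name(name: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|p| p.name() == name)
    }
}
```

A lookup such as `SpecMsgProtocol::from_name("/paranode")` returns `None`, which keeps relay chain protocols clearly separated from the spec-msg ones.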

Parachains outside a trust domain, or those that don't wish to participate, can simply ignore the Speculative Messaging
Network and not register themselves in the DHT.

Bootnodes for the Speculative Messaging Network

The architecture leverages the existing bootnodes on DHT mechanism on the relay chain side.
For more info, see RFC 08.

Typically, relay chain peers of parachains advertise themselves as providers under the key para ID || epoch randomness
in the relay chain DHT. Only the 20 closest peer IDs to this key are kept as providers, and the provider set is updated on every epoch change.
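
To illustrate what "the 20 closest peer IDs" means, here is a rough sketch of key concatenation and XOR-distance selection. It assumes that both the provider key and the peer IDs have already been hashed into a common 32-byte key space (that hashing step is omitted), and that the para ID is passed in already encoded. All function names are hypothetical.

```rust
/// 32-byte identifier in the Kademlia key space.
/// Assumption: keys and peer IDs were hashed into this space beforehand.
pub type KadId = [u8; 32];

/// Number of providers kept per key, as stated above.
pub const MAX_PROVIDERS: usize = 20;

/// Builds the raw provider key `para ID || epoch randomness`.
/// The para ID is expected to be encoded by the caller.
pub fn provider_key(encoded_para_id: &[u8], epoch_randomness: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(encoded_para_id.len() + epoch_randomness.len());
    key.extend_from_slice(encoded_para_id);
    key.extend_from_slice(epoch_randomness);
    key
}

/// Kademlia XOR distance between two identifiers.
pub fn xor_distance(a: &KadId, b: &KadId) -> KadId {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = a[i] ^ b[i];
    }
    out
}

/// Returns at most `MAX_PROVIDERS` peers, ordered from closest to farthest.
/// Byte arrays compare lexicographically, which matches big-endian integer order.
pub fn closest_peers(target: &KadId, mut peers: Vec<KadId>) -> Vec<KadId> {
    peers.sort_by_key(|peer| xor_distance(target, peer));
    peers.truncate(MAX_PROVIDERS);
    peers
}
```

Because the key includes the epoch randomness, the target moves on every epoch change, and so does the set of 20 closest peers.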

Similarly, relay chain peers of collators advertise themselves as providers in the relay chain DHT.
This utilizes the ADD_PROVIDER mechanism of the Kademlia DHT.
The routing key is defined as sha256(concat("spec-msg", epoch randomness)), where the epoch randomness has the
same semantics as in RFC 08, and can be obtained by calling BabeApi_currentEpoch.

This extracts the relay chain side peer IDs of the 20 closest peers to the speculative messaging key.
To obtain the actual bootnode addresses, the /paranode request-response is extended in a backwards compatible way.
Originally, this request accepted a SCALE-compact-encoded para ID and returned a list of bootnode multiaddresses
for that parachain. The protocol is extended to support the SCALE-compact-encoded spec-msg key as input,
and the response is a list of multiaddresses of the collators that are bootnodes for the Speculative Messaging Network.

To obtain the bootnodes of the Speculative Messaging Network, a relay chain side peer:

  • Queries the DHT for providers under the key sha256(concat("spec-msg", epoch randomness)), obtaining 20 peer IDs
  • For each peer ID, it sends a request-response over /paranode with the spec-msg key, and obtains a list of multiaddresses
    for the collators that are bootnodes for the Speculative Messaging Network.
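
The two steps above can be sketched as a small lookup routine. The `RelayNetwork` trait, its methods, and the `InMemoryRelay` type are hypothetical stand-ins for the real networking stack, and peer IDs and multiaddresses are simplified to plain byte vectors and strings.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-ins for real libp2p types, used only in this sketch.
pub type PeerId = Vec<u8>;
pub type Multiaddr = String;

/// Hypothetical view of the relay chain networking needed for bootnode discovery.
pub trait RelayNetwork {
    /// Kademlia provider lookup for `key`; expected to yield up to 20 peer IDs.
    fn get_providers(&self, key: &[u8]) -> Vec<PeerId>;
    /// `/paranode` request carrying the spec-msg key. `None` models a peer that
    /// did not answer or does not understand the extended request.
    fn paranode_request(&self, peer: &PeerId, key: &[u8]) -> Option<Vec<Multiaddr>>;
}

/// Collects the de-duplicated bootnode addresses reported by all providers.
pub fn discover_bootnodes<N: RelayNetwork>(net: &N, spec_msg_key: &[u8]) -> Vec<Multiaddr> {
    let mut bootnodes = BTreeSet::new();
    for peer in net.get_providers(spec_msg_key) {
        if let Some(addrs) = net.paranode_request(&peer, spec_msg_key) {
            bootnodes.extend(addrs);
        }
    }
    bootnodes.into_iter().collect()
}

/// In-memory network for experimenting with the flow.
#[derive(Default)]
pub struct InMemoryRelay {
    pub providers: Vec<PeerId>,
    pub responses: HashMap<PeerId, Vec<Multiaddr>>,
}

impl RelayNetwork for InMemoryRelay {
    fn get_providers(&self, _key: &[u8]) -> Vec<PeerId> {
        self.providers.clone()
    }

    fn paranode_request(&self, peer: &PeerId, _key: &[u8]) -> Option<Vec<Multiaddr>> {
        self.responses.get(peer).cloned()
    }
}
```

Providers that fail to answer are skipped rather than treated as an error, so a single unresponsive peer does not prevent bootstrapping.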

Speculative Messaging Network

Once a collator obtains the bootnode list from the relay chain, it spawns a dedicated network backend for the
Speculative Messaging Network and connects to the bootnodes. Because the network connects collators from
all parachains, collators from Parachain A must establish communication with collators from Parachain B.

Peers register themselves in the Speculative Messaging DHT as providers under the key para ID || randomness,
exactly as bootnodes on the relay chain DHT do using the ADD_PROVIDER mechanism.
This allows collators to quickly discover the 20 closest peers. These peers serve as explicit entry points to
validate collators and fetch their authority discovery keys.

Separately from the ADD_PROVIDER mechanism, collators publish their SignedCollatorAuthorityRecord records into the DHT,
using the Kademlia PUT_VALUE mechanism. This ensures collators can discover the addresses of other collators and verify their integrity, strengthening the trust model for collators.
This mechanism mirrors the authority discovery on the relay chain for validators.

The SignedCollatorAuthorityRecord record has the following format:

/// Collator record to provide public reachable addresses for the collator,
/// and the time of creation of the record.
pub struct CollatorAuthority {
    /// Parachain ID scale encoded.
    pub parachain_id: Vec<u8>,
    /// A vector of multiaddresses scale encoded.
    pub addresses: Vec<Vec<u8>>,
    /// The time since UNIX_EPOCH in nanoseconds, scale encoded.
    /// Similar to authority-discovery this is used to update peers that have
    /// stale records with newly discovered ones.
    pub creation_time: Vec<u8>,
}

/// The speculative messaging peer signs the `CollatorAuthority` record with their private key,
/// and includes the public key in the signature.
pub struct PeerSignature {
    /// The signature of the peer, scale encoded.
    pub signature: Vec<u8>,
    /// The public key of the peer, scale encoded.
    pub public_key: Vec<u8>,
}

/// Record published in the DHT.
pub struct SignedCollatorAuthorityRecord {
    /// The actual record containing the multiaddresses and creation time.
    pub record: CollatorAuthority,
    /// The signature of the peer over the record.
    pub peer_signature: PeerSignature,
    /// The record signed by the authority discovery key of the collator, scale encoded.
    pub auth_signature: Vec<u8>,
}
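
The `creation_time` field is what lets a peer replace stale records. Below is a minimal sketch of that rule, assuming the timestamp is a fixed-width SCALE-encoded `u128` (16 little-endian bytes), as in relay chain authority discovery. `RecordCache` and its methods are hypothetical, and signature verification is out of scope here.

```rust
use std::collections::HashMap;

/// Decodes `creation_time`.
/// Assumption: fixed-width SCALE encoding of `u128`, i.e. 16 little-endian bytes.
pub fn decode_creation_time(bytes: &[u8]) -> Option<u128> {
    if bytes.len() != 16 {
        return None;
    }
    let mut raw = [0u8; 16];
    raw.copy_from_slice(bytes);
    Some(u128::from_le_bytes(raw))
}

/// Hypothetical cache: authority discovery key -> (creation time in ns, addresses).
#[derive(Default)]
pub struct RecordCache {
    entries: HashMap<Vec<u8>, (u128, Vec<Vec<u8>>)>,
}

impl RecordCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Stores the record only if it is strictly newer than the cached one.
    /// Returns `true` when the cache was updated.
    pub fn insert(
        &mut self,
        authority_key: Vec<u8>,
        creation_time: &[u8],
        addresses: Vec<Vec<u8>>,
    ) -> bool {
        let timestamp = match decode_creation_time(creation_time) {
            Some(timestamp) => timestamp,
            None => return false, // malformed timestamp: ignore the record
        };
        let is_newer = self
            .entries
            .get(&authority_key)
            .map_or(true, |(existing, _)| timestamp > *existing);
        if is_newer {
            self.entries.insert(authority_key, (timestamp, addresses));
        }
        is_newer
    }

    /// Addresses currently cached for an authority key.
    pub fn addresses(&self, authority_key: &[u8]) -> Option<&Vec<Vec<u8>>> {
        self.entries.get(authority_key).map(|(_, addrs)| addrs)
    }
}
```

A real implementation would check both `peer_signature` and `auth_signature` before calling `insert`, so that an unauthenticated record cannot evict a valid one.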

Trust Model for Collators

For Parachain A to securely exchange messages with Parachain B, it must first obtain Parachain B's discovery keys.
These keys allow Parachain A to map out collator addresses and verify peer integrity.

The SignedCollatorAuthorityRecord guarantees that communication strictly happens with legitimate collators
of Parachain B, preventing eclipse attacks where malicious peers impersonate collators to drop or manipulate
messages.

Parachain A relies on a light-client-like approach to fetch the discovery keys from Parachain B:

    1. Read relay header: Parachain A reads Parachain B's header from the relay chain via paras::Heads::get(Para B). This storage entry is located at relay_well_known_keys::para_head(Para B).
    2. Extract state root: The header is decoded to obtain the state_root of the block.
    3. Craft storage key: The key for the storage read is built as twox_128("AuthorityDiscovery") ++ twox_128("Keys").
    4. Query peers: A request is made to the 20 closest peers that registered as providers under the para ID || randomness key.
    5. Submit request: The request is submitted over /spec-msg/light/2 and includes a protobuf-encoded RemoteReadRequest { block, keys }.
    6. Receive proof: The response is a RemoteReadResponse containing a storage proof.
    7. Verify proof: Parachain A verifies the proof via read_proof_check(), passing in the state_root (step 2), the crafted key (step 3), and the provided storage proof (step 6).
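
Steps 3 and 7 can be sketched as follows. Since the standard library has no xxHash, the `twox_128` hasher is injected by the caller, and the `read_proof_check` parameter is a simplified stand-in for the real proof checker, not its actual signature. `ProofError` and both function names are hypothetical.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum ProofError {
    /// The proof was valid but the key had no value in the proven state.
    MissingValue,
    /// The proof did not verify against the state root.
    InvalidProof,
}

/// Step 3: builds `twox_128("AuthorityDiscovery") ++ twox_128("Keys")`.
/// The hasher is passed in; a real caller would supply an xxHash-based one.
pub fn authority_discovery_keys_storage_key<H>(twox_128: H) -> Vec<u8>
where
    H: Fn(&[u8]) -> [u8; 16],
{
    let mut key = Vec::with_capacity(32);
    key.extend_from_slice(&twox_128(b"AuthorityDiscovery"));
    key.extend_from_slice(&twox_128(b"Keys"));
    key
}

/// Step 7: checks the storage proof against the relay-anchored state root and
/// returns the still-encoded list of authority discovery keys.
/// `read_proof_check` is a simplified stand-in for the real checker.
pub fn verify_authority_keys<H, V>(
    state_root: &[u8; 32],
    proof: &[Vec<u8>],
    twox_128: H,
    read_proof_check: V,
) -> Result<Vec<u8>, ProofError>
where
    H: Fn(&[u8]) -> [u8; 16],
    V: Fn(&[u8; 32], &[Vec<u8>], &[u8]) -> Result<Option<Vec<u8>>, ()>,
{
    let key = authority_discovery_keys_storage_key(twox_128);
    match read_proof_check(state_root, proof, &key) {
        Ok(Some(value)) => Ok(value),
        Ok(None) => Err(ProofError::MissingValue),
        Err(()) => Err(ProofError::InvalidProof),
    }
}
```

The important property is that the state root comes from the relay chain (steps 1 and 2), so the peers answering the request can withhold data but cannot forge the key list.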

Once the proof is verified, Parachain A knows the authority keys of Parachain B and issues Kademlia GET_VALUE requests to fetch the multiaddresses of the collators on the Speculative Messaging Network. With the multiaddresses, Parachain A can establish direct communication with Parachain B's collators over the /spec-msg/exchange protocol.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv self-assigned this May 28, 2026
@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/26586053365
Failed job name: check-runtime-migration

Member

@eskimor eskimor left a comment

General approach is very sensible. Some notes:

  1. We need to be able to verify messages - verify that they are coming from a block, authored by an actual authority and later also that it was acknowledged.
  2. While we should not over-engineer, some considerations on how this scales with many parachains would be good.
  3. Security & Speed considerations: Especially with regards to DHT.

collators of different parachains is not possible due to different genesis hashes and sync protocols.

To enable off-chain communicaiton between collators, a dedicated P2P network is created.
This **Speculative Messaging Network** includes collators from al parachains that opt into
Member

Typo. Other than that, I like the approach!


To obtain the bootnodes of the Speculative Messaging Network, a relay chain side peer:
- Queries the DHT for providers under the key `sha256(concat("spec-msg", epoch randomness))`, obtaining 20 peer IDs
- For each peerID, it sends a request-response over `/paranode` with the `spec-msg` key, and obtains a list of multiaddresses
Member

Do we really need separate entries? Can't we derive the endpoint from the already existing boot node entries?

Contributor

@yrong yrong Jun 1, 2026

So, there is no separate spec-msg DHT registration. Instead, we reuse the existing para_id || epoch_randomness provider records that every parachain node already advertises under RFC 0008, avoiding the need to maintain a second sha256("spec-msg", ...) registration.

The only change is a backwards-compatible extension to the /paranode response:

  • The request remains unchanged and is still keyed solely by para_id.
  • The response gains an optional spec_msg_addrs field containing the node's /spec-msg/* listen addresses (or an empty value if the node does not participate in the spec-msg network).

Because the protocol uses proto2 semantics, older nodes simply ignore the additional field. As a result, no protocol version bump is required, and there is no need for a second lookup key or alternative input format to disambiguate registrations.

Please correct me if I'm wrong.

These keys allow Parachain A to map out collator addresses and verify peer integrity.

The `SignedCollatorAuthorityRecord` guarantees that communication stirctly happens with legitimate collators
of Parachain B, preventing eclipse attacks where malicious peers impersonate collators to drop or manipulate
Member

We should get the messages via a standardized runtime API via a light-client and prove the correctness, with regard to a block authored by an actual authority. The peer we connect to should not be able to do anything worse than not serving. We cannot trust the data just because we authorized the peer - we need to verify.

Later we will also need to verify, not only that the messages are coming from an actual block, but also that this block was acknowledged. This will be a later phase, just mentioning it, in case it is relevant for making decisions: Light-client access & acknowledgment signature fetching needs to be possible and efficient.

Contributor

My understanding is that this implies two additional requirements:

  1. The sender's block header must be added to MessageBatch.
  2. The sender's Aura authority set must be fetched cross-chain and verified via a relay-anchored storage proof (via /spec-msg/light/2, similar to how the audi discovery set is obtained).

The receiver can then verify authorship of that included header by:

  • extracting the slot from the header;
  • fetching and verifying the Aura authority set (relay-anchored proof);
  • deriving the expected author = authorities[slot % authorities.len()];
  • verifying the sr25519 seal over the pre-seal header hash against that expected author's public key.
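
As a rough illustration of the author-derivation step in the list above, the round-robin rule could look like this. Public keys are modelled as plain 32-byte arrays, and the function name is hypothetical. Seal verification is left out because it needs an sr25519 implementation.

```rust
/// Returns the authority expected to author the block at `slot`, following
/// `authorities[slot % authorities.len()]`.
/// Returns `None` for an empty authority set instead of dividing by zero.
pub fn expected_author(authorities: &[[u8; 32]], slot: u64) -> Option<&[u8; 32]> {
    if authorities.is_empty() {
        return None;
    }
    let index = (slot % authorities.len() as u64) as usize;
    authorities.get(index)
}
```

The receiver would then check the header seal against the returned key, and reject the batch if the set is empty or the signature does not match.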

My concern here is that it's substantially heavier than the current MMR self-check, and on the hottest speculative path it introduces non-trivial work. That may simply be the trade-off, however — the cost of enforcing the guarantee that a node cannot “do worse than refusing to serve”.

Contributor

yrong commented Jun 1, 2026

So, the cross-chain collator discovery reuses Substrate's existing authority-discovery — its record format, signing, and verification logic — and adds two pieces around it:

  • Shared DHT: run a separate NetworkService keyed by a fixed domain separator (instead of each chain's genesis), so that all participating collators share a single DHT.
  • Cross-chain bootstrap: a sibling task learns a foreign chain's audi keys from a relay-anchored light-client proof of its AuthorityDiscovery::Keys, then does the normal DHT lookup for those keys.

Prerequisite: participating parachains run pallet-authority-discovery with collators under Audi, ensuring that authority-discovery keys are both published to the DHT and retrievable via state proofs.

Please correct me if I've misunderstood any part of this.
