Problem Statement
As the web3-storage network scales, the current architecture faces significant challenges in maintaining data consistency during topology changes. Specifically, when the set of Primary or Replica nodes is dynamic (e.g., adding nodes, migrating providers, or recovering from downtime), there is no formalized protocol to ensure that state and data remain synchronized across the cluster.
Without a robust syncing mechanism, new nodes enter a "data gap" state where they are unable to participate in consensus or fulfill retrieval requests for historical data.
Syncing Scenarios & Requirements
We have identified four critical sync vectors that need to be addressed:
1. Primary $\leftrightarrow$ Primary Synchronization
When a new Primary node is introduced to increase write-throughput or redundancy:
- Challenge: The new node must ingest the current state of storage proofs and indexing metadata without halting the network's ability to process new incoming writes.
- Goal: Achieve state-parity among all authoritative write-nodes.
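One way to make "state-parity" checkable is to have every Primary publish a digest of its state at a given sequence number. A new Primary is at parity once its (sequence, digest) pair matches the others. The sketch below assumes, hypothetically, that the relevant state (storage proofs and indexing metadata) can be modelled as a key/value map and that writes are numbered by a shared sequence. `state_digest` and `at_parity` are illustrative names, not part of any existing codebase.

```python
import hashlib
from typing import Dict, List, Tuple


def state_digest(state: Dict[str, str]) -> str:
    """Order-independent SHA-256 digest of a key/value state map (hypothetical state model)."""
    h = hashlib.sha256()
    for key in sorted(state):
        # Length-prefix every field so that ("ab", "c") and ("a", "bc") hash differently.
        for part in (key, state[key]):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()


def at_parity(reports: List[Tuple[int, str]]) -> bool:
    """True if every Primary reports the same (sequence number, digest) pair."""
    return len(set(reports)) == 1
```

Comparing digests at an explicit sequence number matters: two Primaries that agree on content but are at different points in the write stream are not considered at parity.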
2. Primary $\rightarrow$ Replica Synchronization
When scaling the read-layer or adding a new replica:
- Challenge: Replicas need to pull large blobs of data from Primaries. This creates high bandwidth pressure on the Primary nodes.
- Goal: Implement a streaming transfer protocol that supports resume-on-failure.
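A minimal sketch of resume-on-failure, assuming a hypothetical transport function `fetch_range(offset, length)` that returns a byte range of the blob or raises `TransferError`. The receiver only ever requests bytes after the last offset it has committed, so a dropped request never re-transfers data already received, and the finished blob is checked against its expected SHA-256. `resumable_fetch` and its parameters are illustrative; a production version would also persist the committed offset so that a process restart can resume.

```python
import hashlib
from typing import Callable


class TransferError(Exception):
    """Raised by the (hypothetical) transport when a range request fails."""


def resumable_fetch(
    fetch_range: Callable[[int, int], bytes],
    total_size: int,
    expected_sha256: str,
    chunk_size: int = 1 << 20,
    max_retries: int = 3,
) -> bytes:
    received = bytearray()
    retries = 0
    while len(received) < total_size:
        offset = len(received)  # resume point: everything before this is committed
        length = min(chunk_size, total_size - offset)
        try:
            chunk = fetch_range(offset, length)
            if len(chunk) != length:
                raise TransferError(f"short read at offset {offset}")
        except TransferError:
            retries += 1
            if retries > max_retries:
                raise
            continue  # retry from the same offset; no committed bytes are re-sent
        received.extend(chunk)
        retries = 0  # consecutive-failure counter resets after each good chunk
    if hashlib.sha256(received).hexdigest() != expected_sha256:
        raise ValueError("blob hash mismatch")
    return bytes(received)
```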
3. Intra-Replica (P2P) Synchronization
To optimize network health:
- Challenge: Replicas should be able to sync from other nearby Replicas rather than always hitting the Primary layer.
- Goal: Reduce the $O(n)$ load on Primary nodes, where $n$ is the number of replicas.
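Source selection for intra-replica sync could be as simple as: among peers that advertise the needed segment, pick the lowest-latency Replica, and fall back to a Primary only when no Replica has it. The `Peer` record, its `rtt_ms` and `segments` fields, and `choose_source` are hypothetical; how peers advertise their segments (for example via libp2p discovery) is left open.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Peer:
    peer_id: str
    rtt_ms: float              # measured round-trip time (assumed to be available)
    segments: FrozenSet[str]   # segment identifiers this peer advertises
    is_primary: bool = False


def choose_source(segment: str, peers: List[Peer]) -> Peer:
    """Prefer the closest Replica holding `segment`; fall back to the closest Primary."""
    holders = [p for p in peers if segment in p.segments]
    replicas = [p for p in holders if not p.is_primary]
    candidates = replicas or [p for p in holders if p.is_primary]
    if not candidates:
        raise LookupError(f"no peer advertises segment {segment}")
    return min(candidates, key=lambda p: p.rtt_ms)
```

Preferring any Replica over a Primary, even a slower one, is a deliberate policy choice in this sketch: once one Replica holds a segment, later Replicas copy it from Replicas, so in the best case a Primary serves each segment once rather than once per replica. A latency threshold could relax that preference.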
4. Provider Migration
When a provider node is decommissioned or replaced:
- Challenge: Handing over "ownership" of data segments and their associated cryptographic proofs to a new node.
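Provider migration can be framed as an all-or-nothing ownership update: the incoming node must hold every segment the outgoing provider owned, each matching its recorded hash, before the registry reassigns any of them. In this sketch the registry is a plain dict and a SHA-256 comparison stands in for verifying the segments' actual cryptographic proofs; `migrate_ownership` and all parameter names are hypothetical.

```python
import hashlib
from typing import Dict


def migrate_ownership(
    registry: Dict[str, str],          # segment id -> owner node id (hypothetical registry)
    expected_hashes: Dict[str, str],   # segment id -> recorded SHA-256 hex digest
    old_owner: str,
    new_owner: str,
    new_node_store: Dict[str, bytes],  # segment id -> bytes held by the incoming node
) -> Dict[str, str]:
    owned = [seg for seg, owner in registry.items() if owner == old_owner]
    unverified = [
        seg for seg in owned
        if seg not in new_node_store
        or hashlib.sha256(new_node_store[seg]).hexdigest() != expected_hashes[seg]
    ]
    if unverified:
        raise ValueError(f"new node cannot take over; unverified segments: {sorted(unverified)}")
    # All-or-nothing: build a new registry rather than mutating the old one piecemeal.
    updated = dict(registry)
    for seg in owned:
        updated[seg] = new_owner
    return updated
```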
Technical Analysis & Proposed Approach
Data Discovery and Identification
We should leverage libp2p's discovery mechanisms to identify "sync-capable" peers. Data should be identified via Content Identifiers (CIDs) to ensure that the data received is exactly what was requested.
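To illustrate CID-based verification: the receiver recomputes the identifier from the bytes it received and compares it with the identifier it requested. This sketch handles only CIDv1 with the `raw` codec, a SHA2-256 multihash and base32 multibase (identifiers beginning `bafkrei`); a real node would parse the requested CID with a CID library and hash with whatever function it declares. `cid_v1_raw` and `verify_block` are illustrative names.

```python
import base64
import hashlib

CID_VERSION = 0x01
RAW_CODEC = 0x55   # multicodec code for raw bytes
SHA2_256 = 0x12    # multihash code for sha2-256


def cid_v1_raw(data: bytes) -> str:
    digest = hashlib.sha256(data).digest()
    # <version><codec><multihash code><digest length><digest>; each code is < 0x80,
    # so every unsigned varint here fits in a single byte.
    cid_bytes = bytes([CID_VERSION, RAW_CODEC, SHA2_256, len(digest)]) + digest
    # Multibase prefix "b" = lowercase RFC 4648 base32 without padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def verify_block(requested_cid: str, data: bytes) -> bool:
    """Accept the received bytes only if they hash to the requested identifier."""
    return cid_v1_raw(data) == requested_cid
```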
Incremental State Transfer
Instead of full state downloads, we propose a "Delta-Sync" approach:
- Snapshots: Primary nodes maintain periodic snapshots of the state.
- Gossip Logs: Use libp2p-gossipsub to broadcast recent changes that occurred after the last snapshot.
- Catch-up: New nodes download the latest snapshot and then replay the gossip logs.
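The catch-up step amounts to: start from the snapshot's state, then apply every logged change newer than the snapshot, in sequence order. The sketch assumes, hypothetically, that each change carries a gap-free sequence number. Because gossipsub does not guarantee ordered or complete delivery, duplicates are skipped and a gap raises an error so the node can re-fetch the missing range instead of silently diverging. `Snapshot`, `Change` and `catch_up` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    seq: int                 # sequence number of the last change included
    state: Dict[str, str]


@dataclass(frozen=True)
class Change:
    seq: int
    key: str
    value: Optional[str]     # None models a deletion


def catch_up(snapshot: Snapshot, log: Iterable[Change]) -> Tuple[int, Dict[str, str]]:
    state = dict(snapshot.state)
    head = snapshot.seq
    for change in sorted(log, key=lambda c: c.seq):
        if change.seq <= head:
            continue  # already in the snapshot, or a duplicate gossip delivery
        if change.seq != head + 1:
            raise ValueError(f"gap in change log: expected {head + 1}, got {change.seq}")
        if change.value is None:
            state.pop(change.key, None)
        else:
            state[change.key] = change.value
        head = change.seq
    return head, state
```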
Verification Logic
Every synced segment must be verified against the on-chain (or shared ledger) root hash.
- Proof-of-Sync: A mechanism where a node can cryptographically prove it has completed a sync before it is marked as "Active" in the node registry.
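A sketch of verifying segments against a root hash, under an assumed tree layout: SHA-256, leaves hashed as `H(0x00 || segment)`, inner nodes as `H(0x01 || left || right)`, and an odd node carried up unchanged. The real root format is whatever the on-chain logic defines. The same inclusion proofs could back a simple challenge-response Proof-of-Sync: a verifier picks random segment indices and the syncing node must return each segment with its sibling path. All function names here are hypothetical.

```python
import hashlib
from typing import List, Tuple


def _h(prefix: bytes, *parts: bytes) -> bytes:
    return hashlib.sha256(prefix + b"".join(parts)).digest()


def leaf_hash(segment: bytes) -> bytes:
    return _h(b"\x00", segment)


def _next_level(level: List[bytes]) -> List[bytes]:
    nxt = [_h(b"\x01", level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2 == 1:
        nxt.append(level[-1])  # odd node is carried up unchanged
    return nxt


def merkle_root(segments: List[bytes]) -> bytes:
    if not segments:
        raise ValueError("no segments")
    level = [leaf_hash(s) for s in segments]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(segments: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling path for segments[index]; each entry is (sibling_hash, sibling_is_left)."""
    level = [leaf_hash(s) for s in segments]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):  # a carried-up odd node has no sibling at this level
            proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify_segment(root: bytes, segment: bytes, proof: List[Tuple[bytes, bool]]) -> bool:
    acc = leaf_hash(segment)
    for sibling, sibling_is_left in proof:
        acc = _h(b"\x01", sibling, acc) if sibling_is_left else _h(b"\x01", acc, sibling)
    return acc == root
```

A single inclusion proof only shows the node could produce that one segment when challenged; confidence that the full sync completed grows with the number of randomly chosen segments it answers correctly.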
Questions for Maintainers
- Is there a preferred sync model: pull-based (the new node requests data) or push-based (Primaries broadcast to new nodes)?
- Should the sync logic reside within the core pallet or run as a separate networking service?
- Are there existing benchmarks for the expected volume of data per provider that the sync protocol should be optimized for?