The Storage Provider and Blockchain Provider have different database workload profiles but both need production-grade database selection and tuning. This issue tracks the evaluation of database engines and configuration for both components.
Related: Issue #100 (per-bucket database isolation) — the engine choice here directly informs the per-bucket architecture.
Current State
Storage Provider: RocksDB 0.22 with 3 column families (CF_NODES, CF_BUCKETS, CF_ROOT_TO_BUCKET). See provider-node/src/storage/disk.rs.
Blockchain Provider: Uses Substrate's default backend (RocksDB via sc-client-db).
Database Trade-Off Matrix
| Architectural Vector | RocksDB (LSM-Tree) | ParityDB (Hash-Based) | Sled (B-Tree) | SQLite (WAL) |
|---|---|---|---|---|
| Primary Structure | Log-Structured Merge-Trees (SSTs) | Fixed-size value tables with hash-indexed mmap | Lock-free B-Tree | B-Tree with WAL journaling |
| Ideal Workloads | High-volume sequential reads/writes | High-frequency state trie lookups (32-byte keys) | Mixed read/write with transactional safety | Single-writer, many-reader with complex queries |
| Write Amplification | High (compaction rewrites data across levels) | Low (append-only) | Low-moderate | Low (WAL append) |
| Memory Management | Explicit app-level block caches | Implicit (OS page cache) | Implicit (OS page cache) | Configurable page cache |
| Rust Native | No (FFI to C++) | Yes | Yes | No (FFI to C) |
| Deletion Cost | High (tombstones trigger compaction) | Moderate | Moderate | Low (single file delete for per-bucket) |
| Maturity | Very high (Facebook, production-proven) | Moderate (Parity-maintained) | Moderate (community) | Very high (ubiquitous) |
Scalability Bottlenecks to Evaluate
RocksDB Bottlenecks
SSD Compaction Wear: Continuous background compaction generates intense disk writes, reducing SSD lifespans and competing for write bandwidth.
FFI Call Overhead: Every operation crosses the Rust-to-C++ FFI boundary, which adds per-call CPU overhead that becomes noticeable during high-frequency operations.
Deletion Latency Spikes: Clearing expired agreements writes massive volumes of tombstones, triggering compactions that can delay other operations.
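BucketState serializes all MMR leaves as one value, so every access is O(n) (see Issue #100).
ParityDB Considerations
Sled Considerations
SQLite (WAL mode) Considerations
Evaluation for Each Component
Storage Provider (per-bucket DBs, Issue #100)
The per-bucket architecture (Issue #100) changes the requirements: instead of one large DB, we need many small, independent DBs. This favors:
Native Rust (avoid FFI overhead across hundreds of instances)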
Candidates to benchmark: Sled, SQLite (WAL), RocksDB (lightweight config)
Benchmark criteria:
Open/close latency (cold start for LRU pool eviction/reload)
Memory overhead per instance at rest and under load
Write throughput for MMR append operations (per-node-position key writes)
Read latency for MMR proof generation (random key lookups by position)
File descriptor usage at 100, 500, 1000 instances
Disk space efficiency (overhead per instance)
Bulk deletion speed (drop entire bucket DB)
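The write-throughput and read-latency criteria above could be measured through a small engine-agnostic harness. The following is a minimal sketch, not the project's benchmark code. The `BucketStore` trait, the in-memory `MemStore` stand-in, and the key layout (MMR position as 8 big-endian bytes) are all assumptions made for illustration. Each candidate engine would need its own `BucketStore` wrapper.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical minimal interface each candidate engine would be wrapped in.
pub trait BucketStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

/// In-memory stand-in so the harness runs without any engine installed.
#[derive(Default)]
pub struct MemStore(BTreeMap<Vec<u8>, Vec<u8>>);

impl BucketStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.0.insert(key.to_vec(), value.to_vec());
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

/// Key for an MMR node: its position as 8 big-endian bytes (assumed layout).
pub fn position_key(position: u64) -> [u8; 8] {
    position.to_be_bytes()
}

/// Appends `count` 32-byte nodes at positions 0..count and returns the elapsed time.
pub fn bench_append<S: BucketStore>(store: &mut S, count: u64) -> Duration {
    let start = Instant::now();
    for pos in 0..count {
        store.put(&position_key(pos), &[pos as u8; 32]);
    }
    start.elapsed()
}

/// Looks up `lookups` pseudo-random positions below `count` (which must be non-zero).
/// Returns the number of hits and the elapsed time.
pub fn bench_random_reads<S: BucketStore>(store: &S, count: u64, lookups: u64) -> (u64, Duration) {
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15; // fixed seed for repeatable runs
    let mut hits = 0;
    let start = Instant::now();
    for _ in 0..lookups {
        // Simple LCG step, good enough to scatter reads across positions.
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let pos = (state >> 33) % count;
        if store.get(&position_key(pos)).is_some() {
            hits += 1;
        }
    }
    (hits, start.elapsed())
}
```

Dividing the node count by the duration from `bench_append` gives a rough writes-per-second figure. Open/close latency, file descriptor usage, and on-disk overhead need engine-specific measurement and are not covered by this sketch.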
Blockchain Provider (single instance, state trie)
The Blockchain Provider uses Substrate's storage backend for the state trie. Options:
RocksDB (current Substrate default): Mature and proven at scale, but incurs compaction overhead.
ParityDB: Substrate-native alternative optimized for state trie access patterns. Lower write amplification.
Benchmark criteria:
Block import time under full state load
State trie read latency for storage maps (Providers, Buckets, StorageAgreements)
Compaction impact on block production
Memory usage under sustained testnet load
Pruning efficiency for historical state
Strategic Mitigation Plans (Applicable Regardless of Engine Choice)
Step 1: Physical Data Isolation
Decouple the Blockchain Provider's state tracking database from raw file storage. The consensus database must store only 32-byte hash pointers and state metrics, never file contents.
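One way to picture the "pointers only" rule is a fixed-size record that cannot hold file contents. The sketch below is illustrative. The `FilePointer` type, its `size_bytes` field (standing in for a "state metric"), and the 40-byte encoding are assumptions, not the project's actual schema.

```rust
/// Hypothetical consensus-side record: a pointer to off-chain data, never the data itself.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FilePointer {
    /// 32-byte content hash identifying the blob held by the Storage Provider.
    pub content_hash: [u8; 32],
    /// Size of the referenced blob in bytes (an assumed example of a state metric).
    pub size_bytes: u64,
}

impl FilePointer {
    /// Fixed encoded length: 32-byte hash plus 8-byte little-endian size.
    pub const ENCODED_LEN: usize = 40;

    pub fn encode(&self) -> [u8; Self::ENCODED_LEN] {
        let mut out = [0u8; Self::ENCODED_LEN];
        out[..32].copy_from_slice(&self.content_hash);
        out[32..].copy_from_slice(&self.size_bytes.to_le_bytes());
        out
    }

    /// Rejects any value that is not exactly one pointer record long.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != Self::ENCODED_LEN {
            return None;
        }
        let mut content_hash = [0u8; 32];
        content_hash.copy_from_slice(&bytes[..32]);
        let mut size = [0u8; 8];
        size.copy_from_slice(&bytes[32..]);
        Some(Self { content_hash, size_bytes: u64::from_le_bytes(size) })
    }
}
```

Because every value has the same small size, the consensus database's footprint grows with the number of files rather than with their contents.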
Step 2: System Memory Safeguards
Isolate the Storage Provider's HTTP engine and file-sharing tasks within bounded OS containers (e.g., Linux cgroups). Prevent file transfer operations from purging the Blockchain Provider's consensus database indexes from RAM.
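As a rough illustration of the cgroup approach, the sketch below writes a hard memory cap and attaches a process to a cgroup v2 directory. It assumes the group directory (for example, something under /sys/fs/cgroup) already exists with the memory controller enabled. The function name and parameters are hypothetical. Under cgroup v2, page cache is charged to the group that touches it, which is why a cap on the file-serving process should also bound the cache it can occupy. Verify this on the target kernel before relying on it.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes a hard memory cap and attaches a process to an existing cgroup v2 directory.
/// `cgroup_dir` is assumed to have been created beforehand by an operator or service manager.
pub fn confine_process(cgroup_dir: &Path, max_bytes: u64, pid: u32) -> io::Result<()> {
    // cgroup v2 interface file: hard limit in bytes for the whole group.
    fs::write(cgroup_dir.join("memory.max"), format!("{}\n", max_bytes))?;
    // Writing a PID here moves that process into the group, so the limit applies to it.
    fs::write(cgroup_dir.join("cgroup.procs"), format!("{}\n", pid))?;
    Ok(())
}
```

In practice a service manager or container runtime would usually set these limits. The point of the sketch is only that the Storage Provider's file-serving process gets its own bounded group, separate from the Blockchain Provider.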
Step 3: Compaction & Buffer Tuning (if staying with RocksDB)
Configure leveled compaction with dynamic targets, expand active write buffers to cushion transaction spikes, and cap background write rates to protect disk I/O.
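Expanding write buffers trades memory for smoother writes, so it helps to estimate the cost before tuning. The sketch below is a back-of-envelope calculation, not a RocksDB API. The struct fields mirror RocksDB's `write_buffer_size` and `max_write_buffer_number` options, and the bound assumes each column family keeps its own memtables with no global write-buffer cap configured. All numbers passed in are illustrative.

```rust
/// Hypothetical tuning values mirroring RocksDB's `write_buffer_size` and
/// `max_write_buffer_number` options. Not tied to any particular crate version.
#[derive(Debug, Clone, Copy)]
pub struct WriteBufferTuning {
    /// Size of one memtable in bytes.
    pub write_buffer_size: u64,
    /// Maximum number of memtables kept per column family.
    pub max_write_buffer_number: u64,
    /// Number of column families in the database (3 in the current Storage Provider).
    pub column_families: u64,
}

impl WriteBufferTuning {
    /// Upper bound on memtable memory for one database instance,
    /// assuming no global cap across column families.
    pub fn worst_case_memtable_bytes(&self) -> u64 {
        self.write_buffer_size * self.max_write_buffer_number * self.column_families
    }

    /// Checks whether `instances` databases with this tuning stay under `budget_bytes`.
    pub fn fits_budget(&self, instances: u64, budget_bytes: u64) -> bool {
        self.worst_case_memtable_bytes().saturating_mul(instances) <= budget_bytes
    }
}
```

For example, 64 MiB buffers with up to 4 memtables across 3 column families give a 768 MiB upper bound for a single instance. Multiplied across many per-bucket instances, that bound is one reason a "lightweight config" matters for the per-bucket candidates.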
Step 4: Key Prefix Restructuring
Structure on-chain key pathways so entries sharing a common parent (like Bucket ID) are stored contiguously. This allows all of a parent's entries to be bulk-deleted in a single pass.
Note: The original plan attributed key-prefix restructuring to Issue #65, but actual Issue #65 is "Robust Syncing Protocol for Dynamic Primary and Replica Node Topologies" — a different topic entirely.
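The key-prefix idea from Step 4 can be sketched with an ordered map standing in for the storage engine. The key layout (8-byte big-endian bucket ID followed by an 8-byte big-endian position) and the function names are assumptions for illustration. The actual on-chain key scheme is not specified here.

```rust
use std::collections::BTreeMap;

/// Composite key: 8-byte big-endian bucket ID followed by 8-byte big-endian position.
/// Big-endian keeps byte order equal to numeric order, so one bucket's entries are adjacent.
pub fn entry_key(bucket_id: u64, position: u64) -> [u8; 16] {
    let mut key = [0u8; 16];
    key[..8].copy_from_slice(&bucket_id.to_be_bytes());
    key[8..].copy_from_slice(&position.to_be_bytes());
    key
}

/// Removes every entry of `bucket_id` using one contiguous range scan.
/// Returns the number of entries removed.
pub fn delete_bucket(store: &mut BTreeMap<[u8; 16], Vec<u8>>, bucket_id: u64) -> usize {
    let start = entry_key(bucket_id, 0);
    let end = entry_key(bucket_id, u64::MAX);
    // Collect the keys first, because the map cannot be mutated while it is being iterated.
    let doomed: Vec<[u8; 16]> = store.range(start..=end).map(|(k, _)| *k).collect();
    for key in &doomed {
        store.remove(key);
    }
    doomed.len()
}
```

On an ordered engine, the same layout lets the deletion be expressed as a single key range rather than a scan over unrelated keys, provided the engine offers a range-delete primitive.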
Alternative Engines (Evaluate Only If Primary Candidates Fail)
PebblesDB (Fragmented LSM Tree): Groups keys into fragments, reducing write amplification ~70%. Alternative for Blockchain Provider if RocksDB compaction thrashes disks.
BadgerDB (Key-Value Separated LSM): Separates keys from values, compacts only the key index. Optimized for variable-sized metadata and proof structures.
Deliverables
Benchmark report comparing Sled vs SQLite vs RocksDB for per-bucket Storage Provider DBs
Benchmark report comparing RocksDB vs ParityDB for Blockchain Provider state trie
Recommendation with justification for each component
Configuration guide for the chosen engines (memory limits, compaction settings, OS tuning)
Migration plan from current single-RocksDB to chosen architecture