Evaluate per-bucket database isolation on Storage Provider

## Problem Statement

The current Storage Provider uses a single RocksDB instance with 3 column families (`CF_NODES`, `CF_BUCKETS`, `CF_ROOT_TO_BUCKET`) for all buckets. This creates a **structural scaling bottleneck** driven primarily by how MMR state is stored and rebuilt per bucket.

### The Core Problem: MMR Rebuild Is O(n) Per Operation

`BucketState` stores all MMR leaves as a `Vec<MmrLeaf>` serialized into a single value in `CF_BUCKETS`. Every `commit()` and `get_mmr_proof()` call:

1. Deserializes the entire bucket state (all leaves) from one RocksDB key
2. **Rebuilds the entire MMR from scratch** by replaying all leaves
3. Appends new leaves (for commits)
4. Reserializes everything back to disk

```rust
// This runs on EVERY commit and EVERY proof generation (disk.rs:351-356, 436-438)
let mut mmr = crate::mmr::Mmr::new();
for leaf in &bucket.leaves {
    mmr.push(blake2_256(&leaf.encode()));
}
```

For a bucket with 100,000 leaves, every single upload replays 100,000 hashes. The serialized `BucketState` blob grows linearly and unboundedly — at scale this becomes a multi-MB single-key value that gets deserialized/reserialized on every operation.

### Secondary Problems

**Cross-bucket contention on the write path**: All bucket commits go through the same RocksDB write buffer. A burst of commits to Bucket A blocks Bucket B's checkpoint signature response, which has a 48-hour challenge deadline.

**Challenge response latency**: `get_mmr_proof()` also rebuilds the entire MMR just to generate one proof. Under load, generating proofs for large buckets competes with all other bucket operations for the same DB resources.

**No tenant isolation**: Storage and query spikes from one client's bucket affect all other buckets on the same provider.

**Expensive bucket deletion**: Deleting an expired bucket requires writing individual tombstone markers for every key, triggering RocksDB compaction cycles that compete with active storage operations.

**Migration difficulty**: Moving a single bucket to another provider requires scanning and extracting its data from the shared database — there's no self-contained unit to transfer.

## Directional Solution: One Database Per Bucket

### Architecture

Each bucket gets its own isolated database instance alongside a node-wide global index:

```
Storage Provider
├── Global Index (single lightweight DB)
│   ├── Bucket registry (bucket_id → DB path, status, stats)
│   ├── Deduplication index (content_hash → reference count)
│   └── Node-wide metrics and telemetry
│
├── Bucket DB: bucket_001/ (self-contained)
│   ├── MMR nodes (one key per node position — no replay needed)
│   ├── MMR leaves (individually addressable)
│   ├── Chunk data (content-addressed)
│   └── Bucket metadata (root, seq, config)
│
├── Bucket DB: bucket_002/ (self-contained)
│   └── ...
└── ...
```

### Key Benefit: Persistent MMR Structure

With per-bucket DBs, MMR nodes can be stored individually (one key per node position) instead of replaying from a serialized leaf vec:

- `commit()` becomes O(log n) — append new leaves and their parent nodes, no replay
- `get_mmr_proof()` becomes O(log n) — walk the tree by position, no rebuild
- `get_mmr_peaks()` becomes O(log n) — read peak positions directly

### Other Benefits

- **Compaction-free bucket deletion**: Delete the entire database directory instead of writing tombstones
- **Complete tenant isolation**: One bucket's I/O spike cannot affect another bucket's checkpoint or challenge response
- **Easy bucket migration**: Copy the self-contained database directory to move a bucket between providers
- **Independent tuning**: Hot buckets can have different cache/buffer settings than cold ones

### Known Scaling Gaps

- **File descriptor exhaustion**: Thousands of open databases can exceed OS `ulimit -n`
- **Memory overhead**: Each DB instance has its own memory-mapped buffers
- **Loss of global deduplication**: Identical chunks across buckets get stored twice

### Proposed Mitigations

1. **LRU connection pool**: Cap concurrent open DBs. Evict least-recently-used bucket DBs from the pool. Reopen on next access. This bounds file descriptors and memory.
2. **Lightweight engines**: Evaluate SQLite (WAL mode) or Sled per bucket — single-file, minimal footprint, native Rust (Sled).
3. **Dual-DB architecture**: Global Index handles deduplication references and node-wide metadata; per-bucket DBs handle bucket-specific MMR state and chunk mappings.
4. **OS limit tuning**: Adjust `ulimit -n` and kernel parameters for providers running at scale.

## Investigation Scope

- [ ] Benchmark current single-RocksDB performance: measure `commit()` and `get_mmr_proof()` latency as leaf count grows (100, 1K, 10K, 100K leaves)
- [ ] Prototype persistent per-node MMR storage (one key per MMR position) — measure O(log n) vs current O(n) replay
- [ ] Prototype per-bucket SQLite/Sled instances with LRU connection pool
- [ ] Measure file descriptor usage and memory overhead at 100, 500, 1000 bucket scale
- [ ] Evaluate deduplication loss (how much storage overhead from duplicate chunks across buckets?)
- [ ] Define the Global Index schema (what metadata is node-wide vs. bucket-local?)
- [ ] Compare deletion performance: single-DB tombstone compaction vs. per-bucket directory removal
- [ ] Determine migration path from current single-RocksDB to per-bucket architecture

## Current Implementation Reference

- MMR implementation: `provider-node/src/mmr.rs` (in-memory `Vec<H256>`, rebuilt from leaves)
- Disk storage: `provider-node/src/storage/disk.rs` (RocksDB 0.22, 3 column families)
- BucketState: `provider-node/src/storage/disk.rs:22-29` (stores `Vec<MmrLeaf>` as single serialized blob)
- Commit path: `provider-node/src/storage/disk.rs:318-383` (full MMR replay on every commit)
- Proof generation: `provider-node/src/storage/disk.rs:420-450` (full MMR replay for each proof)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Evaluate per-bucket database isolation on Storage Provider #100

Problem Statement

The Core Problem: MMR Rebuild Is O(n) Per Operation

Secondary Problems

Directional Solution: One Database Per Bucket

Architecture

Key Benefit: Persistent MMR Structure

Other Benefits

Known Scaling Gaps

Proposed Mitigations

Investigation Scope

Current Implementation Reference

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Evaluate per-bucket database isolation on Storage Provider #100

Description

Problem Statement

The Core Problem: MMR Rebuild Is O(n) Per Operation

Secondary Problems

Directional Solution: One Database Per Bucket

Architecture

Key Benefit: Persistent MMR Structure

Other Benefits

Known Scaling Gaps

Proposed Mitigations

Investigation Scope

Current Implementation Reference

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions