CLAUDE.md - Scalable Web3 Storage

Agent Rules

Git commit rules:

NEVER add Co-Authored-By lines to commits
NEVER use git rebase
NEVER use git push --force or git push -f

Automatic formatting:

ALWAYS run /format after generating or modifying Rust code
ALWAYS run /format before creating any git commit
This ensures all code follows project formatting standards (Rust, TOML, feature propagation) and passes clippy

Project Overview

Scalable Web3 Storage is a decentralized storage system built on Substrate with game-theoretic guarantees. Storage providers lock stake and face slashing for data loss, while the chain acts as a credible threat rather than the hot path.

Architecture: Two-node system where the blockchain handles accountability and provider nodes handle the actual storage:

Parachain Node: On-chain logic for stake, agreements, checkpoints, and challenges
Provider Node: Off-chain HTTP server for data upload, download, and MMR commitment

Key Purpose: Enable trustless storage where normal operations (reads, writes) happen off-chain via HTTP, and the chain is only touched for setup, checkpoints, and disputes.

Build Commands

# Build everything (release)
cargo build --release

# Build specific components
cargo build --release -p storage-parachain-runtime
cargo build --release -p storage-provider-pallet
cargo build --release -p storage-provider-node
cargo build --release -p storage-client

# Build with runtime benchmarks
cargo build --release --features runtime-benchmarks

# Using just (recommended)
just build

Test Commands

# Run all tests
cargo test

# Run pallet tests
cargo test -p storage-provider-pallet

# Run provider node tests
cargo test -p storage-provider-node

# Run client SDK tests
cargo test -p storage-client

# Run file system tests (Layer 1)
cargo test -p file-system-primitives
cargo test -p pallet-drive-registry
cargo test -p file-system-client

# Or test all file system components at once
just fs-test-all

# Run integration tests (require chain + provider already running)
just start-chain     # Terminal 1
just start-provider  # Terminal 2
just demo            # Terminal 3 — Layer-0 PAPI flow
just fs-demo-ci      # Terminal 3 — Layer-1 file-system flow
just s3-demo-ci      # Terminal 3 — Layer-1 S3 flow

# Clippy linting
cargo clippy --all-targets --all-features --workspace -- -D warnings

Formatting

# Rust formatting (requires nightly)
cargo +nightly fmt --all

# TOML formatting check (drop --check to rewrite files in place)
taplo format --check --config .config/taplo.toml

# Feature propagation lint (checks Cargo.toml feature gates)
zepter run --config .config/zepter.yaml

Run Commands

# One-time setup (downloads binaries, builds project)
just setup

# Start blockchain
just start-chain

# Start provider node manually
just start-provider

# Check provider health
just health

# Check chain health (relay + parachain + current block)
bash scripts/check-chain.sh

# Run end-to-end PAPI demo (setup, upload, 2 challenges)
just demo

Running the UIs locally

When the user says "run locally" (or "run the UIs", "start the UIs", "spin up the UIs"), invoke the run-local-uis project skill. It starts all four user-interfaces/ apps on their canonical ports with Vite HMR, including the landing page, which needs a custom dev config to substitute its build-time placeholders and rewrite card links. Canonical ports: landing 5176, console-ui 5173, drive-ui 5174, provider 5175.

File System (Layer 1) Commands

The File System Interface provides a high-level abstraction over Layer 0's raw blob storage.

# Test all file system components (primitives + pallet + client)
just fs-test-all

# Run integration example against a running chain + provider node
just fs-demo-ci

# Manually run the basic_usage example
cargo run -p file-system-client --example basic_usage

Quick Start Guide: FILE_SYSTEM_QUICKSTART.md

Complete Documentation: docs/filesystems/README.md

JS/TS: use `polkadot-api`, never `@polkadot/*`

For any JavaScript or TypeScript code in this repo (demos, scripts, tooling, future SDKs), talk to the chain through polkadot-api (PAPI). Do NOT introduce @polkadot/keyring, @polkadot/util-crypto, @polkadot/util, @polkadot/api, or any other @polkadot/* package. They duplicate functionality PAPI already provides, drag in 20+ transitive deps, and force cryptoWaitReady() awaits everywhere. Use these instead:

Need	Use
Chain client + typed API	`polkadot-api` (`createClient`, `getWsProvider` from `polkadot-api/ws-provider`)
Signer wrapper	`getPolkadotSigner` from `polkadot-api/signer`
SCALE / `Binary` / `Enum`	`@polkadot-api/substrate-bindings`
Sr25519 key derivation (`//Alice`)	`sr25519CreateDerive` from `@polkadot-labs/hdkd` + `DEV_PHRASE` + `entropyToMiniSecret` + `mnemonicToEntropy` from `@polkadot-labs/hdkd-helpers`
SS58 encode / decode	`ss58Address` / `ss58Decode` from `@polkadot-labs/hdkd-helpers`
blake2-256 hashing	`blake2b256` from `@polkadot-labs/hdkd-helpers`
`cryptoWaitReady()`	Not needed — hdkd is synchronous; delete the import and the await

Canonical signer/derive pattern — set up the derive function once at module load, then call makeSigner("//Alice") etc.:

import { createClient } from "polkadot-api";
import { getWsProvider } from "polkadot-api/ws-provider";
import { getPolkadotSigner } from "polkadot-api/signer";
import { sr25519CreateDerive } from "@polkadot-labs/hdkd";
import {
  DEV_PHRASE,
  entropyToMiniSecret,
  mnemonicToEntropy,
  ss58Address,
  ss58Decode,
} from "@polkadot-labs/hdkd-helpers";

const devMiniSecret = entropyToMiniSecret(mnemonicToEntropy(DEV_PHRASE));
const deriveSr25519 = sr25519CreateDerive(devMiniSecret);

export function makeSigner(seed) {
  const keyPair = deriveSr25519(seed); // seed is a SURI path like "//Alice"
  return {
    signer: getPolkadotSigner(keyPair.publicKey, "Sr25519", keyPair.sign),
    address: ss58Address(keyPair.publicKey), // prefix 42 (`5…`), same as @polkadot/keyring default
    publicKey: keyPair.publicKey,
    seed,
  };
}

ss58Address defaults to substrate prefix 42 (5…) while PAPI surfaces accounts with the runtime SS58 prefix (Polkadot-style 1… on this parachain) — same key, different string, so string equality fails. Compare raw bytes via ss58Decode:

// ss58Decode(addr) → [bytes, prefix]
export function sameAddress(a, b) {
  try {
    const [aBytes] = ss58Decode(a);
    const [bBytes] = ss58Decode(b);
    if (aBytes.length !== bBytes.length) return false;
    for (let i = 0; i < aBytes.length; i++) {
      if (aBytes[i] !== bBytes[i]) return false;
    }
    return true;
  } catch {
    return false;
  }
}

Architecture

Directory Structure

web3-storage/
├── pallet/                     # Substrate pallet (on-chain logic - Layer 0)
│   ├── src/lib.rs             # Core pallet implementation
│   └── Cargo.toml             # Pallet dependencies
├── runtime/                    # Parachain runtime
│   ├── src/lib.rs             # Runtime configuration
│   └── Cargo.toml             # Runtime dependencies
├── provider-node/              # Off-chain HTTP storage server
│   ├── src/                   # Provider implementation
│   │   ├── main.rs           # Server entry point
│   │   ├── storage.rs        # Storage layer
│   │   └── mmr.rs            # MMR commitment logic
│   └── Cargo.toml            # Provider dependencies
├── client/                     # Layer 0 Client SDK
│   ├── src/                   # SDK implementation
│   │   ├── lib.rs            # Main client API
│   │   └── types.rs          # Client types
│   ├── examples/             # Usage examples
│   └── README.md             # SDK documentation
├── primitives/                 # Layer 0 shared types and utilities
│   ├── src/lib.rs            # Common types
│   └── Cargo.toml            # Primitive dependencies
├── storage-interfaces/         # Layer 1 - High-level interfaces
│   └── file-system/           # File System Interface
│       ├── primitives/        # File system types (DriveInfo, CommitStrategy, etc.)
│       ├── pallet-registry/   # Drive Registry pallet (on-chain)
│       └── client/            # File System Client SDK
│           ├── src/
│           │   ├── lib.rs     # Main file system client
│           │   └── substrate.rs # Blockchain integration (subxt)
│           ├── examples/
│           │   └── basic_usage.rs # Complete workflow example
│           └── README.md      # File system client docs
├── scripts/                    # Helper scripts
│   ├── build-chain-spec.sh   # Build runtime + emit chain spec (used by `just generate-chain-spec`)
│   ├── check-chain.sh        # Relay + parachain health probe
│   └── quick-test.sh         # Curl-based smoke test of provider HTTP API
├── chain-specs/                # Chain specification files
├── docs/                       # Documentation
│   ├── README.md             # Documentation index
│   ├── getting-started/      # Quick start guides
│   ├── testing/              # Testing procedures
│   ├── reference/            # API references
│   ├── design/               # Architecture docs
│   └── filesystems/          # Layer 1 File System docs
│       ├── README.md         # File system overview
│       ├── ARCHITECTURE.md   # Encoding, security, chain integration
│       ├── USER_GUIDE.md     # User guide
│       ├── API_REFERENCE.md  # API documentation
│       └── ADMIN_GUIDE.md    # Admin guide
├── FILE_SYSTEM_QUICKSTART.md  # Quick start for file system
└── justfile                    # Development commands

Key Components

Layer 0 (Raw Storage)

Pallet (pallet/): On-chain logic for provider registration, bucket creation, storage agreements, checkpoints, and challenge/slashing mechanism.

Runtime (runtime/): Parachain runtime that includes the storage provider pallet and configures its parameters (stake requirements, challenge periods, etc.).

Provider Node (provider-node/): Off-chain HTTP server that:

Stores data chunks locally
Builds MMR commitments
Serves data via HTTP API
Signs checkpoints for on-chain submission

Client SDK (client/): Rust library for applications to:

Create buckets and agreements (on-chain)
Upload/download data (off-chain HTTP)
Submit checkpoints (on-chain)
Challenge providers (on-chain)

Primitives (primitives/): Shared types used across pallet, provider node, and client.

Layer 1 (File System Interface)

File System Primitives (storage-interfaces/file-system/primitives/): High-level types for file system:

DriveInfo: Drive metadata and configuration
DirectoryNode: Protobuf-based directory structure
FileManifest: File metadata with chunk tracking
CommitStrategy: Checkpoint strategies (Immediate, Batched, Manual)
Helper functions for CID computation and path handling

Drive Registry Pallet (storage-interfaces/file-system/pallet-registry/): On-chain drive management:

Drive creation with automatic infrastructure setup
Root CID tracking for drive state
User-to-drive mapping
Bucket-to-drive mapping
Drive lifecycle (create, update, clear, delete)

File System Client (storage-interfaces/file-system/client/): High-level SDK providing:

Familiar file/folder interface over Layer 0 blob storage
Automatic drive creation and provider selection
Directory operations (create, list, navigate)
File operations (upload, download, delete)
Real blockchain integration using subxt
Content-addressed storage with CID verification
Flexible commit strategies

Example: storage-interfaces/file-system/client/examples/basic_usage.rs

Complete workflow: drive creation → directories → file uploads/downloads
Real blockchain integration with event extraction
Demonstrates the full Layer 1 capabilities

Development Workflow

Quick Start

Setup: just setup (one-time, downloads binaries and builds)
Start: just start-chain then just start-provider (in separate terminals)
Configure: with chain + provider running, just demo registers the provider, opens an agreement, and exercises challenges end-to-end (it does not start the chain or provider for you)
Test: just demo

Development Cycle

Format code: cargo +nightly fmt --all
Run clippy: cargo clippy --all-targets --all-features --workspace -- -D warnings
Run tests: cargo test
Build: cargo build --release or just build

Local Testing with Zombienet

The project uses Zombienet for local relay chain + parachain testing:

# Start network (relay chain + parachain)
just start-chain

# Or manually:
.bin/zombienet spawn zombienet.toml

Network URLs:

Relay chain: ws://127.0.0.1:9900
Parachain: ws://127.0.0.1:2222
Provider HTTP: http://localhost:3333

Web UI:

Relay chain: https://polkadot.js.org/apps/?rpc=ws://127.0.0.1:9900
Parachain: https://polkadot.js.org/apps/?rpc=ws://127.0.0.1:2222

Polkadot SDK (Upstream)

This project is built on the Polkadot SDK, the monorepo that now hosts Substrate, Cumulus, and Polkadot. For deeper understanding of FRAME pallets, runtime macros, and consensus:

Repository: https://github.com/paritytech/polkadot-sdk
Documentation: https://paritytech.github.io/polkadot-sdk/

The Polkadot SDK provides:

FRAME pallet system and runtime macros
Parachain consensus (Cumulus)
Networking (libp2p)
RPC infrastructure
XCM (Cross-Consensus Messaging)

Dependencies

Polkadot SDK: See Cargo.toml workspace dependencies
Rust: 1.74+ with wasm32-unknown-unknown target
Just: Command runner (cargo install just)
Zombienet: Network spawner (auto-downloaded by just setup)
Polkadot: Relay chain binary (auto-downloaded)
Polkadot Omni Node: Parachain node (auto-downloaded)

Configuration

Runtime Parameters (runtime/src/lib.rs)

// Token decimals
pub const UNIT: Balance = 1_000_000_000_000; // 12 decimals

// Minimum provider stake: 1000 tokens
pub const MinProviderStake: Balance = 1_000 * UNIT;

// 1 token (1e12) per 1 GB (1e9 bytes) = 1000 per byte
pub const MinStakePerByte: Balance = 1_000;

// Challenge response deadline (provider must respond within this many blocks)
pub const ChallengeTimeout: BlockNumber = 48 * HOURS;
pub const SettlementTimeout: BlockNumber = 24 * HOURS;
pub const RequestTimeout: BlockNumber = 6 * HOURS;

// Provider-initiated checkpoint config
pub const DefaultCheckpointInterval: BlockNumber = 100;
pub const DefaultCheckpointGrace: BlockNumber = 20;
pub const CheckpointReward: Balance = 1_000_000_000_000;     // 1 token
pub const CheckpointMissPenalty: Balance = 500_000_000_000;  // 0.5 token

Provider Settings (configured per provider)

pub struct ProviderSettings {
    min_duration: BlockNumber,        // Minimum agreement duration
    max_duration: BlockNumber,        // Maximum agreement duration
    price_per_byte: Balance,          // Price per byte per block
    accepting_primary: bool,          // Accepting new agreements
    replica_sync_price: Option<Balance>, // Price for replica sync
    accepting_extensions: bool,       // Accepting agreement extensions
    max_capacity: u64,                // Maximum storage capacity (0 = unlimited)
}

Capacity & Stake Requirements

Providers must stake tokens proportional to their declared capacity:

// Minimum stake per byte of declared capacity (same constant as in Runtime Parameters above)
pub const MinStakePerByte: Balance = 1_000; // 1 token (1e12 base units) per GB (1e9 bytes)

// Required stake calculation
required_stake = max_capacity * MinStakePerByte

// Example: 1 TB (1e12 bytes) of capacity requires 1e15 base units = 1,000 tokens of stake
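
The stake rule above can be sketched in plain Rust with overflow-checked arithmetic. This is a minimal illustration, not the pallet's code: `required_stake` and `stake_covers_capacity` are hypothetical helper names, `Balance` is assumed to be `u128`, and the per-byte rate is passed in as an argument rather than read from the runtime. The special case `max_capacity = 0` (unlimited) is not modeled here.

```rust
/// Assumed balance type; the runtime's actual `Balance` alias may differ.
pub type Balance = u128;

/// required_stake = max_capacity * min_stake_per_byte.
/// Returns `None` if the product overflows.
pub fn required_stake(max_capacity: u64, min_stake_per_byte: Balance) -> Option<Balance> {
    Balance::from(max_capacity).checked_mul(min_stake_per_byte)
}

/// True when `stake` covers the declared capacity.
/// An overflowing requirement is treated as "not covered".
pub fn stake_covers_capacity(
    stake: Balance,
    max_capacity: u64,
    min_stake_per_byte: Balance,
) -> bool {
    match required_stake(max_capacity, min_stake_per_byte) {
        Some(required) => stake >= required,
        None => false,
    }
}
```

For example, at a rate of 1_000 base units per byte, `required_stake(1_000_000_000_000, 1_000)` yields `Some(1_000_000_000_000_000)`, which is 1,000 tokens at 12 decimals.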

Key Concepts

Storage Flow

Setup (on-chain):
- Provider registers with stake
- Client creates bucket
- Agreement established
Storage (off-chain):
- Client uploads chunks via HTTP to provider
- Provider stores and builds MMR commitment
Checkpoint (on-chain):
- Provider signs MMR root
- Client submits checkpoint
- Provider now liable for data
Verification (off-chain):
- Client spot-checks chunks
- Client can download anytime
Dispute (on-chain, rare):
- Client submits challenge
- Provider must provide proof or get slashed
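
The flow above splits cleanly into on-chain and off-chain steps. The sketch below only encodes that split as data; `FlowStep`, `Venue`, and `on_chain_steps` are hypothetical names for illustration and do not correspond to types in the pallet or SDK.

```rust
/// Where a step of the storage flow executes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Venue {
    OnChain,
    OffChain,
}

/// The five steps of the storage flow, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FlowStep {
    Setup,
    Storage,
    Checkpoint,
    Verification,
    Dispute,
}

impl FlowStep {
    pub const ALL: [FlowStep; 5] = [
        FlowStep::Setup,
        FlowStep::Storage,
        FlowStep::Checkpoint,
        FlowStep::Verification,
        FlowStep::Dispute,
    ];

    /// Setup, checkpoints, and disputes touch the chain; data transfer
    /// and spot checks go over HTTP to the provider.
    pub fn venue(self) -> Venue {
        match self {
            FlowStep::Setup | FlowStep::Checkpoint | FlowStep::Dispute => Venue::OnChain,
            FlowStep::Storage | FlowStep::Verification => Venue::OffChain,
        }
    }
}

/// Steps that require an extrinsic, in flow order.
pub fn on_chain_steps() -> Vec<FlowStep> {
    FlowStep::ALL
        .iter()
        .copied()
        .filter(|step| step.venue() == Venue::OnChain)
        .collect()
}
```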

MMR (Merkle Mountain Range)

The provider builds an MMR over stored chunks:

Each upload adds a leaf to the MMR
MMR root represents commitment to all data
Efficient proofs for individual chunks
Enables challenge mechanism
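
To make the append-and-commit idea concrete, here is a minimal, self-contained MMR sketch. It is not the code in provider-node/src/mmr.rs: the `Mmr` type and its methods are hypothetical, the standard library's `DefaultHasher` (64-bit) stands in for blake2-256, the right-to-left peak "bagging" order is one common convention assumed here, and inclusion proofs are omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest; a real MMR would use a 32-byte blake2-256 hash.
type Digest = u64;

fn hash_leaf(data: &[u8]) -> Digest {
    let mut h = DefaultHasher::new();
    0u8.hash(&mut h); // domain tag: leaf
    data.hash(&mut h);
    h.finish()
}

fn hash_nodes(left: Digest, right: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    1u8.hash(&mut h); // domain tag: inner node
    left.hash(&mut h);
    right.hash(&mut h);
    h.finish()
}

/// Keeps only the current peaks as (height, digest), left to right.
#[derive(Default)]
pub struct Mmr {
    peaks: Vec<(u32, Digest)>,
    leaf_count: u64,
}

impl Mmr {
    /// Adds one chunk as a leaf, merging equal-height peaks as it goes.
    pub fn append(&mut self, chunk: &[u8]) {
        let mut node = (0u32, hash_leaf(chunk));
        while let Some(&(height, left)) = self.peaks.last() {
            if height != node.0 {
                break;
            }
            self.peaks.pop();
            node = (height + 1, hash_nodes(left, node.1));
        }
        self.peaks.push(node);
        self.leaf_count += 1;
    }

    /// Folds the peaks right to left into a single root; `None` when empty.
    pub fn root(&self) -> Option<Digest> {
        let mut peaks = self.peaks.iter().rev().map(|&(_, digest)| digest);
        let first = peaks.next()?;
        Some(peaks.fold(first, |acc, peak| hash_nodes(peak, acc)))
    }

    pub fn peak_count(&self) -> usize {
        self.peaks.len()
    }

    pub fn leaf_count(&self) -> u64 {
        self.leaf_count
    }
}
```

In this sketch the number of peaks always equals the number of set bits in the leaf count, so appends touch only a logarithmic number of nodes, and the root changes with every upload.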

Payment Calculation

payment = price_per_byte × max_bytes × duration

Example:

price_per_byte = 1,000,000
max_bytes = 1,073,741,824 (1 GB)
duration = 500 blocks
payment = 536,870,912,000,000,000

Set maxPayment with 10-20% buffer to account for price changes.
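
The formula and buffer can be checked with a short sketch that uses overflow-checked arithmetic. The function names `agreement_payment` and `max_payment_with_buffer` are hypothetical, and `Balance` is assumed to be `u128`; the pallet's own calculation may round or bound values differently.

```rust
/// Assumed balance type; the runtime's actual `Balance` alias may differ.
pub type Balance = u128;

/// payment = price_per_byte * max_bytes * duration, in base units.
/// Returns `None` if any multiplication overflows.
pub fn agreement_payment(
    price_per_byte: Balance,
    max_bytes: u64,
    duration_blocks: u32,
) -> Option<Balance> {
    price_per_byte
        .checked_mul(Balance::from(max_bytes))?
        .checked_mul(Balance::from(duration_blocks))
}

/// Adds `buffer_percent` on top of `payment` (e.g. 10 to 20) to get a maxPayment value.
pub fn max_payment_with_buffer(payment: Balance, buffer_percent: Balance) -> Option<Balance> {
    let buffer = payment.checked_mul(buffer_percent)? / 100;
    payment.checked_add(buffer)
}
```

With the example inputs, `agreement_payment(1_000_000, 1_073_741_824, 500)` gives `Some(536_870_912_000_000_000)`, and a 20% buffer raises that to 644_245_094_400_000_000.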

Advanced Features

Provider Discovery & Marketplace

The SDK provides automatic provider discovery based on storage requirements:

use storage_client::{DiscoveryClient, StorageRequirements};

let mut client = DiscoveryClient::with_defaults()?;
client.connect().await?;

// Define requirements
let requirements = StorageRequirements {
    bytes_needed: 10 * 1024 * 1024 * 1024, // 10 GB
    min_duration: 100_000,
    max_price_per_byte: 1_000_000,
    primary_only: true,
};

// Find matching providers (sorted by score)
let providers = client.find_providers(requirements, 10).await?;

// Or get recommendations with cost estimates
let recommendations = client.suggest_providers(bytes, duration, budget).await?;

Matching Algorithm: Providers are scored 0-100 based on:

Accepting status (not accepting = 0)
Capacity (insufficient = -50 points)
Price (too high = -30 points)
Duration (mismatch = -20 points)
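
One plausible reading of these rules is "start at 100 and subtract penalties, with non-accepting providers pinned to 0". The sketch below implements that reading only; it is an assumption, not the SDK's scoring code, and `Requirements`, `Offer`, and `match_score` are simplified hypothetical types rather than `StorageRequirements` or the SDK's provider records.

```rust
/// Simplified view of what the client needs (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Requirements {
    pub bytes_needed: u64,
    pub duration: u32,
    pub max_price_per_byte: u128,
}

/// Simplified view of what a provider offers (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct Offer {
    pub accepting_primary: bool,
    pub free_capacity: u64,
    pub price_per_byte: u128,
    pub min_duration: u32,
    pub max_duration: u32,
}

/// Scores an offer from 0 to 100 using the penalties listed above.
pub fn match_score(req: &Requirements, offer: &Offer) -> u8 {
    if !offer.accepting_primary {
        return 0; // not accepting: score is 0 regardless of other fields
    }
    let mut score: u8 = 100;
    if offer.free_capacity < req.bytes_needed {
        score = score.saturating_sub(50); // insufficient capacity
    }
    if offer.price_per_byte > req.max_price_per_byte {
        score = score.saturating_sub(30); // price too high
    }
    if req.duration < offer.min_duration || req.duration > offer.max_duration {
        score = score.saturating_sub(20); // duration outside the provider's range
    }
    score
}
```

Sorting candidates by `match_score` in descending order would then give a ranking like the "sorted by score" result described for `find_providers`.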

See Storage Marketplace Design for details.

Checkpoint Management

The client SDK provides comprehensive checkpoint management:

// BatchedInterval is used below; its export from storage_client is assumed here
use storage_client::{BatchedCheckpointConfig, BatchedInterval, CheckpointConfig, CheckpointManager};

// Create checkpoint manager
let manager = CheckpointManager::new(chain_endpoint, CheckpointConfig::default()).await?;
let manager = manager.with_providers(provider_endpoints);

// Manual checkpoint submission
let result = manager.submit_checkpoint(bucket_id).await;

// Or enable automatic checkpoints
let config = BatchedCheckpointConfig {
    interval: BatchedInterval::Blocks(100),
    ..Default::default()
};
let handle = manager.start_checkpoint_loop(bucket_id, config, callback).await?;

// Control the loop
handle.submit_now().await?;  // Force immediate checkpoint
handle.stop().await?;         // Stop background loop

Key Components:

CheckpointManager: Coordinates multi-provider checkpoint collection and consensus
CheckpointPersistence: Persists checkpoint state to disk with backup rotation
EventSubscriber: Real-time blockchain event monitoring (checkpoints, challenges)
ProviderHealthHistory: Tracks provider reliability and response times

See Checkpoint Protocol Design for details.

Event Subscription

Subscribe to real-time blockchain events:

use futures::StreamExt; // provides stream.next(); assumes the subscription is a futures Stream
use storage_client::{EventSubscriber, EventFilter, StorageEvent};

let subscriber = EventSubscriber::connect(chain_endpoint).await?;

// Subscribe to specific events
let filter = EventFilter::bucket(bucket_id);
let mut stream = subscriber.subscribe(filter).await?;

while let Some(event) = stream.next().await {
    match event {
        StorageEvent::BucketCheckpointed { bucket_id, mmr_root, .. } => { /* ... */ }
        StorageEvent::ChallengeCreated { challenge_id, .. } => { /* ... */ }
        StorageEvent::ProviderSlashed { provider, amount, .. } => { /* ... */ }
        _ => {}
    }
}

Code Review Guidelines (Parity Standards)

For the full review criteria (Parity Standards), see the /review skill. The review bot and all contributors follow those guidelines.

Rust Code Quality

Error Handling: Use Result types with meaningful error enums. Avoid unwrap() and expect() in production code; they are acceptable in tests.
Arithmetic Safety: Use checked_* or saturating_* arithmetic so overflow is handled explicitly; use wrapping_* only where wraparound is the intended behavior. Never use raw arithmetic operators on user-provided values.
Naming: Follow Rust naming conventions (snake_case for functions/variables, CamelCase for types).
Complexity: Prefer simple, readable code. Avoid over-engineering and premature abstractions.
No useless comments: Comments should mostly explain why things are done, not how. The code should be readable enough to explain the how.

FRAME Pallet Standards

Storage: Use appropriate storage types (StorageValue, StorageMap, StorageDoubleMap, CountedStorageMap).
Events: Emit events for all state changes that external observers need to track.
Errors: Define descriptive error types in the pallet's Error enum.
Weights: All extrinsics must have accurate weight annotations. Update benchmarks when logic changes.
Origins: Use the principle of least privilege for origin checks.
Hooks: Be cautious with on_initialize and on_finalize; they affect block production time. Never panic or do unbounded iteration in them. Always benchmark them properly.

Security Considerations

No Panics in Runtime: Runtime code must never panic. Program defensively: use the defensive! macro and the Defensive trait's defensive_* methods from frame_support instead of unwrap() or expect().
Bounded Collections: Use BoundedVec, BoundedBTreeMap etc. to prevent unbounded storage growth.
Input Validation: Validate all user inputs at the entry point.
Storage Deposits: Consider requiring deposits for user-created storage items.
Arithmetic: Always use checked arithmetic for financial calculations.
Access Control: Verify origin permissions before state changes.

Testing Requirements

Unit Tests: All new functionality requires unit tests.
Edge Cases: Test boundary conditions, error paths, and malicious inputs.
Integration Tests: Complex features should have integration tests.
Mock Tests: Use mock.rs and TestExternalities for pallet tests.
Provider Node Tests: Test HTTP API endpoints and storage layer.
Client SDK Tests: Test all public SDK methods.

PR Requirements

Single Responsibility: Each PR should address one concern.
Tests Pass: All CI checks must pass (cargo test, cargo clippy, cargo fmt).
No Warnings: Code should compile without warnings.
Documentation: Public APIs require rustdoc comments.
Changelog: Update changelog for user-facing changes.

Documentation

📚 Complete Documentation - Full documentation index

Quick Links

Document	Description
Layer 1 Quick Start	Three-terminal setup + SDK examples
Extrinsics Reference	Complete blockchain API
Payment Calculator	Calculate agreement costs
Architecture Design	System design, economics, common concerns
Implementation Details	Technical specs
Execution Flows	Sequence diagrams for all extrinsics
Storage Marketplace	Provider capacity & discovery
Checkpoint Protocol	Automated checkpoint management
File System Architecture	Layer 1 encoding, security, blockchain details

Common Issues & Solutions

"Insufficient Stake" Error

Minimum required: 1000 tokens = 1000000000000000 base units (1,000 × 10^12, 12 decimals)
Check Alice's balance in Accounts tab
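
When entering stake amounts by hand it is easy to drop a zero. A small conversion helper avoids that; this is a sketch in which `tokens_to_base_units` is a hypothetical name and `UNIT` mirrors the 12-decimal constant from the runtime parameters.

```rust
/// Token decimals used by this chain.
pub const DECIMALS: u32 = 12;

/// One whole token in base units (10^12).
pub const UNIT: u128 = 10u128.pow(DECIMALS);

/// Converts whole tokens to base units; `None` on overflow.
pub fn tokens_to_base_units(tokens: u128) -> Option<u128> {
    tokens.checked_mul(UNIT)
}
```

`tokens_to_base_units(1_000)` returns `Some(1_000_000_000_000_000)`, the minimum provider stake quoted above.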

"PaymentExceedsMax" Error

Calculate payment: price_per_byte × max_bytes × duration
Set maxPayment with 10-20% buffer
See Payment Calculator

Upload Fails

Complete on-chain setup first: register provider, create bucket, establish agreement
With chain + provider already running, just demo performs that setup
For chain health, run bash scripts/check-chain.sh (relay + parachain probe)

Provider Not Accepting Agreements

Call updateProviderSettings after registration
Set acceptingPrimary: true

"CapacityExceeded" or "InsufficientStakeForCapacity" Error

Provider's max_capacity is too low for the agreement
Or provider's stake doesn't cover their declared capacity
Required: stake >= max_capacity * MinStakePerByte
Use DiscoveryClient.find_providers() to find providers with sufficient capacity

Feature Flags

runtime-benchmarks - Enable weight generation
try-runtime - Runtime migration testing
std - Standard library features (default)

Notes

Token decimals: 12 (same as KSM; DOT uses 10)
Minimum stake: 1000 tokens
Challenge response deadline: ChallengeTimeout = 48 * HOURS (see Runtime Parameters)
Data is content-addressed with blake2-256
All data operations happen off-chain via HTTP
Chain is only for accountability and disputes

Using the Claude Review Bot

@claude - Mention in any comment to ask questions or request help
Assign to claude[bot] - Assign an issue to have Claude analyze and propose solutions
Label with claude - Add the claude label to an issue for Claude to investigate
