
BAL storage prefetch poisons the warm execution cache for EIP-6780 destroy-then-recreate accounts (latent consensus/state-root split on BAL-enabled networks) #24300

@researchzero-sec

Description

Describe the bug

Scope / severity up front

  • No Ethereum mainnet impact today: EIP-7928 (Block-Level Access Lists) is slated for a future fork (Amsterdam, tracked in "Tracking: Amsterdam Hardfork" #18783), and the defective path only runs when a block carries a BAL.
  • Latent consensus / state-root divergence on any BAL-enabled network (BAL devnets/testnets now, mainnet after activation), with default config (--disable-bal-batch-io not set).
  • The triggering code is merged and on the actively-developed parallel-execution path (feat: parallel execution #23924), so this is cheap to fix now and expensive after the fork ships.
  • Filing publicly because there is no live exploit (no mainnet reachability), only a pre-fork correctness bug.

Summary

When a block carries an EIP-7928 Block-Level Access List, the prewarm BAL storage-prefetch task seeds the shared execution cache with the parent-state value of every BAL-declared (address, slot), with no filter on whether the address has code in the parent state. For an address X that is created within the block (and is therefore codeless at the parent), the parent-DB read misses and (X, slot) → 0 is cached.

If, in that same block, X is destroyed and then recreated (legal post-EIP-6780: a same-tx CREATE2+SELFDESTRUCT is a full delete, and a later tx CREATE2s X again), X's post-state bundle status has was_destroyed() == true and no parent code, so insert_state takes the codeless-destroyed branch. That branch removes only the account entry and skips all storage, so it never overwrites or evicts the stale (X, slot) → 0.

Because the prefetch provider, insert_state, and the next block's warm cache all share the same Arc<ExecutionCacheInner>, the poisoned zero survives into block N+1's warm cache. A non-prewarm read in block N+1 then returns 0 instead of the canonical post-state value, without consulting the DB. The result is a divergent SLOAD, a divergent state root, and a consensus split versus a node running with --disable-bal-batch-io (or any other client).
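The cross-block survival hinges on Arc sharing. A minimal sketch, assuming a HashMap behind Arc<Mutex<..>> as a stand-in for the storage cache inside ExecutionCacheInner (ExecutionCacheModel and both helper functions are hypothetical, and addresses, slots, and values are shrunk to u8/u64): a write through the prefetch task's clone is visible through the handle the next block reads, because cloning only copies the pointer.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Simplified key: (address, slot). Real reth uses Address / B256 / U256.
type Key = (u8, u64);

/// Hypothetical stand-in for `ExecutionCache(Arc<ExecutionCacheInner>)`:
/// cloning copies the Arc pointer, so every clone shares one storage map.
#[derive(Clone, Default)]
struct ExecutionCacheModel(Arc<Mutex<HashMap<Key, u64>>>);

impl ExecutionCacheModel {
    fn insert(&self, key: Key, value: u64) {
        self.0.lock().unwrap().insert(key, value);
    }

    fn get(&self, key: Key) -> Option<u64> {
        self.0.lock().unwrap().get(&key).copied()
    }
}

/// True when two clones point at the same underlying map.
fn clones_share_storage() -> bool {
    let a = ExecutionCacheModel::default();
    let b = a.clone();
    Arc::ptr_eq(&a.0, &b.0)
}

/// Value a block N+1 reader sees for (X, s) after the prefetch task cached a
/// parent-DB miss as 0 and nothing overwrote or evicted it.
fn poisoned_value_seen_next_block() -> Option<u64> {
    let saved = ExecutionCacheModel::default();
    let prefetch_handle = saved.clone(); // handle used by the prefetch task
    let next_block_handle = saved.clone(); // handle kept for the next block
    // X is codeless at the parent, so the read misses and 0 is cached.
    prefetch_handle.insert((0xAA, 1), 0);
    // The destroyed-and-codeless insert_state branch skips storage: no write here.
    next_block_handle.get((0xAA, 1))
}
```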

Affected code (paths at b61b5436b, reth v2.2.0)

  1. crates/engine/tree/src/tree/payload_processor/prewarm.rs:754-808 (prefetch_bal_storage): iterates every BAL account; the only gates are disable_bal_batch_io and empty slot lists. There is no parent-code-presence check.
  2. crates/engine/tree/src/tree/payload_processor/prewarm.rs:397-419 — per-account dispatch; par_iter().for_each over all BAL accounts, no code filter.
  3. crates/engine/execution-cache/src/cached_state.rs:474-491 + 759-776 — PREWARM storage() path: get_or_try_insert_storage_with inserts the closure result; the closure is state_provider.storage(...).map(Option::unwrap_or_default), so a parent-DB miss inserts U256::ZERO. The .filter(|v| !v.is_zero()) only affects the returned value, not the stored cache entry.
  4. crates/engine/execution-cache/src/cached_state.rs:863-882 (insert_state destroyed-account branch): account.was_destroyed() && !had_code ⇒ self.0.account_cache.remove(addr); continue;. The storage cache for addr is never touched; only the had_code == true branch calls self.clear(). There is no per-address storage invalidation.
  5. crates/engine/execution-cache/src/cached_state.rs:662-663 defines #[derive(Clone)] pub struct ExecutionCache(Arc<ExecutionCacheInner>); both prefetch_bal_storage (prewarm.rs:789) and save_cache (prewarm.rs:292) operate via saved_cache.cache().clone(), i.e. on the same Arc<ExecutionCacheInner> and therefore the same storage_cache.
  6. crates/engine/execution-cache/src/cached_state.rs:492 — block N+1 non-prewarm read returns the cached value with no DB fallback on a hit.
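To make the insert-then-filter ordering in item 3 concrete, here is a hypothetical model of the PREWARM storage() path (prewarm_storage is an illustrative name; the Result returned by the real state provider and the U256 type are elided). It assumes, as described above, that the closure result is stored as-is and the zero filter is applied only to the returned value.

```rust
use std::collections::HashMap;

// Simplified key: (address, slot); values are u64 instead of U256.
type Key = (u8, u64);

/// Hypothetical model of the PREWARM storage() path described in item 3:
/// the fetched value is inserted into the cache first, and the zero filter
/// is applied only to what is returned to the caller.
fn prewarm_storage(
    cache: &mut HashMap<Key, u64>,
    parent_db: &HashMap<Key, u64>,
    key: Key,
) -> Option<u64> {
    // Mirrors `state_provider.storage(..).map(Option::unwrap_or_default)`:
    // a miss in the parent DB becomes 0 and is stored.
    let cached = *cache
        .entry(key)
        .or_insert_with(|| parent_db.get(&key).copied().unwrap_or_default());
    // Mirrors `.filter(|v| !v.is_zero())`: hides the zero from the caller,
    // but the cache entry inserted above stays.
    Some(cached).filter(|v| *v != 0)
}
```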

Notably, the ExecutionCache doc comment at cached_state.rs:660-661 states the exact assumption this path breaks:

Since EIP-6780, SELFDESTRUCT only works within the same transaction where the contract was created, so we don't need to handle clearing the storage.

That assumption holds for the destroyed-account branch in isolation, but the separately-added BAL prefetch path caches storage for codeless-at-parent addresses, which the destroyed-account branch then fails to clean up.

Why the poison survives only for destroy-then-recreate

Plain "created and kept alive" accounts are self-healing: insert_state's normal path (cached_state.rs:893-894) overwrites (X, slot) with the post-state value. The stale zero persists only through the was_destroyed() continue branch, i.e. exactly the EIP-6780 destroy-then-recreate-within-one-block pattern, where X is codeless at block start and the BAL declares storage for X. This pattern is attacker-constructible and is also reachable through ordinary CREATE2-factory churn.
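The two insert_state branches can be modeled as follows. This is a simplified sketch, not reth's code: AccountUpdate, insert_state, and cached_after_insert are hypothetical, and the had_code == true branch (which calls self.clear() in reth) is deliberately left unmodeled. It shows a stale prefetch zero being overwritten on the ordinary path and surviving the destroyed-and-codeless path.

```rust
use std::collections::{HashMap, HashSet};

// Simplified key: (address, slot).
type Key = (u8, u64);

/// Hypothetical, reduced view of one bundle account as seen by insert_state.
struct AccountUpdate {
    address: u8,
    was_destroyed: bool,
    had_code: bool,
    storage: Vec<(u64, u64)>, // (slot, post-state value)
}

/// Model of insert_state's per-account handling.
fn insert_state(accounts: &mut HashSet<u8>, storage: &mut HashMap<Key, u64>, updates: &[AccountUpdate]) {
    for acc in updates {
        if acc.was_destroyed {
            // The had_code == true branch (self.clear() in reth) is out of scope here.
            assert!(!acc.had_code, "had_code branch not modeled");
            // Codeless-destroyed branch: drop the account entry only.
            accounts.remove(&acc.address);
            continue; // storage entries for acc.address are never visited
        }
        // Ordinary path: write post-state storage, overwriting stale entries.
        accounts.insert(acc.address);
        for &(slot, value) in &acc.storage {
            storage.insert((acc.address, slot), value);
        }
    }
}

/// Seeds a stale prefetch zero for (0xAA, 1), applies one update whose
/// post-state value is 99, and returns what the storage cache holds after.
fn cached_after_insert(was_destroyed: bool) -> Option<u64> {
    let mut accounts = HashSet::new();
    let mut storage = HashMap::new();
    storage.insert((0xAA, 1), 0); // stale (X, s) -> 0 from the prefetch
    let update = AccountUpdate { address: 0xAA, was_destroyed, had_code: false, storage: vec![(1, 99)] };
    insert_state(&mut accounts, &mut storage, &[update]);
    storage.get(&(0xAA, 1)).copied()
}
```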

Block N's own execution is not corrupted: revm serves a freshly created account's SLOADs as 0 from its in-memory state without consulting the cache. The corruption is strictly cross-block (block N+1, where X exists on-chain and its storage is read through the warm cache).

Construction (deterministic; needs a BAL-enabled harness)

Network with EIP-7928 + Cancun active; defaults (disable_state_cache=false, disable_bal_batch_io=false):

  • Block N, address X absent in the parent state:
    • tx i: CREATE2 deploys X; X does SSTORE(s, v); X SELFDESTRUCTs (same tx ⇒ EIP-6780 full delete).
    • tx j > i: CREATE2 redeploys X (same salt/initcode); X does SSTORE(s, 99).
    • Block N's BAL lists X with a storage entry for s.
  • Assert after block N's insert_state: storage_cache.get((X, s)) == Some(0) (poisoned), not Some(99) and not absent.
  • Block N+1: any tx that executes SLOAD(X, s). The EVM observes 0 (cache hit) while the canonical DB holds 99 ⇒ divergent state root versus a node run with --disable-bal-batch-io (or another client).
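The construction reduces to a differential check between a node that prefetches and one with --disable-bal-batch-io. The sketch below is a hypothetical two-block model (NodeModel and its methods are illustrative, not a reth harness); it hard-codes the execution outcome described above (X recreated with s = 99, bundle status destroyed with had_code == false) rather than running revm.

```rust
use std::collections::HashMap;

// Simplified key: (address, slot).
type Key = (u8, u64);

/// Hypothetical two-block model of the construction; not reth code.
struct NodeModel {
    disable_bal_batch_io: bool,
    db: HashMap<Key, u64>,            // canonical state after each block
    storage_cache: HashMap<Key, u64>, // warm cache carried into the next block
}

impl NodeModel {
    fn new(disable_bal_batch_io: bool) -> Self {
        Self { disable_bal_batch_io, db: HashMap::new(), storage_cache: HashMap::new() }
    }

    /// Block N: X is absent at the parent; tx i creates, writes, and self-destructs X;
    /// tx j recreates X and executes SSTORE(s, 99). The BAL declares (X, s).
    fn import_block_n(&mut self, x: u8, s: u64) {
        if !self.disable_bal_batch_io {
            // BAL prefetch: the parent-DB miss is cached as 0.
            let parent = self.db.get(&(x, s)).copied().unwrap_or_default();
            self.storage_cache.insert((x, s), parent);
        }
        // Assumed execution outcome: X ends the block recreated with s = 99.
        self.db.insert((x, s), 99);
        // insert_state takes the destroyed-codeless branch: storage cache untouched.
    }

    /// Block N+1: SLOAD(X, s) goes through the warm cache first.
    fn sload_block_n_plus_1(&self, x: u8, s: u64) -> u64 {
        match self.storage_cache.get(&(x, s)) {
            Some(v) => *v, // cache hit: no DB fallback
            None => self.db.get(&(x, s)).copied().unwrap_or_default(),
        }
    }
}

/// Returns (prefetching node's SLOAD, reference node's SLOAD) for block N+1.
fn sload_results() -> (u64, u64) {
    let (x, s): (u8, u64) = (0xAA, 1);
    let mut prefetching = NodeModel::new(false);
    let mut reference = NodeModel::new(true);
    prefetching.import_block_n(x, s);
    reference.import_block_n(x, s);
    (prefetching.sload_block_n_plus_1(x, s), reference.sload_block_n_plus_1(x, s))
}
```

The two values differ, which in a real node would surface as the state-root mismatch described above.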

An extension of the existing self-destruct e2e (crates/ethereum/node/tests/e2e/selfdestruct.rs) with a BAL + 2-block variant is the natural home for a regression test.

Suggested fix direction

Either (preferred, minimal, local):

  • (a) Gate prefetch_bal_storage to skip addresses with no code in the parent state, making the codeless ⇒ no-cached-storage invariant explicit at the prefetch site; or
  • (b) Make insert_state's codeless-destroyed branch evict X's storage slots instead of continue (today it cannot enumerate slots cheaply — that is the original reason for the skip; a per-address storage generation/epoch tag would let it invalidate).
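Option (b) could look like the following. This is a hypothetical sketch of the per-address generation tag suggested above, using a plain HashMap rather than reth's FixedCache; GenerationTaggedStorage and its methods are illustrative names. Bumping an address's generation makes every previously cached slot for it unreachable without enumerating slots.

```rust
use std::collections::HashMap;

type Address = u8;
type Slot = u64;

/// Hypothetical storage cache whose entries carry a per-address generation,
/// so invalidating one address is O(1) and needs no slot enumeration.
#[derive(Default)]
struct GenerationTaggedStorage {
    generation: HashMap<Address, u64>,
    entries: HashMap<(Address, u64, Slot), u64>,
}

impl GenerationTaggedStorage {
    fn current_gen(&self, addr: Address) -> u64 {
        self.generation.get(&addr).copied().unwrap_or(0)
    }

    fn insert(&mut self, addr: Address, slot: Slot, value: u64) {
        let generation = self.current_gen(addr);
        self.entries.insert((addr, generation, slot), value);
    }

    /// Lookups only see entries written under the address's current generation.
    fn get(&self, addr: Address, slot: Slot) -> Option<u64> {
        self.entries.get(&(addr, self.current_gen(addr), slot)).copied()
    }

    /// What the destroyed-codeless insert_state branch could call instead of a
    /// bare `continue`: all earlier entries for `addr` become unreachable.
    fn invalidate_address(&mut self, addr: Address) {
        *self.generation.entry(addr).or_insert(0) += 1;
    }
}
```

One trade-off: invalidated entries are made unreachable rather than freed. In a bounded cache they would age out through normal eviction, whereas this unbounded HashMap sketch keeps them until it is dropped.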

Add a regression asserting "no storage-cache entry for an address that was destroyed-then-recreated within the block" at the insert_state / prefetch_bal_storage call sites.
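Option (a), together with the suggested regression property, might be sketched like this. Assumptions: prefetch_bal_storage_gated and no_entry_for_recreated_address are hypothetical names, code presence at the parent is passed in as a precomputed set, and a BAL entry is modeled as an address plus its declared slots.

```rust
use std::collections::{HashMap, HashSet};

// Simplified key: (address, slot).
type Key = (u8, u64);

/// Hypothetical gated prefetch: only addresses with code in the parent state
/// get their BAL-declared slots cached.
fn prefetch_bal_storage_gated(
    cache: &mut HashMap<Key, u64>,
    parent_has_code: &HashSet<u8>,
    parent_storage: &HashMap<Key, u64>,
    bal: &[(u8, Vec<u64>)],
) {
    for (addr, slots) in bal {
        if !parent_has_code.contains(addr) {
            continue; // codeless at parent: cache nothing for this address
        }
        for &slot in slots {
            let value = parent_storage.get(&(*addr, slot)).copied().unwrap_or_default();
            cache.insert((*addr, slot), value);
        }
    }
}

/// Regression property: an address absent at the parent (later destroyed and
/// recreated in the block) leaves no storage-cache entry after prefetch.
fn no_entry_for_recreated_address() -> bool {
    let x: u8 = 0xAA;
    let mut cache = HashMap::new();
    prefetch_bal_storage_gated(&mut cache, &HashSet::new(), &HashMap::new(), &[(x, vec![1])]);
    !cache.keys().any(|(addr, _)| *addr == x)
}
```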

Honest caveats

  • No runnable PoC yet (requires a BAL test harness); the analysis is from code tracing, fully cross-checked against current main.
  • Load-bearing assumption: revm's bundle aggregation yields a was_destroyed()-true status (DestroyedChanged) with no parent code (had_code == false) for the in-block destroy-then-recreate pattern. This is consistent with revm's status model and is the very reason reth special-cases was_destroyed() here, but it should be pinned by the suggested regression test rather than assumed.
  • FixedCache capacity/epoch eviction can probabilistically drop the stale entry before block N+1. That does not make it safe — it makes the consensus split nondeterministic (harder to detect, worse to debug). Hot slots / low-traffic chains retain it.
  • Current mitigation: --disable-bal-batch-io=true disables the prefetch (the poisoning source) at the cost of its prefetch performance benefit.

Steps to reproduce

N/A

Node logs


Platform(s)

Mac (Intel)

Container Type

Not running in a container

What version/commit are you on?

Reth Version: 2.2.0
Commit SHA: b61b543
Build Timestamp: 2026-05-15T19:04:53.035341000Z
Build Features: asm_keccak,jemalloc,keccak_cache_global,min_trace_logs,otlp,otlp_logs
Build Profile: release

What database version are you on?

N/A

Which chain / network are you on?

N/A

What type of node are you running?

Archive (default)

What prune config do you use, if any?

No response

If you've built Reth from source, provide the full command you used

No response

Code of Conduct

  • I agree to follow the Code of Conduct
