`trace_filter` spawns unbounded concurrent block replays causing sync pipeline stall and MDBX long-reader warnings

## Summary

When `trace_filter` (or any trace API covering a block range) is called, reth spawns one `trace_block_until_with_inspector` blocking task **per block in the range**, all running concurrently. Each task holds an MDBX read transaction open for the full EVM replay duration. 

Under concurrent trace load from multiple clients, this creates hundreds of simultaneous MDBX read transactions that:
1. Saturate NVMe I/O bandwidth to 100%
2. Starve the sync pipeline's `MerkleExecute` stage of I/O
3. Cause the node to fall progressively behind chain head
4. Trigger continuous `WARN storage::db::mdbx: A database read transaction has been open for too long` spam

The stall is self-reinforcing: as the node falls behind, it receives more `forkchoice_updated` payloads, which enqueue more sync work, while trace calls continue to consume all available I/O.

The concurrency ceiling is controlled by `--rpc-cache.max-concurrent-db-requests` (set to 1024 in our configuration). With multiple concurrent clients each requesting 100-block ranges, the theoretical concurrent MDBX reader count is `clients × blocks_per_call`, capped at the configured ceiling.
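As a rough illustration of that bound, the arithmetic can be sketched in a few lines of Python. The client counts are hypothetical; the 100-block range and the 1024 ceiling are the values from this report.

```python
def max_concurrent_readers(clients: int, blocks_per_call: int, ceiling: int) -> int:
    """Upper bound on simultaneously open MDBX read transactions."""
    # One blocking replay task (and one read transaction) per block per call,
    # capped by --rpc-cache.max-concurrent-db-requests.
    return min(clients * blocks_per_call, ceiling)


# Hypothetical client counts, each issuing 100-block trace_filter calls.
for clients in (1, 5, 20):
    readers = max_concurrent_readers(clients, blocks_per_call=100, ceiling=1024)
    print(f"{clients} clients: up to {readers} concurrent readers")
```

Even a handful of clients lands in the hundreds of readers, far above the 4–5 concurrent replays that this report found sufficient to saturate the array.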

The effect is especially severe on Ethereum mainnet: each mainnet block trace replay takes 60–120 seconds (versus 5–15 seconds on less complex chains) and generates thousands of random IOPS per concurrent task. Just 4–5 simultaneous mainnet trace replays are enough to saturate a high-end NVMe RAID array.


---

## Environment

| Field | Value |
|-------|-------|
| **OS** | Ubuntu 25.10, kernel 6.17.0-29-generic |
| **CPU** | AMD EPYC 7J13, 64 cores / 128 threads |
| **RAM** | 500 GiB |
| **Storage device** | 6× Micron 7450 7 TiB NVMe in RAID-0 (md0), XFS |

---

## Reth Configuration (relevant flags)

Docker Container:

```
node
  --chain=mainnet
  --storage.v2
  --db.max-size=4TB
  --db.max-readers=1024
  --http
  --http.api=all
  --rpc-cache.max-concurrent-db-requests=1024
  --rpc.max-trace-filter-blocks=2000
  --rpc.max-response-size=200
  --rpc.gascap=100000000
  --engine.cross-block-cache-size=4096
  --engine.memory-block-buffer-target=128
  --engine.persistence-threshold=4
  --engine.storage-worker-count=48
  --engine.account-worker-count=48
  --engine.prewarming-threads=32
  --rpc-cache.max-blocks=10000
  --rpc-cache.max-receipts=10000
  --rpc-cache.max-headers=5000
  --mem-limit 32g
```
---

## Expected Behavior

1. `trace_filter` over a block range should not starve the sync/consensus pipeline of I/O
2. The node should maintain sync progress regardless of RPC trace load

---

## Additional Notes

- The `WARN storage::db::mdbx: A database read transaction has been open for too long` message itself is **not a bug** — it is a correct diagnostic. The bug is the lack of back-pressure between RPC trace ops and the sync pipeline.
- The node does **not** crash or corrupt data. It eventually completes each `MerkleExecute` run (after 23 minutes in our case) and catches up between trace bursts. The risk is that it falls far enough behind for Lighthouse (the consensus client) to consider the execution client unresponsive.
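
To make "back-pressure" concrete, here is a minimal sketch of bounded per-range concurrency using Python's `asyncio`. It is not reth code and not a proposed patch: `replay_block`, `trace_range`, and the limit of 4 are hypothetical names and values chosen for illustration, and the short sleep stands in for an EVM replay that holds a read transaction.

```python
import asyncio


async def replay_block(block: int, permits: asyncio.Semaphore, stats: dict) -> int:
    # Hypothetical stand-in for one block replay. A real replay would hold a
    # database read transaction for as long as it stays inside this block.
    async with permits:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.001)  # placeholder for EVM execution time
        stats["active"] -= 1
        return block


async def trace_range(blocks, limit: int):
    # At most `limit` replays run at once; the rest wait for a permit
    # instead of all opening read transactions immediately.
    permits = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(replay_block(b, permits, stats) for b in blocks))
    return results, stats["peak"]


results, peak = asyncio.run(trace_range(range(100), limit=4))
print(f"replayed {len(results)} blocks, peak concurrency {peak}")
```

The point of the sketch is only that a 100-block request need not translate into 100 simultaneously open readers; where such a limit belongs in reth, and how it should interact with the sync pipeline, is left to the maintainers.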

---

## Steps to reproduce

1. Run reth on Ethereum mainnet as an archive node with `--http.api=all` (trace APIs enabled) and `--rpc-cache.max-concurrent-db-requests` set to a large value (≥ 64).
2. Connect one or more RPC clients that issue `trace_filter` calls over block ranges (e.g., 50–100 blocks per call). The clients should be **catching up** from a point significantly behind chain head so requests are continuous rather than one-per-block.
3. Run multiple such clients concurrently (3–5 simultaneous connections, each with their own `trace_filter` loop).
4. Observe: `WARN storage::db::mdbx: A database read transaction has been open for too long` begins appearing repeatedly; `MerkleExecute` stage checkpoint stops advancing; NVMe utilization reaches 100%.

**Minimum reproduction (single client):**
1. Start reth mainnet archive node with `--rpc-cache.max-concurrent-db-requests=1024`.
2. Send a single `trace_filter` RPC call covering 50+ consecutive mainnet blocks from a historically complex range (e.g., blocks 18000000–18000050, high DeFi activity).
3. While the call is in flight, observe `iostat -x 1` on the NVMe device and `reth` sync status — I/O will spike and pipeline stages will stall.
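
The single-client reproduction can be scripted roughly as follows. This is a sketch under assumptions: `RPC_URL` assumes the node's HTTP RPC is reachable on localhost port 8545, the one-hour timeout is arbitrary, and the helper names are made up for this example. The request uses the standard `trace_filter` filter object with hex-encoded `fromBlock` and `toBlock`.

```python
import json
import urllib.request

# Assumption: HTTP RPC is exposed on localhost:8545; adjust for your setup.
RPC_URL = "http://127.0.0.1:8545"


def trace_filter_payload(from_block: int, to_block: int, request_id: int = 1) -> dict:
    """Build a JSON-RPC request body for trace_filter over a block range."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "trace_filter",
        "params": [{"fromBlock": hex(from_block), "toBlock": hex(to_block)}],
    }


def send(payload: dict, url: str = RPC_URL, timeout: float = 3600.0) -> dict:
    """POST the request and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `send(trace_filter_payload(18_000_000, 18_000_050))` while `iostat -x 1` is running in another terminal corresponds to steps 2 and 3 above. For the multi-client scenario, the same call can be issued from several processes over adjacent ranges.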

---

## Node logs

### Continuous WARN spam (135 occurrences in 10 minutes)

```
2026-05-23T08:56:45.061395Z WARN storage::db::mdbx: A database read transaction has been open for too long
  open_duration=60.000748474s id=9274075 backtrace=
   0: reth_db::implementation::mdbx::tx::MetricsHandler<K>::log_backtrace_on_long_read_transaction
             at ./crates/storage/db/src/implementation/mdbx/tx.rs:259:32
   1: reth_db::implementation::mdbx::tx::Tx<K>::execute_with_operation_metric
             at ./crates/storage/db/src/implementation/mdbx/tx.rs:165:29
   2: <reth_db::implementation::mdbx::tx::Tx<K> as reth_db_api::transaction::DbTx>::get_by_encoded_key
             at ./crates/storage/db/src/implementation/mdbx/tx.rs:297:14
   3: <reth_provider::providers::database::provider::DatabaseProvider<TX,N> as reth_storage_api::stage_checkpoint::StageCheckpointReader>::get_stage_checkpoint
             at ./crates/storage/provider/src/providers/database/provider.rs:2236:21
   4: <reth_provider::providers::database::provider::DatabaseProvider<TX,N> as reth_storage_api::block_id::BlockNumReader>::best_block_number
             at ./crates/storage/provider/src/providers/database/provider.rs:1798:14
   5: reth_provider::providers::state::historical::HistoricalStateProviderRef<Provider,N>::storage_history_lookup
             at ./crates/storage/provider/src/providers/state/historical.rs:221:41
   6: reth_provider::providers::state::historical::HistoricalStateProviderRef<Provider,N>::storage_by_lookup_key
             at ./crates/storage/provider/src/providers/state/historical.rs:247:20
   7: <reth_provider::providers::state::historical::HistoricalStateProviderRef<Provider,N> as reth_storage_api::state::StateProvider>::storage
             at ./crates/storage/provider/src/providers/state/historical.rs:671:14
  ...
  11: <T as reth_revm::database::EvmStateProvider>::storage
             at ./crates/revm/src/database.rs:58:9
  12: <reth_revm::database::StateProviderDatabase<DB> as revm_database_interface::DatabaseRef>::storage_ref
             at ./crates/revm/src/database.rs:161:19
  ...
  24: revm_interpreter::instructions::host::sload
             at /usr/local/cargo/registry/.../revm-interpreter-35.0.1/src/instructions/host.rs:205:32
  25: revm_interpreter::instructions::Instruction<W,H>::execute
  26: revm_interpreter::interpreter::Interpreter<IW>::step
  27: revm_inspector::handler::inspect_instructions
  28: revm_inspector::traits::InspectorEvmTr::inspect_frame_run
  ...
  33: revm_inspector::mainnet_inspect::<impl revm_inspector::inspect::InspectEvm for ...>::inspect_one_tx
  34: revm_inspector::inspect::InspectEvm::inspect_tx
  35: <alloy_evm::eth::EthEvm<DB,I,PRECOMPILE> as alloy_evm::evm::Evm>::transact_raw
  36: alloy_evm::evm::Evm::transact
  37: <alloy_evm::tracing::TracerIter<E,Txs,F> as core::iter::traits::iterator::Iterator>::next
  ...
  52: reth_rpc_eth_api::helpers::trace::Trace::trace_block_until_with_inspector::{{closure}}::{{closure}}
  53: reth_rpc_eth_api::helpers::call::Call::spawn_with_state_at_block::{{closure}}::{{closure}}
  54: reth_rpc_eth_api::helpers::blocking_task::SpawnBlocking::spawn_blocking_io_fut::{{closure}}
  ...
  70: reth_tasks::runtime::Runtime::spawn_on_rt::{{closure}}
  71: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  ...
  88: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
```

All 135 WARNs trace to `trace_block_until_with_inspector`. Multiple simultaneous warnings with identical tx IDs confirm concurrent open transactions.
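
A count like this can be reproduced from a saved log with a short script. The sketch below assumes only the line format shown in the excerpt above (the warning message, followed by `open_duration=…s id=…` fields on the same or the next line); `summarize_long_reads` is a hypothetical helper name.

```python
import re
from collections import Counter

WARN_MARKER = "A database read transaction has been open for too long"
FIELDS = re.compile(r"open_duration=(?P<secs>[0-9.]+)s\s+id=(?P<id>\d+)")


def summarize_long_reads(lines):
    """Return (warning count, Counter of tx id -> occurrences, longest open duration in s)."""
    count = 0
    ids = Counter()
    longest = 0.0
    for line in lines:
        if WARN_MARKER in line:
            count += 1
        # The fields may sit on the warning line itself or on the line after it.
        match = FIELDS.search(line)
        if match:
            ids[match.group("id")] += 1
            longest = max(longest, float(match.group("secs")))
    return count, ids, longest


sample = [
    "2026-05-23T08:56:45.061395Z WARN storage::db::mdbx: "
    "A database read transaction has been open for too long",
    "  open_duration=60.000748474s id=9274075 backtrace=",
]
print(summarize_long_reads(sample))
```

Passing an open log file instead of `sample` gives the total number of warnings and shows which transaction IDs were reported and how often.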

### Sync pipeline stall — 23 minutes on 64 blocks

```
2026-05-23T08:51:49Z INFO reth_node_events::node: Executing stage pipeline_stages=8/13 stage=MerkleExecute checkpoint=25156717 target=25156781
2026-05-23T08:52:12Z INFO reth::cli: Status connected_peers=61 stage=MerkleExecute checkpoint=25156717 target=25156781
2026-05-23T08:53:52Z INFO reth::cli: Status connected_peers=68 stage=MerkleExecute checkpoint=25156717 target=25156781
2026-05-23T08:55:32Z INFO reth::cli: Status connected_peers=69 stage=MerkleExecute checkpoint=25156717 target=25156781
2026-05-23T08:57:12Z INFO reth::cli: Status connected_peers=80 stage=MerkleExecute checkpoint=25156717 target=25156781
2026-05-23T08:58:52Z INFO reth::cli: Status connected_peers=82 stage=MerkleExecute checkpoint=25156717 target=25156781
# ... checkpoint=25156717 unchanged for entire period ...
2026-05-23T09:15:07Z INFO reth::cli: Status connected_peers=130 stage=MerkleExecute checkpoint=25156781 target=25156877
# ^ Finally advanced after 23 minutes; chain was 226 blocks ahead
```

`MerkleExecute` over 64 blocks normally completes in under 1 second. Here it took **23 minutes** because all I/O was consumed by concurrent trace replays.
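
The 23-minute and 64-block figures follow directly from the first and last log lines in the excerpt above; a quick check in Python, using only values copied from that excerpt:

```python
from datetime import datetime


def parse_ts(ts: str) -> datetime:
    """Parse the second-resolution UTC timestamps used in the log excerpt."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


start = parse_ts("2026-05-23T08:51:49Z")  # stage starts at checkpoint 25156717
end = parse_ts("2026-05-23T09:15:07Z")    # first status line with the new checkpoint
stall = end - start
blocks = 25156781 - 25156717              # target minus checkpoint

print(f"stall: {stall}, blocks: {blocks}, "
      f"seconds per block: {stall.total_seconds() / blocks:.1f}")
```

Because status lines are logged only periodically, the end timestamp is an upper bound on when the stage actually finished.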

### System resource state during incident

System load average (128-thread machine):
```
465.47 415.66 307.65
```

reth-eth container stats:
```
CONTAINER   CPU %      MEM USAGE / LIMIT   PIDS
reth-eth    4152.78%   23.6GiB / 32GiB     1181
```

NVMe RAID-0 I/O — fully saturated:
```
Device   r/s        rkB/s       r_await  w/s       wkB/s      %util
md0      67083.90   1796190.84  0.22     13448.08  86983.82   91.39
md0      63756.00   2038088.00  0.72     25958.00  105352.00  100.10
md0      61955.45   2131053.47  0.74     6347.52   100479.21  99.01
```
---

## Platform(s)

Linux (x86)

## Container Type

Docker

## What version/commit are you on?

```
Reth Version: 2.2.0
Commit SHA: 88505c7fcbfdebfd3b56d88c86b62e950043c6c4
Build Timestamp: 2026-04-29T19:53:57.473810535Z
Build Features: asm_keccak,jemalloc,keccak_cache_global,min_debug_logs,otlp,otlp_logs
Build Profile: maxperf-symbols
```

## What database version are you on?

```
Current database version: 2
Local database version: 2
```

## Which chain / network are you on?

`--chain mainnet` (Ethereum mainnet, archive node, `--storage.v2`)

## What type of node are you running?

Archive (default)

## Code of Conduct

- [x] I agree to follow the Code of Conduct
