Skip to content

Reapply column-paged merge batcher (#36627) with buffer recycling + resident passthrough#36841

Closed
antiguru wants to merge 3 commits into
mainfrom
claude/paged-batcher-recycle
Closed

Reapply column-paged merge batcher (#36627) with buffer recycling + resident passthrough#36841
antiguru wants to merge 3 commits into
mainfrom
claude/paged-batcher-recycle

Conversation

@antiguru
Copy link
Copy Markdown
Member

@antiguru antiguru commented Jun 1, 2026

Re-lands the column-paged merge batcher reverted in #36839 (revert of #36627), and adds two follow-ups on top.

Commits

  1. Reapply "Column paged merge batcher (Column paged merge batcher #36627)" — restores the paged batcher, column_pager, the DifferentialJoinHydration* scenarios, and the enable_column_paged_batcher dyncfg exactly as Column paged merge batcher #36627 had them.
  2. Recycle buffers + resident passthrough in the paged merge fork — brings ColumnMergeBatcher to parity with the non-paged ColumnMerger: a small capped (STASH_CAP + MAX_RECYCLE_BYTES) free-list of cleared Column::Typed buffers threaded through merge_chains/drain_side/extract_chain/merge_by/seal, plus whole-chunk passthrough for materialized heads. Roughly halves the spilling join-hydration wall-clock; the free-list is capped so it doesn't inflate resident memory on the no-spill path.
  3. Run DifferentialJoinHydration on a dedicated external clusterd — the join (v2) now runs on a separate clusterd_join (16 workers) wired as an unmanaged replica, instead of an in-process managed cluster, so its arrangement memory is captured by memory_clusterd and the topology matches a real deployment:
    • mzcompose.py declares/starts clusterd_join only when a scenario opts in;
    • new backward-compatible Scenario.MEMORY_CLUSTERD_SERVICE (default "clusterd") threads through to Docker.DockerMemClusterd, so only these scenarios measure clusterd_join;
    • hydration is triggered via DROP/CREATE CLUSTER REPLICA (unmanaged) instead of REPLICATION FACTOR; v1 stays on the default cluster.

Notes

  • Independent of the columnar/timely/differential version bump (Update columnar, timely, and differential dependencies #36804).
  • mz-timely-util builds; merge_chains/extract_chain unit tests pass; Python (py_compile + ruff) is clean.
  • The recycling speedup and the external-clusterd topology both need a feature-benchmark run to validate (couldn't exercise Docker / SCALE=8 here).

https://claude.ai/code/session_01CrNqefrHBNbfJetgjRHsmH

@antiguru antiguru requested a review from a team as a code owner June 1, 2026 06:16
claude added 2 commits June 1, 2026 06:28
…d merge fork

The column-paged `ColumnMergeBatcher` (#36627), used by the linear-join and
arrange paths via `Col2ValPagedBatcher`, reallocated a fresh `Column` for
every shipped chunk and merged element-by-element even for non-overlapping
chains — unlike the non-paged `ColumnMerger`, which recycles buffers through a
`stash` and has a whole-chunk passthrough. Bring the fork to parity:

* Add a small, capped free-list (`STASH_CAP`, with a `MAX_RECYCLE_BYTES`
  per-buffer guard) of cleared `Column::Typed` buffers on the batcher, threaded
  through `merge_chains` / `drain_side` / `extract_chain` / `merge_by` / `seal`.
  `result`, `keep`/`ship`, and exhausted/consumed chunks are recycled rather
  than dropped + reallocated. The cap keeps the pool from inflating resident
  memory on the no-spill path; buffers paged out to a backend are never
  recycled (no resident allocation to keep).
* Add the whole-chunk passthrough: heads arrive already materialized from
  `FetchIter`, so peeking endpoints is free — ship a head wholesale when it
  sorts entirely before the other side, skipping the per-record merge.

This roughly halves the wall-clock of the spilling join-hydration path. Merge
and extract correctness tests pass.

https://claude.ai/code/session_01CrNqefrHBNbfJetgjRHsmH
@antiguru antiguru force-pushed the claude/paged-batcher-recycle branch from dcdb298 to b0a46e1 Compare June 1, 2026 06:32
@antiguru antiguru requested a review from a team as a code owner June 1, 2026 06:32
@antiguru antiguru changed the title timely-util: recycle buffers + resident passthrough in the paged merge fork Reapply column-paged merge batcher (#36627) with buffer recycling + resident passthrough Jun 1, 2026
…nal clusterd

The join (`v2`) previously ran on an in-process managed `join_cluster`, so its
arrangement memory landed in the `materialized` process (`memory_mz`) and the
topology didn't match a real deployment. Run it on a dedicated external
`clusterd_join` (16 workers) instead:

* `mzcompose.py`: declare a second `Clusterd(name="clusterd_join", workers=16)`,
  started only when a scenario opts in. `create_clusterd_service` gains
  `name`/`workers`; `start_overridden_*` starts/kills the extra clusterd.
* Per-scenario measured container: new backward-compatible
  `Scenario.MEMORY_CLUSTERD_SERVICE` (default `"clusterd"`), threaded through
  `Benchmark` into `Docker.DockerMemClusterd`, so `memory_clusterd` tracks the
  busy join clusterd for these scenarios and is unchanged for all others.
* `DifferentialJoinHydration`: create `join_cluster` as an unmanaged replica
  pinned to `clusterd_join`, keep `v1` on the default cluster, and trigger
  hydration via `DROP`/`CREATE CLUSTER REPLICA` (unmanaged) instead of
  `ALTER CLUSTER … REPLICATION FACTOR`.

Lives on the #36627 re-land branch alongside the merge-batcher fix. Needs a
benchmark run to validate (can't exercise Docker here).

https://claude.ai/code/session_01CrNqefrHBNbfJetgjRHsmH
@antiguru antiguru marked this pull request as draft June 1, 2026 09:21
@antiguru antiguru closed this Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants