Reapply column-paged merge batcher (#36627) with buffer recycling + resident passthrough#36841
Closed
antiguru wants to merge 3 commits into
Closed
Reapply column-paged merge batcher (#36627) with buffer recycling + resident passthrough#36841antiguru wants to merge 3 commits into
antiguru wants to merge 3 commits into
Conversation
…d merge fork The column-paged `ColumnMergeBatcher` (#36627), used by the linear-join and arrange paths via `Col2ValPagedBatcher`, reallocated a fresh `Column` for every shipped chunk and merged element-by-element even for non-overlapping chains — unlike the non-paged `ColumnMerger`, which recycles buffers through a `stash` and has a whole-chunk passthrough. Bring the fork to parity: * Add a small, capped free-list (`STASH_CAP`, with a `MAX_RECYCLE_BYTES` per-buffer guard) of cleared `Column::Typed` buffers on the batcher, threaded through `merge_chains` / `drain_side` / `extract_chain` / `merge_by` / `seal`. `result`, `keep`/`ship`, and exhausted/consumed chunks are recycled rather than dropped + reallocated. The cap keeps the pool from inflating resident memory on the no-spill path; buffers paged out to a backend are never recycled (no resident allocation to keep). * Add the whole-chunk passthrough: heads arrive already materialized from `FetchIter`, so peeking endpoints is free — ship a head wholesale when it sorts entirely before the other side, skipping the per-record merge. This roughly halves the wall-clock of the spilling join-hydration path. Merge and extract correctness tests pass. https://claude.ai/code/session_01CrNqefrHBNbfJetgjRHsmH
dcdb298 to
b0a46e1
Compare
…nal clusterd The join (`v2`) previously ran on an in-process managed `join_cluster`, so its arrangement memory landed in the `materialized` process (`memory_mz`) and the topology didn't match a real deployment. Run it on a dedicated external `clusterd_join` (16 workers) instead: * `mzcompose.py`: declare a second `Clusterd(name="clusterd_join", workers=16)`, started only when a scenario opts in. `create_clusterd_service` gains `name`/`workers`; `start_overridden_*` starts/kills the extra clusterd. * Per-scenario measured container: new backward-compatible `Scenario.MEMORY_CLUSTERD_SERVICE` (default `"clusterd"`), threaded through `Benchmark` into `Docker.DockerMemClusterd`, so `memory_clusterd` tracks the busy join clusterd for these scenarios and is unchanged for all others. * `DifferentialJoinHydration`: create `join_cluster` as an unmanaged replica pinned to `clusterd_join`, keep `v1` on the default cluster, and trigger hydration via `DROP`/`CREATE CLUSTER REPLICA` (unmanaged) instead of `ALTER CLUSTER … REPLICATION FACTOR`. Lives on the #36627 re-land branch alongside the merge-batcher fix. Needs a benchmark run to validate (can't exercise Docker here). https://claude.ai/code/session_01CrNqefrHBNbfJetgjRHsmH
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Re-lands the column-paged merge batcher reverted in #36839 (revert of #36627), and adds two follow-ups on top.
Commits
column_pager, theDifferentialJoinHydration*scenarios, and theenable_column_paged_batcherdyncfg exactly as Column paged merge batcher #36627 had them.ColumnMergeBatcherto parity with the non-pagedColumnMerger: a small capped (STASH_CAP+MAX_RECYCLE_BYTES) free-list of clearedColumn::Typedbuffers threaded throughmerge_chains/drain_side/extract_chain/merge_by/seal, plus whole-chunk passthrough for materialized heads. Roughly halves the spilling join-hydration wall-clock; the free-list is capped so it doesn't inflate resident memory on the no-spill path.DifferentialJoinHydrationon a dedicated external clusterd — the join (v2) now runs on a separateclusterd_join(16 workers) wired as an unmanaged replica, instead of an in-process managed cluster, so its arrangement memory is captured bymemory_clusterdand the topology matches a real deployment:mzcompose.pydeclares/startsclusterd_joinonly when a scenario opts in;Scenario.MEMORY_CLUSTERD_SERVICE(default"clusterd") threads through toDocker.DockerMemClusterd, so only these scenarios measureclusterd_join;DROP/CREATE CLUSTER REPLICA(unmanaged) instead ofREPLICATION FACTOR;v1stays on the default cluster.Notes
mz-timely-utilbuilds;merge_chains/extract_chainunit tests pass; Python (py_compile+ruff) is clean.https://claude.ai/code/session_01CrNqefrHBNbfJetgjRHsmH