perf(clickhouse): LZ4-compress INSERT bodies in ClickhouseWriterStep #7993
Open
phacops wants to merge 2 commits into
Restore body compression on the writer side. Every INSERT (JSON and RowBinary alike) now POSTs in ClickHouse's native compressed format, the same wire shape `clickhouse-rs` used before #7979 removed the crate. Each block is 1 MiB of uncompressed input maximum (matching the server's `max_compress_block_size` default), encoded as:

    [16] CityHash128(header || compressed), little-endian u128
    [ 1] 0x82 (LZ4 method identifier)
    [ 4] u32 LE: compressed size including the 9-byte header
    [ 4] u32 LE: uncompressed size of this block
    [..] raw LZ4 block bytes

The body is one or more such blocks concatenated. The URL gains `decompress=1` to tell ClickHouse the body is in this native format (distinct from HTTP-standard `Content-Encoding: lz4`, which would need `enable_http_compression=1` and produce different wire bytes).

CityHash 1.0.2 is the variant ClickHouse bundles for compression checksums; the 110 variant is reserved for newer hash columns and is NOT interchangeable here.

Compression runs once before the retry loop so retries don't pay the cost again; `bytes::Bytes` keeps the per-attempt clone to a refcount bump.

The trade is more consumer CPU for fewer bytes on the wire to ClickHouse. The previous behavior (uncompressed bodies) shipped in #7979 because dropping `clickhouse-rs` also dropped its default compression; this puts it back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/ysaClgIILBEioShwXIR2ZQ62mRrodPCConwlvJefkr8
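The block layout in this commit message can be sketched in a few lines of std-only Rust. This is an illustration under stated assumptions, not the code in this PR: `encode_blocks` is a hypothetical name, and the compressor and checksum are passed in as closures so the sketch stays free of external crates. The real writer would supply an LZ4 block encoder and the CityHash 1.0.2 128-bit checksum.

```rust
/// Maximum uncompressed bytes per block (1 MiB, as described above).
const MAX_BLOCK: usize = 1 << 20;
/// Method byte identifying LZ4 in the native compressed format.
const METHOD_LZ4: u8 = 0x82;
/// Method byte plus the two u32 size fields.
const HEADER_LEN: usize = 9;

/// Frames `input` as a concatenation of native compressed blocks.
///
/// `compress` and `checksum` are injected stand-ins (assumptions of this
/// sketch): pass an LZ4 block encoder and a function producing the 16
/// checksum bytes in wire order.
pub fn encode_blocks(
    input: &[u8],
    compress: impl Fn(&[u8]) -> Vec<u8>,
    checksum: impl Fn(&[u8]) -> [u8; 16],
) -> Vec<u8> {
    let mut body = Vec::new();
    for chunk in input.chunks(MAX_BLOCK) {
        let compressed = compress(chunk);

        // Build `header || compressed`, the span the checksum covers.
        let mut hashed = Vec::with_capacity(HEADER_LEN + compressed.len());
        hashed.push(METHOD_LZ4);
        // Compressed size includes the 9-byte header.
        hashed.extend_from_slice(&((HEADER_LEN + compressed.len()) as u32).to_le_bytes());
        // Uncompressed size of this block only.
        hashed.extend_from_slice(&(chunk.len() as u32).to_le_bytes());
        hashed.extend_from_slice(&compressed);

        // The checksum goes first on the wire, then the hashed span.
        body.extend_from_slice(&checksum(&hashed));
        body.extend_from_slice(&hashed);
    }
    body
}
```

In this sketch an empty input produces an empty body, because `chunks` yields nothing for an empty slice; whether the real writer ever sends an empty batch is not stated in the source.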
`cityhash-rs::cityhash_102_128` returns a u128 with the canonical "low" half in the upper 64 bits and "high" in the lower 64 bits. ClickHouse's `CompressedReadBuffer` reads the wire as 8 LE bytes of low then 8 LE bytes of high, so a naive `to_le_bytes()` puts the halves in the wrong order and the server rejects every block with `CANNOT_DECOMPRESS / Checksum doesn't match`. Rotate the u128 by 64 before serializing to swap the halves back into the order CH expects.

Factored into `ch_compression_checksum` so the test decoder shares the same logic, and added `test_compression_checksum_matches_clickhouse_wire_order` to lock the convention in place: without the rotate, that test fails AND `it_works` fails against a real ClickHouse with the exact error this fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
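The half swap described above comes down to a single rotate. A minimal sketch, assuming the hash arrives as a `u128` laid out as this commit message describes (low half in the upper 64 bits); `wire_checksum_bytes` is a hypothetical helper name, not the signature of `ch_compression_checksum`:

```rust
/// Serializes a 128-bit checksum in the order the commit message says
/// ClickHouse reads it: 8 LE bytes of "low", then 8 LE bytes of "high".
///
/// Assumes `hash` carries "low" in its upper 64 bits and "high" in its
/// lower 64 bits, as described for `cityhash_102_128` above.
pub fn wire_checksum_bytes(hash: u128) -> [u8; 16] {
    // Rotating by 64 swaps the two halves, so "low" becomes the
    // least-significant half and is emitted first by `to_le_bytes`.
    hash.rotate_left(64).to_le_bytes()
}
```

For a value built as `((low as u128) << 64) | high as u128`, the first eight output bytes equal `low.to_le_bytes()` and the last eight equal `high.to_le_bytes()`, whereas a plain `to_le_bytes()` emits them in the opposite order.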
MeredithAnya
approved these changes
Jun 2, 2026
Restore LZ4 compression on the writer's request bodies, in ClickHouse's native compressed format. After #7979 dropped the `clickhouse-rs` crate, every INSERT (JSON and RowBinary alike) started POSTing uncompressed bytes, because the crate's default `Compression::Lz4` went away with it. This puts compression back, wire-compatible with what `clickhouse-rs` was sending.

Wire format
Bodies are now a concatenation of one or more native compressed blocks. Each block:

- 16 bytes: CityHash128 of `header || compressed`
- 1 byte: `0x82` (LZ4 method identifier)
- 4 bytes: u32 LE, compressed size including the 9-byte header
- 4 bytes: u32 LE, uncompressed size of this block
- remaining bytes: raw LZ4 block

The 9-byte (method + sizes) header is hashed together with the compressed bytes, so the checksum guards both. We chunk at 1 MiB uncompressed to match the server's `max_compress_block_size` default; larger blocks risk tripping server-side decompress limits.

This is not HTTP-standard `Content-Encoding: lz4`. That variant uses the LZ4 frame format and requires `enable_http_compression=1`. ClickHouse's native format goes through a different decoder path (`CompressedReadBuffer`) and is selected with `decompress=1` in the URL; that's what `clickhouse-rs` and the `clickhouse-compressor` CLI use.

Things to look at
- `cityhash-rs`'s `cityhash_102_128` is the right one: that's the variant CH bundles for compression checksums. The 110 variant is reserved for newer hash columns and is NOT interchangeable; mixing them would silently produce bodies the server rejects with "Checksum doesn't match".
- CH reads the checksum as two `UInt64`s in `CompressedReadBuffer::readCompressedData`: 8 LE bytes of low, then 8 LE bytes of high. `cityhash_102_128` returns the low half in the upper 64 bits of the `u128`, so a plain `u128::to_le_bytes()` emits the halves in the wrong order and the server rejects the block. The second commit rotates the value by 64 bits before serializing (`ch_compression_checksum`). The roundtrip tests share that helper, so they cannot catch a byte-order regression on their own; `test_compression_checksum_matches_clickhouse_wire_order` is what pins the convention.
- `bytes::Bytes` for retries. Compression runs once before the retry loop; each attempt clones via `bytes::Bytes` (refcount bump, not a memcpy). The previous behavior already did this for the uncompressed body: same shape, just on the compressed bytes now.
- `spawn_blocking`. Compression runs inline on the writer's tokio task. LZ4 block encode is ~500 MB/s/core, so even a 10 MiB batch is ~20 ms, which is fine for current batch sizes. If a future workload pushes much larger batches we can revisit.
- `insertions.batch_write_bytes` metric. Still reports the uncompressed body size (it's measured before `send` is called), which is what you want as the logical-payload metric. If we want a compressed-on-wire counter for ratio dashboards, that's a one-line follow-up.

Trade
More consumer CPU for fewer bytes on the wire to ClickHouse. This is the trade `clickhouse-rs` was making by default before #7979.

Verification
`cargo test --lib strategies::clickhouse::writer_v2`: 5/5 pass. The two new roundtrip tests walk each block in the buffer, verify the header layout + method byte, recompute CityHash128 over `header || compressed` and assert it matches the stored checksum, then `lz4_flex::block::decompress` with the on-the-wire `uncompressed_size`. The multi-block test sends 2.5 MiB to exercise the chunking loop end-to-end.

Wire format can also be spot-checked with `clickhouse-compressor --decompress` on a captured body.

Agent transcript: https://claudescope.sentry.dev/share/Tk2kzd3lO5_147NX6Gia3qEHt-HKJ5P35e7ZxaPYXu0
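The block-walking part of the roundtrip tests can be sketched without external crates. This is a rough illustration, not the test code in this PR: `walk_blocks` and `BlockInfo` are hypothetical names, and the sketch checks only the framing (method byte and size fields). The checksum comparison and the LZ4 decode are deliberately left out.

```rust
/// Sizes read from one block header.
#[derive(Debug, PartialEq)]
pub struct BlockInfo {
    /// LZ4 payload bytes, with the 9-byte header excluded.
    pub compressed_len: usize,
    /// Uncompressed size declared for this block.
    pub uncompressed_len: usize,
}

/// Walks a body made of concatenated blocks and returns the sizes in each
/// header. Checksums are skipped, not verified.
pub fn walk_blocks(mut body: &[u8]) -> Result<Vec<BlockInfo>, String> {
    const CHECKSUM_LEN: usize = 16;
    const HEADER_LEN: usize = 9;

    let mut blocks = Vec::new();
    while !body.is_empty() {
        if body.len() < CHECKSUM_LEN + HEADER_LEN {
            return Err("truncated block header".into());
        }
        let header = &body[CHECKSUM_LEN..CHECKSUM_LEN + HEADER_LEN];
        if header[0] != 0x82 {
            return Err(format!("unexpected method byte {:#04x}", header[0]));
        }
        // Compressed size on the wire includes the 9-byte header.
        let with_header =
            u32::from_le_bytes([header[1], header[2], header[3], header[4]]) as usize;
        let uncompressed_len =
            u32::from_le_bytes([header[5], header[6], header[7], header[8]]) as usize;
        if with_header < HEADER_LEN || body.len() < CHECKSUM_LEN + with_header {
            return Err("compressed size does not fit the buffer".into());
        }
        blocks.push(BlockInfo {
            compressed_len: with_header - HEADER_LEN,
            uncompressed_len,
        });
        // Advance past checksum + header + payload to the next block.
        body = &body[CHECKSUM_LEN + with_header..];
    }
    Ok(blocks)
}
```

A full decoder would additionally recompute the checksum over `header || compressed` and decompress each payload using the declared uncompressed size, as the tests described above do.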