perf(clickhouse): LZ4-compress INSERT bodies in ClickhouseWriterStep #7993

Open
phacops wants to merge 2 commits into master from perf/clickhouse-native-lz4-compression
@phacops phacops commented Jun 2, 2026

Restore LZ4 compression on the writer's request bodies, in ClickHouse's native compressed format. After #7979 dropped the clickhouse-rs crate, every INSERT (JSON and RowBinary alike) started POSTing uncompressed bytes — the crate's default Compression::Lz4 went away with it. This puts compression back, wire-compatible with what clickhouse-rs was sending.

Wire format

Bodies are now a concatenation of one or more native compressed blocks. Each block:

[16]  CityHash128(header || compressed), little-endian u128
[ 1]  0x82 (LZ4 method identifier)
[ 4]  u32 LE: compressed size INCLUDING the 9-byte header
[ 4]  u32 LE: uncompressed size of this block
[..]  raw LZ4 block bytes (no frame wrapper, no prepended size)

The 9-byte (method + sizes) header is hashed together with the compressed bytes — the checksum guards both. We chunk at 1 MiB uncompressed to match the server's max_compress_block_size default; larger blocks risk tripping server-side decompress limits.
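
A minimal sketch of the framing described above, assuming the compressor and the hash are passed in as closures so the snippet stays std-only. `encode_body`, `compress`, and `checksum` are hypothetical names, not the ones used in this PR; in the writer they would correspond to an LZ4 block encoder and the CityHash128 checksum.

```rust
/// Largest uncompressed chunk per block (1 MiB), as described above.
const MAX_BLOCK: usize = 1 << 20;
/// Method byte for LZ4 in the native compressed format.
const METHOD_LZ4: u8 = 0x82;
/// Method byte (1) + compressed size (4) + uncompressed size (4).
const HEADER_LEN: usize = 9;

/// Encode `input` as one or more concatenated native compressed blocks.
/// `compress` and `checksum` are caller-supplied stand-ins (hypothetical
/// parameters) for the LZ4 block encoder and the 16-byte checksum.
fn encode_body<C, H>(input: &[u8], compress: C, checksum: H) -> Vec<u8>
where
    C: Fn(&[u8]) -> Vec<u8>,
    H: Fn(&[u8]) -> [u8; 16],
{
    let mut out = Vec::new();
    for chunk in input.chunks(MAX_BLOCK) {
        let compressed = compress(chunk);

        // Build header || compressed first: this is the span the checksum covers.
        let mut framed = Vec::with_capacity(HEADER_LEN + compressed.len());
        framed.push(METHOD_LZ4);
        // The compressed size field includes the 9-byte header.
        framed.extend_from_slice(&((HEADER_LEN + compressed.len()) as u32).to_le_bytes());
        framed.extend_from_slice(&(chunk.len() as u32).to_le_bytes());
        framed.extend_from_slice(&compressed);

        // The checksum goes first on the wire, followed by the bytes it covers.
        out.extend_from_slice(&checksum(&framed));
        out.extend_from_slice(&framed);
    }
    out
}
```

With an identity "compressor", a 2.5 MiB input yields three blocks, each adding 25 bytes of checksum and header. That is a quick way to sanity-check the chunking without a real LZ4 dependency.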

This is not HTTP-standard Content-Encoding: lz4. That variant uses the LZ4 frame format and requires enable_http_compression=1. ClickHouse's native format goes through a different decoder path (CompressedReadBuffer) and is selected with decompress=1 in the URL — that's what clickhouse-rs and the clickhouse-compressor CLI use.

Things to look at

  • CityHash variant. cityhash-rs's cityhash_102_128 is the right one — that's the variant CH bundles for compression checksums. The 110 variant is reserved for newer hash columns and is NOT interchangeable; mixing them would silently produce bodies the server rejects with "Checksum doesn't match".
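  • Byte order of the checksum. CH reads the checksum as two UInt64s in CompressedReadBuffer::readCompressedData: 8 LE bytes of the low half, then 8 LE bytes of the high half. cityhash_102_128 returns a u128 with the canonical low half in the upper 64 bits, so a plain u128::to_le_bytes() emits the halves in the wrong order; the follow-up commit rotates the u128 by 64 before serializing. The roundtrip test re-hashes with the same helper, so it only checks self-consistency; test_compression_checksum_matches_clickhouse_wire_order pins the wire order.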
  • bytes::Bytes for retries. Compression runs once before the retry loop; each attempt clones via bytes::Bytes (refcount bump, not a memcpy). The previous behavior already did this for the uncompressed body — same shape, just on the compressed bytes now.
  • No spawn_blocking. Compression runs inline on the writer's tokio task. LZ4 block encode is ~500 MB/s/core, so even a 10 MiB batch is ~20 ms — fine for current batch sizes. If a future workload pushes much larger batches we can revisit.
  • insertions.batch_write_bytes metric. Still reports the uncompressed body size (it's measured before send is called), which is what you want as the logical-payload metric. If we want a compressed-on-wire counter for ratio dashboards, that's a one-line follow-up.
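
To illustrate the "bytes::Bytes for retries" point, here is a std-only sketch of building the body once and handing a cheap clone to each attempt. `Arc<[u8]>` stands in for `bytes::Bytes` (an assumption made to avoid the external crate), and `send_with_retries` is a hypothetical helper, not the writer's actual retry loop.

```rust
use std::sync::Arc;

/// Try `send` up to `max_attempts` times with the same pre-built body.
/// `Arc<[u8]>` is a std-only stand-in for `bytes::Bytes`: cloning bumps a
/// reference count and does not copy the payload.
/// Returns the 1-based attempt number that succeeded, or the last error.
fn send_with_retries<F>(
    body: Arc<[u8]>,
    max_attempts: usize,
    mut send: F,
) -> Result<usize, String>
where
    F: FnMut(Arc<[u8]>) -> Result<(), String>,
{
    let mut last_err = String::from("no attempts made");
    for attempt in 1..=max_attempts {
        // Each attempt gets its own handle to the same underlying buffer.
        match send(Arc::clone(&body)) {
            Ok(()) => return Ok(attempt),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

The compression step would run before this function is called, so a retry only repeats the send, never the encode.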

Trade-off

More consumer CPU for fewer bytes on the wire to ClickHouse. This is the trade clickhouse-rs was making by default before #7979.

Verification

cargo test --lib strategies::clickhouse::writer_v2: 5/5 pass. The two new roundtrip tests walk each block in the buffer, verify the header layout and method byte, recompute CityHash128 over header || compressed and assert it matches the stored checksum, then decompress the payload with lz4_flex::block::decompress using the on-the-wire uncompressed_size. The multi-block test sends 2.5 MiB to exercise the chunking loop end-to-end.
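
The block walk that the roundtrip tests perform can be sketched roughly as follows. This is an illustration under assumptions, not the test code from this PR: `Block` and `walk_blocks` are hypothetical names, and the checksum and LZ4 steps are left to the caller so the snippet needs only std.

```rust
/// One parsed block from a native compressed body.
struct Block<'a> {
    /// The 16 stored checksum bytes.
    checksum: &'a [u8],
    /// Method byte (0x82 for LZ4).
    method: u8,
    /// header || compressed: the span the checksum is computed over.
    hashed: &'a [u8],
    /// Raw compressed payload, without the 9-byte header.
    payload: &'a [u8],
    /// Uncompressed size recorded in the header.
    uncompressed_size: u32,
}

/// Read a little-endian u32 from the first four bytes of `b`.
fn read_u32_le(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Split a body into its blocks, validating only the framing.
fn walk_blocks<'a>(mut buf: &'a [u8]) -> Result<Vec<Block<'a>>, String> {
    let mut blocks = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 16 + 9 {
            return Err("truncated block header".to_string());
        }
        // The compressed size field counts the 9-byte header as well.
        let compressed_size = read_u32_le(&buf[17..21]) as usize;
        if compressed_size < 9 || buf.len() < 16 + compressed_size {
            return Err("compressed size does not fit the buffer".to_string());
        }
        blocks.push(Block {
            checksum: &buf[..16],
            method: buf[16],
            hashed: &buf[16..16 + compressed_size],
            payload: &buf[25..16 + compressed_size],
            uncompressed_size: read_u32_le(&buf[21..25]),
        });
        buf = &buf[16 + compressed_size..];
    }
    Ok(blocks)
}
```

A caller would then recompute the checksum over `hashed`, compare it with `checksum`, and decompress `payload` to `uncompressed_size` bytes.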

Wire format can also be spot-checked with clickhouse-compressor --decompress on a captured body.

Agent transcript: https://claudescope.sentry.dev/share/Tk2kzd3lO5_147NX6Gia3qEHt-HKJ5P35e7ZxaPYXu0

Restore body compression on the writer side. Every INSERT (JSON and
RowBinary alike) now POSTs in ClickHouse's native compressed format —
the same wire shape `clickhouse-rs` used before #7979 removed the
crate.

Each block holds at most 1 MiB of uncompressed input (matching the
server's `max_compress_block_size` default), encoded as:

  [16] CityHash128(header || compressed), little-endian u128
  [ 1] 0x82 (LZ4 method identifier)
  [ 4] u32 LE: compressed size including the 9-byte header
  [ 4] u32 LE: uncompressed size of this block
  [..] raw LZ4 block bytes

The body is one or more such blocks concatenated. The URL gains
`decompress=1` to tell ClickHouse the body is in this native format
(distinct from HTTP-standard `Content-Encoding: lz4`, which would need
`enable_http_compression=1` and produce different wire bytes).

CityHash 1.0.2 is the variant ClickHouse bundles for compression
checksums; the 110 variant is reserved for newer hash columns and is
NOT interchangeable here. Compression runs once before the retry loop
so retries don't pay the cost again — `bytes::Bytes` keeps the
per-attempt clone to a refcount bump.

The trade is more consumer CPU for fewer bytes on the wire to
ClickHouse. The previous behavior (uncompressed bodies) shipped in
#7979 because dropping `clickhouse-rs` also dropped its default
compression; this puts it back.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Agent transcript: https://claudescope.sentry.dev/share/ysaClgIILBEioShwXIR2ZQ62mRrodPCConwlvJefkr8
@phacops phacops requested a review from a team as a code owner June 2, 2026 21:43
`cityhash_rs::cityhash_102_128` returns a u128 with the canonical "low"
half in the upper 64 bits and "high" in the lower 64 bits. ClickHouse's
`CompressedReadBuffer` reads the wire as 8 LE bytes of low then 8 LE
bytes of high — so a naive `to_le_bytes()` puts the halves in the wrong
order and the server rejects every block with `CANNOT_DECOMPRESS /
Checksum doesn't match`.

Rotate the u128 by 64 before serializing to swap the halves back into
the order CH expects. Factored into `ch_compression_checksum` so the
test decoder shares the same logic, and added
`test_compression_checksum_matches_clickhouse_wire_order` to lock the
convention in place — without the rotate, that test fails AND `it_works`
fails against a real ClickHouse with the exact error this fixes.
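
For reference, a minimal sketch of the half-swap described above, assuming the hash arrives as a u128 with the canonical low half in the upper 64 bits. `wire_checksum_bytes` is a hypothetical name used for illustration; it is not the signature of `ch_compression_checksum`.

```rust
/// Serialize a 128-bit hash for the wire as 8 LE bytes of the canonical
/// "low" half followed by 8 LE bytes of the "high" half.
/// Assumes `hash` holds "low" in its upper 64 bits and "high" in its lower
/// 64 bits, as the commit message describes for the hash crate's output.
fn wire_checksum_bytes(hash: u128) -> [u8; 16] {
    // Rotating by 64 swaps the two halves; to_le_bytes() then emits the
    // (new) lower half first, which is the canonical "low" half.
    hash.rotate_left(64).to_le_bytes()
}
```

Without the rotate, the same input would serialize as high then low, which is the ordering the server rejects.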

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>