From 092f34339cd23bd95fec5ead1f6708c67d921b8a Mon Sep 17 00:00:00 2001
From: Doyeon Kim
Date: Mon, 15 Jun 2026 18:43:31 +0900
Subject: [PATCH 1/2] Improve incremental repair documentation clarity

Patch by dybyte; reviewed by TBD for CASSANDRA-21331
---
 .../pages/managing/operating/auto_repair.adoc  | 18 +++++++++++++++---
 .../operating/compaction/overview.adoc         |  1 +
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/doc/modules/cassandra/pages/managing/operating/auto_repair.adoc b/doc/modules/cassandra/pages/managing/operating/auto_repair.adoc
index e989c49d2a9e..9f1ea34006f1 100644
--- a/doc/modules/cassandra/pages/managing/operating/auto_repair.adoc
+++ b/doc/modules/cassandra/pages/managing/operating/auto_repair.adoc
@@ -64,10 +64,16 @@ a smaller set of data, so a shorter `min_repair_interval` such as `1h` is recomm
 One should be careful when enabling incremental repair on a cluster for the first time. While
 xref:#repair-token-range-splitter[RepairTokenRangeSplitter] includes a default configuration to attempt to gracefully
 migrate to incremental repair over time, failure to take proper precaution could overwhelm the cluster with
-xref:managing/operating/compaction/overview.adoc#types-of-compaction[anticompactions].
+xref:managing/operating/compaction/overview.adoc#anticompaction[anticompactions].
+
+This happens because incremental repair must move the repaired ranges out of the SSTables that were unrepaired when
+the repair started. After the repair finishes streaming differences, Cassandra runs
+xref:managing/operating/compaction/overview.adoc#anticompaction[anticompaction] to rewrite the participating SSTables
+and split the repaired data from the still-unrepaired data. On a cluster with large SSTables or many overlapping
+partitions, that rewrite can touch a large amount of data and create substantial extra disk and I/O load.

 No matter how one goes about enabling and running incremental repair, it is recommended to run a cycle of full repairs
-for the entire cluster as pre-flight step to running incremental repair. This will put the cluster into a more
+for the entire cluster as a pre-flight step to running incremental repair. This will put the cluster into a more
 consistent state which will reduce the amount of streaming between replicas when incremental repair initially runs.

 If you do not have strong data consistency requirements, one may consider using
@@ -80,13 +86,19 @@ If you do have strong data consistency requirements, then one must treat all dat
 incremental repair against it. Consult
 xref:#incremental-repair-defaults[RepairTokenRangeSplitter's Incremental repair defaults].

+The first incremental repair on an existing cluster still has to compare and repair the entire unrepaired data set, so
+its repair scope can look similar to a full repair. The difference is what happens afterward: a full repair leaves the
+data in the unrepaired set, while an incremental repair also marks the repaired ranges and triggers
+xref:managing/operating/compaction/overview.adoc#anticompaction[anticompaction] so that later incremental repairs can
+focus only on newly written unrepaired data.
+
 In particular one should be mindful of the xref:managing/operating/compaction/overview.adoc[compaction strategy] you
 use for your tables and how it might impact incremental repair before running incremental repair for the first
 time:

 - *Large SSTables*: When using xref:managing/operating/compaction/stcs.adoc[SizeTieredCompactionStrategy] or any
   compaction strategy which can create large SSTables including many partitions the amount of
-  xref:managing/operating/compaction/overview.adoc#types-of-compaction[anticompaction] that might be required could be
+  xref:managing/operating/compaction/overview.adoc#anticompaction[anticompaction] that might be required could be
   excessive. Using a small `bytes_per_assignment` might contribute to repeated anticompactions over the same
   unrepaired data.
 - *Partitions overlapping many SSTables*: If partitions overlap between many SSTables, the amount of SSTables included
diff --git a/doc/modules/cassandra/pages/managing/operating/compaction/overview.adoc b/doc/modules/cassandra/pages/managing/operating/compaction/overview.adoc
index 6a396106b8ee..b9ca288a5706 100644
--- a/doc/modules/cassandra/pages/managing/operating/compaction/overview.adoc
+++ b/doc/modules/cassandra/pages/managing/operating/compaction/overview.adoc
@@ -67,6 +67,7 @@ Compaction executes to remove any ranges that a node no longer owns.
 This type of compaction is typically triggered on neighbouring nodes after a node has been bootstrapped, since the bootstrapping node will take ownership of some ranges from those nodes.
 Secondary index rebuild::
 A compaction is triggered if the secondary indexes are rebuilt on a node.
+[#anticompaction]
 Anticompaction::
 After repair, the ranges that were actually repaired are split out of the SSTables that existed when repair started. This type of compaction rewrites SSTables to accomplish this task.
 Sub range compaction::
From bdec1ece40cec68b6e2251094d65c0cf178a8f6f Mon Sep 17 00:00:00 2001
From: Doyeon Kim
Date: Mon, 15 Jun 2026 19:00:53 +0900
Subject: [PATCH 2/2] Refine incremental repair documentation wording

Patch by dybyte; reviewed by TBD for CASSANDRA-21331
---
 .../pages/managing/operating/auto_repair.adoc   | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/doc/modules/cassandra/pages/managing/operating/auto_repair.adoc b/doc/modules/cassandra/pages/managing/operating/auto_repair.adoc
index 9f1ea34006f1..d0152599edf1 100644
--- a/doc/modules/cassandra/pages/managing/operating/auto_repair.adoc
+++ b/doc/modules/cassandra/pages/managing/operating/auto_repair.adoc
@@ -66,11 +66,11 @@ xref:#repair-token-range-splitter[RepairTokenRangeSplitter] includes a default c
 migrate to incremental repair over time, failure to take proper precaution could overwhelm the cluster with
 xref:managing/operating/compaction/overview.adoc#anticompaction[anticompactions].

-This happens because incremental repair must move the repaired ranges out of the SSTables that were unrepaired when
-the repair started. After the repair finishes streaming differences, Cassandra runs
-xref:managing/operating/compaction/overview.adoc#anticompaction[anticompaction] to rewrite the participating SSTables
-and split the repaired data from the still-unrepaired data. On a cluster with large SSTables or many overlapping
-partitions, that rewrite can touch a large amount of data and create substantial extra disk and I/O load.
+This happens because incremental repair must separate the ranges participating in repair from the rest of the
+unrepaired SSTables. Cassandra uses xref:managing/operating/compaction/overview.adoc#anticompaction[anticompaction] to
+rewrite the participating SSTables, separating data that belongs to the repair session from data that remains
+unrepaired. On a cluster with large SSTables or many overlapping partitions, that rewrite can touch a large amount of
+data and create substantial extra disk and I/O load.

 No matter how one goes about enabling and running incremental repair, it is recommended to run a cycle of full repairs
 for the entire cluster as a pre-flight step to running incremental repair. This will put the cluster into a more
@@ -87,10 +87,9 @@ incremental repair against it. Consult
 xref:#incremental-repair-defaults[RepairTokenRangeSplitter's Incremental repair defaults].

 The first incremental repair on an existing cluster still has to compare and repair the entire unrepaired data set, so
-its repair scope can look similar to a full repair. The difference is what happens afterward: a full repair leaves the
-data in the unrepaired set, while an incremental repair also marks the repaired ranges and triggers
-xref:managing/operating/compaction/overview.adoc#anticompaction[anticompaction] so that later incremental repairs can
-focus only on newly written unrepaired data.
+its repair scope can look similar to a full repair. The difference is what happens afterward: a full repair does not
+update the incremental repair state, while an incremental repair records the repaired ranges so that later incremental
+repairs can focus only on newly written unrepaired data.

 In particular one should be mindful of the xref:managing/operating/compaction/overview.adoc[compaction strategy] you
 use for your tables and how it might impact incremental repair before running incremental repair for the first