Incremental Repair: Problems and a Solution
Problems with incremental repair:
1. Overstreaming, especially when using LCS
While anticompaction is running during an incremental repair, an SSTable involved in the repair can still be compacted. It may be compacted away on one node but not on the others, which makes the repaired/unrepaired marking for that SSTable inconsistent across nodes: the other nodes mark it as repaired, but that node does not. As a result, the next incremental run can generate a large amount of overstreaming. This bug, reported in CASSANDRA-9143, badly affects tables using the LCS strategy. Leveled compaction creates SSTables of a fixed, relatively small size (5MB by default in early Cassandra releases, 160MB in later ones), which are grouped into "levels." During repair, LCS tables can create tens of thousands of small SSTables in L0, which can affect the entire cluster and may even bring the node down.
2. Significant increase in disk usage because of anticompaction
Anticompaction will rewrite all SSTables on a disk to separate repaired and unrepaired data. Incremental repair can take a long time at the beginning and create a lot of SSTables due to anticompaction, which can lead to high disk utilization.
3. Tombstones
SSTables affected by incremental repair are marked as repaired. In subsequent compactions, these SSTables are compacted separately from SSTables that have not been repaired. If tombstones are in unrepaired SSTables and the shadowed data is in repaired SSTables (or vice versa), the data cannot be dropped, because Cassandra will not compact repaired and unrepaired SSTables together. Tombstones that linger this way can lead to additional problems such as degraded read performance.
Finding the presence of incremental repairs
If incremental repairs are or were ever turned on, the data could be in SSTables with different statuses (repaired, unrepaired).
- In versions 2.2+, incremental repair is on by default. You'll need to use the --full flag with repairs to avoid it. However, in the latest version, full repair also performs anticompaction, so the problem remains even when incremental repairs are off.
- To check whether an existing SSTable has been incrementally repaired, use the sstablemetadata tool and view the "Repaired at:" line. 0 means the SSTable has never been incrementally repaired; any other value means it has been incrementally repaired.
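The check described above can be scripted across many SSTables. The following is a minimal sketch, assuming sstablemetadata is on the PATH and prints a "Repaired at: <value>" line; the function names repaired_status and report_dir are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes sstablemetadata is on PATH and prints a
# "Repaired at: <value>" line for each SSTable.

# Read sstablemetadata output on stdin; print repaired, unrepaired, or unknown.
repaired_status() {
  awk -F': *' '
    /Repaired at:/ {
      # 0 means never incrementally repaired; any other value means repaired.
      print (($2 + 0 == 0) ? "unrepaired" : "repaired")
      found = 1
      exit
    }
    END { if (!found) print "unknown" }
  '
}

# Hypothetical helper: print the status of every Data.db file under a directory.
report_dir() {
  find "$1" -name '*Data.db' | while IFS= read -r f; do
    printf '%s\t%s\n' "$(sstablemetadata "$f" | repaired_status)" "$f"
  done
}
```

To use it, source the file and call report_dir with the keyspace data directory, for example report_dir <data directory path>/<keyspace>. Any line reporting "repaired" indicates an SSTable that incremental repair has touched.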
Procedure to revert incremental repairs
You should execute these steps on all nodes, but one node at a time. You must stop Cassandra on that node.
1. Take a snapshot first for the keyspace / table for which you are reverting incremental repairs.
nodetool -u <user> -pw <password> snapshot <Keyspace>
2. Flush and drain the data before stopping the node so there is no in-memory data left.
nodetool -u <user> -pw <password> flush
nodetool -u <user> -pw <password> drain
3. Stop Cassandra on the node.
nodetool -u <user> -pw <password> stopdaemon
4. Use sstablerepairedset to mark all SSTables as unrepaired and start Cassandra.
find <data directory path>/<keyspace> -iname "*Data.db" > find_data_paths.txt
sudo <cassandra installation directory>/tools/bin/sstablerepairedset --really-set --is-unrepaired -f find_data_paths.txt
sudo runuser -l cassandra -c <cassandra installation directory>/bin/cassandra
5. Check for any error / warning related to the procedure.
grep -E 'ERROR|WARN' <debug file path>/debug.log
6. Run tablestats and check for percent repaired. The value should be 0.
nodetool -u <user> -pw <password> tablestats <Keyspace> | grep Percent
It's important to remember that Cassandra will never compact repaired and unrepaired SSTables together. If you stop performing incremental repairs once you have started them, data in the repaired SSTables can become outdated on disk, because it is never compacted with newer, unrepaired data. Subrange repair is the only option that avoids anticompaction and the other problems created by incremental repair, since running a repair with --full also triggers anticompaction. You can run subrange repair with the help of Reaper or a subrange repair script. We hope that future releases will make incremental repair better and provide more advantages.
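The check in step 6 of the procedure can also be automated rather than read by eye. The following is a rough sketch, assuming tablestats prints a "Table: <name>" line followed by a "Percent repaired: <value>" line for each table; nonzero_repaired is a hypothetical helper name, not part of Cassandra.

```shell
#!/usr/bin/env bash
# Sketch only: assumes tablestats output contains "Table: <name>" and
# "Percent repaired: <value>" lines for each table.

# Read tablestats output on stdin and print every table whose
# "Percent repaired" value is not 0. Exit status is 1 if any was found.
nonzero_repaired() {
  awk '
    # Remember the most recent table name (last field of the Table line).
    /^[ \t]*Table/ { table = $NF }
    /Percent repaired:/ {
      if ($NF + 0 != 0) { print table ": " $NF; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

For example: nodetool -u <user> -pw <password> tablestats <Keyspace> | nonzero_repaired. No output and an exit status of 0 suggest that every table in the keyspace reports 0 percent repaired.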