Apache Cassandra 2.1 incremental repair
The “incremental repair” feature has been available since Cassandra 2.1. Conceptually, the idea behind incremental repair is straightforward, but the details can get complicated. The official Datastax documentation describes the procedure for migrating to incremental repair but, in my opinion, does not give the full picture. This post aims to fill that gap by summarizing and consolidating the available information on Cassandra incremental repair. Note: this post assumes the reader has a basic understanding of Apache Cassandra, especially the “repair” concept within Cassandra.
1. Introduction
The idea of incremental repair is to mark SSTables that have already been repaired with a flag (a timestamp called repairedAt that records when the SSTable was repaired). When the next repair begins, only the previously unrepaired SSTables are scanned. The goal of incremental repair is twofold: 1) it reduces the high cost of a repair operation that would otherwise have to calculate the Merkle tree over all SSTables of a node; 2) it makes repair more network-efficient, because only rows that are marked as “inconsistent” are sent across the network.
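To make the bookkeeping concrete, here is a minimal Python sketch of the idea. It is a toy model, not Cassandra code: the SSTable class and the two helper functions are hypothetical names of mine, and the only fact borrowed from Cassandra is that a repairedAt of 0 stands for “never repaired”.

```python
from dataclasses import dataclass
from typing import List

UNREPAIRED = 0  # repairedAt == 0 is treated as "never repaired"


@dataclass
class SSTable:
    """Toy stand-in for an SSTable; only the repair flag is modeled."""
    name: str
    repaired_at: int = UNREPAIRED  # timestamp (ms) of the repair that covered it


def select_for_incremental_repair(sstables: List[SSTable]) -> List[SSTable]:
    """Only SSTables that were never repaired take part in the next run."""
    return [s for s in sstables if s.repaired_at == UNREPAIRED]


def mark_repaired(sstables: List[SSTable], now_ms: int) -> None:
    """After a successful repair, stamp the participating SSTables."""
    for s in sstables:
        s.repaired_at = now_ms


tables = [SSTable("a", repaired_at=1000), SSTable("b"), SSTable("c")]
candidates = select_for_incremental_repair(tables)   # "b" and "c"
mark_repaired(candidates, now_ms=2000)
remaining = select_for_incremental_repair(tables)    # nothing left to scan
```

The second call returns an empty list, which is the whole point: a repair that follows immediately has nothing to scan until new, unrepaired SSTables are flushed.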
2. Impact on Compaction
“Incremental repair” relies on an operation called anticompaction. Basically, anticompaction splits an SSTable into two: one containing repaired data and the other containing unrepaired data. When an SSTable is fully covered by a repaired range, no anticompaction occurs; Cassandra just rewrites the repairedAt field in the SSTable metadata.
With SSTables separated into two sets, the compaction strategy also has to be adjusted, because a repaired SSTable cannot be compacted together with an unrepaired one without losing the repaired state. The behavior depends on the strategy:
- SizeTiered compaction strategy: size-tiered compaction is executed independently on the two sets of SSTables (repaired and unrepaired) that result from anticompaction.
- Leveled compaction strategy: as originally designed, leveled compaction is executed as usual on the repaired set, while SizeTiered compaction is executed on the unrepaired set. Since version 2.1.2, Leveled compaction runs on both sets. Please see CASSANDRA-8004 for more detail.
- DateTiered compaction strategy: “incremental repair” should NOT be used.
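The anticompaction rule can be sketched in a few lines of Python. This is a simplified model under my own assumptions: an SSTable is reduced to a tuple of integer tokens, ranges are (start, end] pairs without wrap-around, and the names anticompact and compaction_pools are hypothetical, not Cassandra APIs.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Range = Tuple[int, int]  # simplified token range (start, end], no wrap-around


@dataclass(frozen=True)
class SSTable:
    name: str
    tokens: Tuple[int, ...]   # tokens of the partitions stored in this SSTable
    repaired_at: int = 0      # 0 means unrepaired


def in_ranges(token: int, ranges: List[Range]) -> bool:
    return any(start < token <= end for start, end in ranges)


def anticompact(sstable: SSTable, repaired_ranges: List[Range],
                repaired_at: int) -> List[SSTable]:
    inside = tuple(t for t in sstable.tokens if in_ranges(t, repaired_ranges))
    outside = tuple(t for t in sstable.tokens if not in_ranges(t, repaired_ranges))
    if not outside:
        # Fully covered by the repaired ranges: metadata-only change, no split.
        return [replace(sstable, repaired_at=repaired_at)]
    if not inside:
        # Nothing was repaired in this SSTable: leave it untouched.
        return [sstable]
    # Partially covered: split into a repaired and an unrepaired SSTable.
    return [
        SSTable(sstable.name + "-repaired", inside, repaired_at),
        SSTable(sstable.name + "-unrepaired", outside, 0),
    ]


def compaction_pools(sstables: List[SSTable]):
    """Repaired and unrepaired SSTables are never compacted together."""
    repaired = [s for s in sstables if s.repaired_at != 0]
    unrepaired = [s for s in sstables if s.repaired_at == 0]
    return repaired, unrepaired


table = SSTable("a", (10, 20, 30))
split = anticompact(table, [(0, 20)], repaired_at=1000)     # two SSTables
covered = anticompact(table, [(0, 100)], repaired_at=1000)  # same SSTable, new flag
```

Keeping the two pools apart is what preserves the repaired state: any SSTable produced by compacting within one pool can inherit that pool's state unambiguously.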
3. Migrating to Incremental Repair
By default, “nodetool repair” in Cassandra 2.1 does a full, sequential repair. The “-inc” option enables incremental repair. With the Leveled compaction strategy (as originally designed, before the 2.1.2 change described in section 2), incremental repair runs SizeTiered compaction on the unrepaired SSTables. When an incremental repair is executed for the first time on a table that uses Leveled compaction, SizeTiered compaction is performed on all SSTables, because until the first incremental repair completes, Cassandra does not know their repaired state. This is a very expensive operation. It is therefore recommended to migrate to incremental repair one node at a time, using the following procedure:
- Disable compaction on the node using nodetool disableautocompaction.
- Run the default full, sequential repair.
- Stop the node.
- Use the sstablerepairedset tool to mark as repaired all the SSTables that were created before you disabled compaction.
- Restart Cassandra.
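The procedure above can be written down as an ordered, dry-run command list. The Python sketch below only builds and prints the commands; it does not execute anything. Several parts are assumptions of mine rather than something the procedure prescribes: the stop and start commands depend on how Cassandra is installed, the list file path is hypothetical and is expected to contain the SSTables that existed when compaction was disabled, and the --really-set flag comes from my reading of the tool's usage message.

```python
import shlex
from typing import List


def migration_commands(sstable_list_file: str,
                       stop_cmd: str = "sudo service cassandra stop",     # assumption
                       start_cmd: str = "sudo service cassandra start"    # assumption
                       ) -> List[str]:
    """Return the per-node migration steps, in order, as shell command strings."""
    return [
        "nodetool disableautocompaction",      # 1. disable compaction
        "nodetool repair",                     # 2. default full, sequential repair
        stop_cmd,                              # 3. stop the node
        # 4. mark the pre-existing SSTables as repaired (node must be stopped)
        "sstablerepairedset --really-set --is-repaired -f "
        + shlex.quote(sstable_list_file),
        start_cmd,                             # 5. restart Cassandra
    ]


if __name__ == "__main__":
    # Hypothetical path; the file lists one SSTable data file per line.
    for step, cmd in enumerate(migration_commands("/tmp/sstables-to-mark.txt"), 1):
        print(f"{step}. {cmd}")
```

Printing the plan first and running it by hand, one node at a time, fits the spirit of the recommendation, since each step is expensive and worth verifying before moving on.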
3.1 Tools for managing SSTable repaired/unrepaired state
Cassandra offers two utilities for managing the repaired/unrepaired state of SSTables:
- sstablemetadata is used to check the repaired/unrepaired state of an SSTable. The syntax is: sstablemetadata <sstable filenames>
- sstablerepairedset is used to manually mark an SSTable as repaired or unrepaired. The syntax is: sstablerepairedset --really-set [--is-repaired | --is-unrepaired] [-f <sstable-list> | <sstables>]. Note that this tool must only be used while Cassandra is stopped.
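When checking many SSTables, it helps to read the repaired state out of the sstablemetadata output programmatically. The Python sketch below assumes the output contains a line of the form “Repaired at: <timestamp>”, with 0 meaning unrepaired; that is my understanding of the 2.1 tool, so please verify it against your version. The two sample strings are hand-written illustrations, not captured output, and the function names are hypothetical.

```python
import re
from typing import Optional


def repaired_at_from_metadata(output: str) -> Optional[int]:
    """Extract the repairedAt timestamp from sstablemetadata output, if present."""
    match = re.search(r"^Repaired at:\s*(\d+)\s*$", output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def is_repaired(output: str) -> bool:
    """An SSTable counts as repaired when repairedAt is present and non-zero."""
    timestamp = repaired_at_from_metadata(output)
    return timestamp is not None and timestamp != 0


# Hand-written, abbreviated samples (illustrative only).
unrepaired_sample = "Repaired at: 0\n"
repaired_sample = "Repaired at: 1424793960000\n"

state_a = is_repaired(unrepaired_sample)   # False
state_b = is_repaired(repaired_sample)     # True
```

In practice the output string would come from running sstablemetadata on a data file and capturing its standard output.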
4. Other Considerations with Incremental Repair
There are some other things to consider when using incremental repair.
- For Leveled compaction, once incremental repair has been used, it should continue to be run regularly. Otherwise, only SizeTiered compaction will be executed on the growing set of unrepaired SSTables (under the original design described in section 2). It is recommended to run incremental repair daily and full repairs weekly to monthly.
- Recovering from missing data or corrupted SSTables requires a non-incremental full repair.
- The -local option of “nodetool repair” should only be used with full repair, not with incremental repair.
- In C* 2.1, sequential repair and incremental repair do NOT work together.
- Because an SSTable's repaired state is tracked in its metadata, some Cassandra tools can affect that state:
  - Bulk loading makes the loaded SSTables unrepaired, even if they were repaired in a different cluster.
  - If scrubbing drops rows, the new SSTables are marked as unrepaired. Otherwise, SSTables keep their original repaired state.
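The daily-incremental, weekly-full recommendation above can be expressed as a small scheduling helper. This Python sketch only chooses the command for a given day and does not run it. The weekday chosen for the full repair is arbitrary, and pairing “-inc” with “-par” is my reading of the rule that sequential and incremental repair do not work together in 2.1, so treat both as assumptions.

```python
from datetime import date
from typing import List


def repair_command(day: date, full_repair_weekday: int = 6) -> List[str]:
    """Pick the repair command for a day (Monday is 0, Sunday is 6)."""
    if day.weekday() == full_repair_weekday:
        # Weekly full repair: the 2.1 default is full and sequential.
        return ["nodetool", "repair"]
    # Daily incremental repair; run as a parallel repair because sequential
    # and incremental do not work together in 2.1 (assumption, see above).
    return ["nodetool", "repair", "-par", "-inc"]


sunday_cmd = repair_command(date(2015, 3, 1))   # a Sunday: full repair
monday_cmd = repair_command(date(2015, 3, 2))   # a Monday: incremental repair
```

A cron job or any other scheduler could call this once per day on each node and pass the resulting argument list to a process runner.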