Cassandra backups using nodetool

Jun 11, 2018 12:00:00 AM

Cassandra nodetool provides several types of commands to manage your Cassandra cluster. See my previous posts for an orientation to Cassandra nodetool and using nodetool to get Cassandra information. My colleague has provided an in-depth analysis of backup strategies in Cassandra that you can review to learn more about ways to minimize storage cost and time-to-recovery, and to maximize performance. Below I will cover the nodetool commands used in scripting these best practices for managing Cassandra full and incremental backups.

Snapshots

The basic way to back up Cassandra is to take a snapshot. Because sstables are immutable, and because the snapshot command flushes data from memory before taking the snapshot, this provides a complete backup of that node's data. Use nodetool snapshot to take a snapshot of sstables. You can specify a particular keyspace as an optional argument to the command, as in nodetool snapshot keyspace1. This produces a snapshot for each table in the keyspace, as shown in this sample output from nodetool listsnapshots:

Snapshot Details:
Snapshot name  Keyspace name  Column family name  True size  Size on disk
1528233451291  keyspace1      standard1           1.81 MiB   1.81 MiB
1528233451291  keyspace1      counter1            0 bytes    864 bytes

The first column is the snapshot name, which you use to refer to the snapshot in other nodetool backup commands. You can also specify tables in the snapshot command. The line at the end of the listsnapshots output (for example, Total TrueDiskSpaceUsed: 5.42 MiB) shows, as the name suggests, the actual size of the snapshot files, as calculated using the walkFileTree Java method. You can verify this by adding up the files in each snapshots directory under keyspace/tablename in your data directory (e.g., du -sh /var/lib/cassandra/data/keyspace1/standard1*/snapshots).

To make snapshots more human-readable, you can tag them. Running nodetool snapshot -t 2018June05b_DC1C1_keyspace1 keyspace1 results in a more obvious snapshot name, as shown in this output from nodetool listsnapshots:

2018June05b_DC1C1_keyspace1  keyspace1  standard1  1.81 MiB  1.81 MiB
2018June05b_DC1C1_keyspace1  keyspace1  counter1   0 bytes   864 bytes
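To script the du check described above, a helper can sum the snapshot directories for one tag across every table in a keyspace. This is a minimal POSIX sh sketch: the function name snapshot_kb is hypothetical, and it assumes the default layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>. Because snapshot files are hard links, du's figure may not line up exactly with the sizes listsnapshots reports.

```shell
# Minimal POSIX sh sketch (hypothetical helper, not part of nodetool).
# Assumes the default data layout:
#   <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>

# snapshot_kb DATA_DIR KEYSPACE TAG
# Prints the combined size, in KiB, of every table's snapshot directory
# named TAG in KEYSPACE; returns 1 if no such directory exists.
snapshot_kb() {
  sk_data=$1 sk_ks=$2 sk_tag=$3
  set --                                   # reuse "$@" as the list of matches
  for sk_dir in "$sk_data/$sk_ks"/*/snapshots/"$sk_tag"; do
    if [ -d "$sk_dir" ]; then
      set -- "$@" "$sk_dir"
    fi
  done
  if [ "$#" -eq 0 ]; then
    echo "no snapshot named $sk_tag under $sk_data/$sk_ks" >&2
    return 1
  fi
  # -s: one line per directory, -k: KiB units, -c: append a grand total line
  du -skc "$@" | tail -n 1 | cut -f1
}
```

Usage: snapshot_kb /var/lib/cassandra/data keyspace1 2018June05b_DC1C1_keyspace1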

Snapshot names must be unique, though. If you try to reuse a name that already exists, you'll get an ugly error:

error: Snapshot 2018June05b_DC1C1_keyspace1 already exists.
-- StackTrace --
java.io.IOException: Snapshot 2018June05b_DC1C1_keyspace1 already exists....
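One way to avoid that error in automation is to build each tag from the keyspace name plus the current Unix time, and to check listsnapshots before snapshotting. The POSIX sh sketch below assumes nodetool is on the PATH and that date supports +%s (GNU, BSD, and BusyBox date do); snapshot_tag and take_tagged_snapshot are hypothetical helper names.

```shell
# Minimal POSIX sh sketch; helper names are hypothetical.

# snapshot_tag KEYSPACE -> prints e.g. keyspace1_1528236376
snapshot_tag() {
  printf '%s_%s\n' "$1" "$(date +%s)"      # seconds since the Unix epoch
}

# take_tagged_snapshot KEYSPACE
# Snapshots KEYSPACE under a fresh tag and prints the tag on stdout.
# Returns 1 instead of hitting "Snapshot ... already exists" when the tag
# is already listed (e.g. two runs within the same second).
take_tagged_snapshot() {
  ts_ks=$1
  ts_tag=$(snapshot_tag "$ts_ks")
  # The snapshot name is the first column of nodetool listsnapshots.
  if nodetool listsnapshots | awk '{ print $1 }' | grep -qxF "$ts_tag"; then
    echo "snapshot $ts_tag already exists; skipping" >&2
    return 1
  fi
  # Send nodetool's own message to stderr so stdout carries only the tag.
  nodetool snapshot -t "$ts_tag" "$ts_ks" >&2 && echo "$ts_tag"
}
```

Printing the tag on stdout lets a calling script capture it (tag=$(take_tagged_snapshot keyspace1)) and pass it to later clearsnapshot or restore steps.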

The default snapshot name is already a timestamp (the number of milliseconds since the Unix epoch), but it's a little hard to read. You could get the best of both worlds with something like nodetool snapshot -t keyspace1_$(date +"%s") keyspace1 (the exact date syntax depends on your operating system, and %s gives seconds rather than milliseconds). I like how the results of listsnapshots sort that way, too. In any case, once snapshots are automated, as they inevitably will be, human readability matters much less.

You may also see snapshots in this listing that you didn't take explicitly. By default, auto_snapshot is turned on in the cassandra.yaml configuration file, causing a snapshot to be taken any time a table is truncated or dropped. This is an important safety feature, and it's recommended that you leave it enabled. Here's an example of a snapshot created when a table is truncated:

cqlsh> truncate keyspace1.standard1;

root@DC1C1:/# nodetool listsnapshots
Snapshot Details:
Snapshot name                      Keyspace name  Column family name  True size  Size on disk
truncated-1528291995840-standard1  keyspace1      standard1           3.57 MiB   3.57 MiB

To save disk space (and cost), you will eventually want to delete snapshots. Use nodetool clearsnapshot with the -t flag and the snapshot name (recommended, to avoid deleting all snapshots). Adding -- followed by a keyspace name further restricts the deletion to that keyspace. For example, nodetool clearsnapshot -t 1528233451291 -- keyspace1 removes just the two table snapshots listed above, as reported in this sample output:

Requested clearing snapshot(s) for [keyspace1] with snapshot name [1528233451291]
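Because leaving out either -t or -- changes what gets deleted, a script can route every deletion through a small wrapper that always passes both, and that refuses to run without a tag and at least one keyspace. This is a hypothetical POSIX sh helper, not a nodetool feature; it only arranges the clearsnapshot arguments shown above.

```shell
# Hypothetical wrapper around nodetool clearsnapshot (POSIX sh sketch).
# clear_snapshot TAG KEYSPACE [KEYSPACE...]
clear_snapshot() {
  if [ "$#" -lt 2 ] || [ -z "$1" ]; then
    echo "usage: clear_snapshot TAG KEYSPACE [KEYSPACE...]" >&2
    return 2
  fi
  cs_tag=$1
  shift                                    # remaining arguments are keyspaces
  # Always pass -t for the name and -- before the keyspace list.
  nodetool clearsnapshot -t "$cs_tag" -- "$@"
}
```

Usage: clear_snapshot 1528233451291 keyspace1. Requiring a keyspace is stricter than nodetool itself; drop that check if you do want to clear a tag across all keyspaces.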

Note that if you forget the -t flag or the --, you will get undesired results. Without the -t flag, the command will not read the snapshot name, and without the -- delimiter, you will end up deleting all snapshots for the keyspace. Check the syntax carefully.

The sstables are not tied to any particular Cassandra instance or server, so you can move them around as needed (for example, to populate a test server). If you copy sstables into a table's directory under the data directory and run nodetool refresh, Cassandra loads them without a restart. Here's a simple demonstration:

cqlsh> truncate keyspace1.standard1;

cp /var/lib/cassandra/data/keyspace1/standard1-60a1a450690111e8823fa55ed562cd82/snapshots/keyspace1_1528236376/* /var/lib/cassandra/data/keyspace1/standard1-60a1a450690111e8823fa55ed562cd82/

cqlsh> select * from keyspace1.standard1 limit 1;

 key | C0 | C1 | C2 | C3 | C4
-----+----+----+----+----+----
(0 rows)

nodetool refresh keyspace1 standard1

cqlsh> select count(*) from keyspace1.standard1;
 count
 7425

This simple command has obvious implications for your backup and restore automation.
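For example, the copy-and-refresh steps from the demonstration can be wrapped in one function. This POSIX sh sketch assumes the default data directory layout and that exactly one <table>-<id> directory exists for the table (a dropped and re-created table can leave an older directory behind, so the helper refuses to guess); restore_snapshot is a hypothetical name.

```shell
# Hypothetical restore helper (POSIX sh sketch).
# restore_snapshot DATA_DIR KEYSPACE TABLE TAG
restore_snapshot() {
  rs_data=$1 rs_ks=$2 rs_table=$3 rs_tag=$4
  # Table directories are named <table>-<table id>; require exactly one.
  set --
  for rs_dir in "$rs_data/$rs_ks/$rs_table"-*; do
    if [ -d "$rs_dir" ]; then set -- "$@" "$rs_dir"; fi
  done
  if [ "$#" -ne 1 ]; then
    echo "expected one directory for $rs_ks.$rs_table, found $#" >&2
    return 1
  fi
  rs_snap="$1/snapshots/$rs_tag"
  if [ ! -d "$rs_snap" ]; then
    echo "no snapshot $rs_tag in $1" >&2
    return 1
  fi
  # Copy the snapshot's files next to the live sstables, then load them.
  cp "$rs_snap"/* "$1"/ || return 1
  nodetool refresh "$rs_ks" "$rs_table"
}
```

Usage: restore_snapshot /var/lib/cassandra/data keyspace1 standard1 keyspace1_1528236376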

Incrementals

Incremental backups are taken automatically, generally more often than snapshots are scheduled: whenever a memtable is flushed to disk as a new sstable, Cassandra hard-links that sstable into a backups directory. This allows more granular point-in-time recovery. There's not as much operational fun to be had with incremental backups. Use nodetool statusbackup to check whether they are running or not. By default, unless you've set incremental_backups: true in the cassandra.yaml configuration file, they are not running. Turn them on with nodetool enablebackup and turn them off with nodetool disablebackup. A nodetool listbackups command doesn't exist, but you can view the incremental backups in the data directory under keyspace/table/backups.
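Since there is no listbackups command, a short find over the data directory can stand in for one. This POSIX sh sketch assumes the default layout <data_dir>/<keyspace>/<table>-<id>/backups/; list_incremental_backups is a hypothetical name. Keep in mind that nodetool enablebackup changes only the running node; to keep incremental backups on across restarts, set incremental_backups: true in cassandra.yaml.

```shell
# Hypothetical stand-in for the missing "nodetool listbackups" (POSIX sh).
# list_incremental_backups DATA_DIR [KEYSPACE]
# Prints every file under a backups/ directory, sorted by path.
list_incremental_backups() {
  if [ -n "${2:-}" ]; then
    lb_root=$1/$2                          # limit the search to one keyspace
  else
    lb_root=$1
  fi
  find "$lb_root" -type f -path '*/backups/*' | sort
}

# Example session (requires a running node):
#   nodetool statusbackup       # reports "not running" unless enabled
#   nodetool enablebackup
#   list_incremental_backups /var/lib/cassandra/data keyspace1
```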

The backups/snapshots nomenclature is truly confusing, but you could think of snapshots as something you do, and backups as something that happens. Restoring from incrementals is similar to restoring from a snapshot (copying the files and running nodetool refresh), but incrementals require a snapshot: they only hold sstables flushed after some point in time, so you restore the snapshot first and then the incremental files created after it.

These various nodetool commands can be used in combination in scripts to automate your backup and recovery processes. Don't forget to monitor disk space and clean up the files created by your backup processes. Remember that if you'd like to try out these commands locally, you can use the ccm tool or the Cassandra Docker cluster here.
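As a starting point for that cleanup: incremental backup files that already existed when a new snapshot was taken are covered by that snapshot, so a script can take the snapshot and then delete them. This POSIX sh sketch records a marker file just before snapshotting and removes only backup files that are not newer than the marker. snapshot_and_prune is a hypothetical name, it assumes the default data layout, and because file timestamps are coarse on some filesystems, treat it as a starting point rather than a drop-in tool.

```shell
# Hypothetical helper (POSIX sh sketch): snapshot a keyspace, then delete the
# incremental backup files that the new snapshot already covers.
# snapshot_and_prune KEYSPACE DATA_DIR  -> prints the new snapshot tag
snapshot_and_prune() {
  sp_ks=$1 sp_data=$2
  sp_marker=$(mktemp) || return 1          # mtime recorded before snapshotting
  sp_tag="${sp_ks}_$(date +%s)"
  if ! nodetool snapshot -t "$sp_tag" "$sp_ks" >&2; then
    rm -f "$sp_marker"
    return 1
  fi
  # Backup files no newer than the marker predate the snapshot, so their
  # data is in it; files flushed after the marker are kept.
  find "$sp_data/$sp_ks" -type f -path '*/backups/*' ! -newer "$sp_marker" \
    -exec rm -f {} +
  rm -f "$sp_marker"
  echo "$sp_tag"
}
```

Usage: snapshot_and_prune keyspace1 /var/lib/cassandra/data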