Examining the Lifecycle of Tombstones in Apache Cassandra
This post is the first in a series about the lifecycle and management of tombstones in Apache Cassandra. Deleting and expiring data in Cassandra is something you should plan carefully, especially if you are about to delete a massive amount of data at once. Without proper planning, a large delete can cause problems for the cluster, such as increased read latency and a larger disk usage footprint. Throughout this post, I will describe a way of tackling this and its caveats.

First, let's go over the basics. In Cassandra, the data files (SSTables) on disk are immutable. When you delete something in Cassandra, a new SSTable is created that contains a marker. This marker indicates which partition, row, or cell was removed, along with the timestamp of the deletion. This deletion marker is called a tombstone.

The deleted data and the tombstone can coexist on disk during a period called gc_grace_seconds, which defaults to 10 days (864,000 seconds). During this time, although the data may still exist on disk, it is not returned to the client if queried. This means that, from a client's point of view, the data is deleted as soon as you execute the delete statement. From an operational point of view, however, both the data and the tombstone can still exist and occupy space on disk. This deleted data that still exists on disk is called “shadowed data.”

N.B.: In the previous paragraph I say that the data and the tombstone "can coexist on disk" (emphasis on the "can") because if a compaction occurs that involves both the data and the tombstone, the data is evicted. The tombstone, however, will remain if gc_grace_seconds has not yet passed.

gc_grace_seconds is a safety mechanism that gives the tombstone enough time to replicate to all nodes holding a replica of the shadowed data. For this safety mechanism to be effective, you must be able to repair the cluster within every gc_grace_seconds window.
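The timing rule above can be sketched as a toy model in Python. This is only an illustration of the gc_grace_seconds logic, not Cassandra's actual implementation; the function and variable names are hypothetical:

```python
from datetime import datetime, timedelta

GC_GRACE_SECONDS = 10 * 24 * 3600  # Cassandra's default: 10 days (864,000 s)

def is_tombstone_purgeable(deletion_time: datetime,
                           now: datetime,
                           gc_grace_seconds: int = GC_GRACE_SECONDS) -> bool:
    """A tombstone may only be dropped once gc_grace_seconds have
    elapsed since the deletion, giving it time to replicate to all
    replicas of the shadowed data."""
    return now - deletion_time >= timedelta(seconds=gc_grace_seconds)

deleted_at = datetime(2020, 1, 1)
# Still within the grace period: the tombstone must be kept.
print(is_tombstone_purgeable(deleted_at, datetime(2020, 1, 5)))   # False
# Grace period over: the tombstone becomes *eligible* for eviction.
print(is_tombstone_purgeable(deleted_at, datetime(2020, 1, 12)))  # True
```

Note that "eligible" is the key word: as discussed below, eligibility alone does not free any disk space.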
This means that if you use the default 10 days for gc_grace_seconds, a repair must be started and finished every 10 days. The repair serves the purpose of preventing zombie data in the cluster. I won't go into the details of how this can happen, but in short, zombie data is data that you deleted but that reappears some time later.

After gc_grace_seconds has passed, the data and the tombstone can finally be evicted from disk, recovering the disk space they occupied (Figure 1). How long it takes to release this disk space is one of the first points I want to clarify in this post.

Many people find this mechanism surprising when they start using Cassandra, i.e., that after deleting data, the data still exists on disk. This confusion is usually cleared up as soon as people learn about tombstones and gc_grace_seconds. However, operators then tend to assume that the data and the tombstones are evicted right after gc_grace_seconds has elapsed. It's essential to be aware that this might not be (and often is not) true: it usually takes longer than gc_grace_seconds for the eviction to happen and the disk space to be recovered.
![](https://www.pythian.com/hs-fs/hubfs/Imported_Blog_Media/Shadowed-data-and-tombstone-eviction.jpg?width=852&height=189&name=Shadowed-data-and-tombstone-eviction.jpg)
*Figure 1: Shadowed data and tombstone eviction.*
![](https://www.pythian.com/hs-fs/hubfs/Imported_Blog_Media/Shadowed-data-and-tombstone-separate-compactions-1.jpg?width=851&height=428&name=Shadowed-data-and-tombstone-separate-compactions-1.jpg)
*Figure 2: Shadowed data and tombstone in separate compactions.*
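To see why disk space is not reclaimed the moment gc_grace_seconds expires, consider a toy simulation of compaction-driven eviction in Python. This is a deliberately simplified model (the data structures and names are hypothetical, not Cassandra internals): the shadowed data and its tombstone are only removed when a compaction actually merges the SSTables containing them, and only if that compaction runs after the grace period has elapsed.

```python
# Toy model: each SSTable is a dict mapping a key to either a value
# or a ("tombstone", deletion_ts) marker. All names are hypothetical.
GC_GRACE = 10  # grace period, in the model's time units ("days")

def compact(sstables, now, gc_grace=GC_GRACE):
    """Merge SSTables into one, keeping the newest entry per key and
    purging tombstones older than gc_grace (along with the data they
    shadow, which is superseded by the tombstone during the merge)."""
    merged = {}
    for table in sstables:     # assume tables are ordered oldest -> newest
        merged.update(table)   # newer entries win over older ones
    survivors = {}
    for key, entry in merged.items():
        if isinstance(entry, tuple) and entry[0] == "tombstone":
            if now - entry[1] >= gc_grace:
                continue       # grace period over: drop the tombstone
        survivors[key] = entry
    return survivors

old = {"k1": "value-1"}            # SSTable holding the original data
new = {"k1": ("tombstone", 0)}     # SSTable holding its tombstone (deleted at t=0)

# A compaction at t=5 (inside the grace period) still keeps the tombstone:
print(compact([old, new], now=5))   # {'k1': ('tombstone', 0)}
# If no compaction runs at t=10, the space remains occupied. Only when a
# compaction involving *both* SSTables runs, here at t=12, is everything purged:
print(compact([old, new], now=12))  # {}
```

The second call illustrates the point of this post: the eviction time is bounded not by gc_grace_seconds but by when a suitable compaction actually runs, which is why space recovery usually lags behind the grace period.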