Step-by-step monitoring Cassandra with Prometheus and Grafana
In this blog post, I’m going to give a detailed guide on how to monitor a Cassandra cluster with Prometheus and Grafana. For this, I’m using a new VM, which I’m going to call “Monitor VM”. This post covers how to install the tools. In a second one, I’m going to go through the details of how to use and configure Grafana dashboards to get the most out of your monitoring!
High level plan
Monitor VM
- Install Prometheus
- Configure Prometheus
- Install Grafana
Cassandra VMs
- Download Prometheus JMX-Exporter
- Configure JMX-Exporter
- Configure Cassandra
- Restart Cassandra
Detailed Plan
Monitor VM
Step 1. Install Prometheus
[code]
$ wget https://github.com/prometheus/prometheus/releases/download/v2.3.1/prometheus-2.3.1.linux-amd64.tar.gz
$ tar xvfz prometheus-*.tar.gz
$ cd prometheus-*
[/code]
Step 2. Configure Prometheus
[code]
$ vim /etc/prometheus/prometheus.yaml
[/code]
[code]
global:
  scrape_interval: 15s

scrape_configs:
  # Cassandra config
  - job_name: 'cassandra'
    scrape_interval: 15s
    static_configs:
      - targets: ['cassandra01:7070', 'cassandra02:7070', 'cassandra03:7070']
[/code]
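Before starting Prometheus, it can help to validate the file. The sketch below wraps `promtool check config`; `promtool` ships in the same tarball as the `prometheus` binary. The function name `check_prometheus_config` is my own, and the sketch assumes `promtool` is on your PATH.

```shell
# Hypothetical helper: validate a Prometheus config file with promtool.
# Assumes promtool (from the Prometheus tarball) is on PATH.
check_prometheus_config() {
    config_file="$1"
    if promtool check config "$config_file"; then
        echo "config OK: $config_file"
    else
        echo "config invalid: $config_file" >&2
        return 1
    fi
}
```

For the file from this step, the call would be `check_prometheus_config /etc/prometheus/prometheus.yaml`.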
Step 3. Create storage and start Prometheus
[code]
$ mkdir /data
$ chown prometheus:prometheus /data
$ prometheus --config.file=/etc/prometheus/prometheus.yaml --storage.tsdb.path=/data
[/code]
Step 4. Install Grafana
[code]
$ wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_5.1.4_amd64.deb
$ sudo apt-get install -y adduser libfontconfig
$ sudo dpkg -i grafana_5.1.4_amd64.deb
[/code]
Step 5. Start Grafana
[code]
$ sudo service grafana-server start
[/code]
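To confirm that both services on the Monitor VM came up, you can poll their health endpoints. This is a minimal sketch: `wait_for_http` is a hypothetical helper, and it assumes the default ports (9090 for Prometheus, 3000 for Grafana) and that `curl` is installed.

```shell
# Hypothetical helper: poll a URL until it answers successfully,
# or give up after a number of attempts.
#   $1 = URL, $2 = max attempts (default 30), $3 = delay in seconds (default 2)
wait_for_http() {
    url="$1"
    attempts="${2:-30}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failures, -s: no progress output
        if curl -fs -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "not reachable after $attempts attempts: $url" >&2
    return 1
}
```

For example, `wait_for_http http://localhost:9090/-/healthy` checks Prometheus and `wait_for_http http://localhost:3000/api/health` checks Grafana.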
Cassandra nodes
Step 1. Download JMX-Exporter:
[code]
$ mkdir /opt/jmx_prometheus
$ cd /opt/jmx_prometheus
$ wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.0/jmx_prometheus_javaagent-0.3.0.jar
[/code]
Step 2. Configure JMX-Exporter
[code]
$ vim /opt/jmx_prometheus/cassandra.yml
[/code]
[code]
lowercaseOutputName: true
lowercaseOutputLabelNames: true
whitelistObjectNames: [
  "org.apache.cassandra.metrics:type=ColumnFamily,name=RangeLatency,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=LiveSSTableCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SSTablesPerReadHistogram,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SpeculativeRetries,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOnHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableSwitchCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableLiveDataSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableColumnsCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOffHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalsePositives,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalseRatio,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterOffHeapMemoryUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=TotalDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=CQL,name=RegularStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=CQL,name=PreparedStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Latency,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Unavailables,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Timeouts,*",
  "org.apache.cassandra.metrics:type=Storage,name=Exceptions,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHints,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress,*",
  "org.apache.cassandra.metrics:type=Storage,name=Load,*",
  "org.apache.cassandra.metrics:type=Connection,name=TotalTimeouts,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=ActiveTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=TotalBlockedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CurrentlyBlockedTasks,*",
  "org.apache.cassandra.metrics:type=DroppedMessage,name=Dropped,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedThriftClients,*",
  "org.apache.cassandra.metrics:type=Table,name=WriteLatency,*",
  "org.apache.cassandra.metrics:type=Table,name=ReadLatency,*",
  "org.apache.cassandra.net:type=FailureDetector,*",
]
rules:
  - pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
    name: cassandra_$1_$3
    labels:
      address: "$2"
  - pattern: org.apache.cassandra.metrics<type=(ColumnFamily), name=(RangeLatency)><>(Mean)
    name: cassandra_$1_$2_$3
  - pattern: org.apache.cassandra.net<type=(FailureDetector)><>(DownEndpointCount)
    name: cassandra_$1_$2
  - pattern: org.apache.cassandra.metrics<type=(Keyspace), keyspace=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$3_$4
    labels:
      "$1": "$2"
  - pattern: org.apache.cassandra.metrics<type=(Table), keyspace=(\S*), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$4_$5
    labels:
      "keyspace": "$2"
      "table": "$3"
  - pattern: org.apache.cassandra.metrics<type=(ClientRequest), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$3_$4
    labels:
      "type": "$2"
  - pattern: org.apache.cassandra.metrics<type=(\S*)(?:, ((?!scope)\S*)=(\S*))?(?:, scope=(\S*))?, name=(\S*)><>(Count|Value)
    name: cassandra_$1_$5
    labels:
      "$1": "$4"
      "$2": "$3"
[/code]
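To make the rules less abstract, here is a rough emulation of the `ClientRequest` rule using `sed`. It only illustrates the rename and is not how the exporter is implemented: the function name is hypothetical, `[^ ,>]*` stands in for `\S*`, and the sample MBean string in the usage note is an assumed example of what the exporter matches against.

```shell
# Hypothetical illustration of the ClientRequest rule: turn an MBean
# attribute string into the metric name and label the rule would produce.
emulate_clientrequest_rule() {
    # Capture groups mirror the rule: $1=type, $2=scope, $3=name, $4=attribute.
    line=$(printf '%s\n' "$1" | sed -n -E \
        's/^org\.apache\.cassandra\.metrics<type=(ClientRequest), scope=([^ ,>]*), name=([^ ,>]*)><>(Count|Mean|95thPercentile).*$/\1_\3_\4 \2/p')
    # No output from sed means the pattern did not match.
    [ -n "$line" ] || return 1
    # lowercaseOutputName: true lowercases the metric name, not the label value.
    name=$(printf '%s' "${line% *}" | tr '[:upper:]' '[:lower:]')
    scope="${line##* }"
    printf 'cassandra_%s{type="%s"}\n' "$name" "$scope"
}
```

Calling it with `'org.apache.cassandra.metrics<type=ClientRequest, scope=Read, name=Latency><>Count'` prints `cassandra_clientrequest_latency_count{type="Read"}`, which is roughly the series name you would then query in Prometheus.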
Step 3. Configure Cassandra
[code]
$ echo 'JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus/jmx_prometheus_javaagent-0.3.0.jar=7070:/opt/jmx_prometheus/cassandra.yml"' >> conf/cassandra-env.sh
[/code]
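Because `>>` appends every time it runs, repeating this step would add the agent twice. A minimal sketch of an idempotent variant follows; `add_jmx_exporter_agent` is a hypothetical helper, and all paths and the port are passed in as arguments rather than assumed.

```shell
# Hypothetical helper: add the javaagent line to cassandra-env.sh only once.
#   $1 = path to cassandra-env.sh, $2 = agent jar, $3 = port, $4 = exporter config
add_jmx_exporter_agent() {
    env_file="$1"
    agent_jar="$2"
    port="$3"
    exporter_config="$4"
    if [ ! -f "$env_file" ]; then
        echo "no such file: $env_file" >&2
        return 1
    fi
    # -F: fixed-string match, -q: quiet; "--" stops the pattern being read as an option
    if grep -qF -- "-javaagent:${agent_jar}=" "$env_file"; then
        echo "javaagent already configured in $env_file"
        return 0
    fi
    # $JVM_OPTS stays literal here so Cassandra expands it at startup.
    printf 'JVM_OPTS="$JVM_OPTS -javaagent:%s=%s:%s"\n' \
        "$agent_jar" "$port" "$exporter_config" >> "$env_file"
    echo "javaagent added to $env_file"
}
```

A call would pass `conf/cassandra-env.sh`, the jar downloaded in Step 1, port `7070`, and the config file written in Step 2.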
Step 4. Restart Cassandra
[code]
$ nodetool flush
$ nodetool drain
$ sudo service cassandra restart
[/code]
And now, if you have no errors (and you shouldn’t!) your Prometheus is ingesting your Cassandra metrics!
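If you want to double-check, each node's exporter serves its metrics over HTTP on the port given to the javaagent. The sketch below counts the `cassandra_` series a node exposes; `count_cassandra_metrics` is a hypothetical helper, and it assumes `curl` is installed and the exporter listens on port 7070 as configured above.

```shell
# Hypothetical helper: count the cassandra_* series exposed by one node's exporter.
#   $1 = host, $2 = exporter port (default 7070)
count_cassandra_metrics() {
    host="$1"
    port="${2:-7070}"
    # Comment lines (# HELP / # TYPE) are skipped because they do not start with cassandra_.
    # grep -c prints the number of matching lines (0 if none).
    curl -fsS "http://${host}:${port}/metrics" | grep -c '^cassandra_'
}
```

Running `count_cassandra_metrics cassandra01` should print a number greater than zero once Cassandra has restarted. On the Prometheus side, the targets page (`/targets` on port 9090 of the Monitor VM, assuming the default port) shows whether each node is being scraped.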
Wait for the next blog post where I will guide you through a good Grafana configuration!