In this blog, I’m going to give a detailed guide on how to monitor a Cassandra cluster with Prometheus and Grafana. For this, I’m using a new VM, which I’m going to call “Monitor VM”. This post covers installing the tools: Prometheus and Grafana on the Monitor VM, and the JMX exporter on each Cassandra node. In a second post, I’m going to go through how to use and configure Grafana dashboards to get the most out of your monitoring!
On the Monitor VM:
Step 1. Download Prometheus
[code]
$ wget https://github.com/prometheus/prometheus/releases/download/v2.3.1/prometheus-2.3.1.linux-amd64.tar.gz
$ tar xvfz prometheus-2.3.1.linux-amd64.tar.gz
$ cd prometheus-2.3.1.linux-amd64
[/code]
Step 2. Configure Prometheus
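[code]
$ sudo mkdir -p /etc/prometheus
$ sudo vim /etc/prometheus/prometheus.yaml
[/code]
Add a scrape job for the Cassandra nodes. Port 7070 is where the JMX exporter, configured on each Cassandra node below, will listen: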
[code]
global:
  scrape_interval: 15s

scrape_configs:
  # Cassandra config
  - job_name: 'cassandra'
    scrape_interval: 15s
    static_configs:
      - targets: ['cassandra01:7070', 'cassandra02:7070', 'cassandra03:7070']
[/code]
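With more nodes, retyping the targets list by hand gets tedious and error-prone. The sketch below writes the same config from a list of hostnames. It assumes every node exposes the JMX exporter on the same port, and write_prometheus_config is a hypothetical helper, not something Prometheus provides:

```shell
#!/bin/sh
# Hypothetical helper: write a Prometheus config that scrapes the JMX
# exporter on every Cassandra node passed on the command line.
# Usage: write_prometheus_config OUTFILE PORT HOST [HOST...]
write_prometheus_config() {
  local outfile="$1" port="$2"
  shift 2
  local targets="" host
  for host in "$@"; do
    # Build a YAML flow list: 'host1:port', 'host2:port', ...
    targets="${targets}${targets:+, }'${host}:${port}'"
  done
  cat > "$outfile" <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  # Cassandra config
  - job_name: 'cassandra'
    scrape_interval: 15s
    static_configs:
      - targets: [${targets}]
EOF
}
```

For example, as root: write_prometheus_config /etc/prometheus/prometheus.yaml 7070 cassandra01 cassandra02 cassandra03. You can then validate the file with ./promtool check config /etc/prometheus/prometheus.yaml, since promtool ships in the same tarball as the prometheus binary.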
Step 3. Create storage and start Prometheus
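Run these from the extracted prometheus-2.3.1.linux-amd64 directory. The data directory is only used if Prometheus is told about it with --storage.tsdb.path:
[code]
$ sudo useradd --system --no-create-home prometheus
$ sudo mkdir /data
$ sudo chown prometheus:prometheus /data
$ sudo -u prometheus ./prometheus --config.file=/etc/prometheus/prometheus.yaml --storage.tsdb.path=/data
[/code]
Prometheus serves its web UI and API on port 9090 by default.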
Step 4. Install Grafana
[code]
$ wget https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana_5.1.4_amd64.deb
$ sudo apt-get install -y adduser libfontconfig
$ sudo dpkg -i grafana_5.1.4_amd64.deb
[/code]
Step 5. Start Grafana
[code]
$ sudo service grafana-server start
[/code]
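Once Grafana is running, it still has to be pointed at Prometheus. Grafana 5 can load data sources from YAML files in /etc/grafana/provisioning/datasources/, so one option is to write that file instead of clicking through the UI. A minimal sketch, assuming Prometheus runs on the Monitor VM on its default port 9090; the helper name write_grafana_datasource and the file name prometheus.yaml are illustrative choices:

```shell
#!/bin/sh
# Hypothetical helper: write a Grafana data source provisioning file
# that points Grafana at a Prometheus server.
# Usage: write_grafana_datasource OUTFILE PROMETHEUS_URL
write_grafana_datasource() {
  local outfile="$1" url="$2"
  mkdir -p "$(dirname "$outfile")"
  cat > "$outfile" <<EOF
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}
```

For example, as root: write_grafana_datasource /etc/grafana/provisioning/datasources/prometheus.yaml http://localhost:9090, then restart Grafana with sudo service grafana-server restart so it reads the new file.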
On each Cassandra node:
Step 1. Download JMX-Exporter
[code]
$ sudo mkdir -p /opt/jmx_prometheus
$ sudo wget -P /opt/jmx_prometheus https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.3.0/jmx_prometheus_javaagent-0.3.0.jar
[/code]
Step 2. Configure JMX-Exporter
[code]
$ sudo vim /opt/jmx_prometheus/cassandra.yml
[/code]
[code]
lowercaseOutputName: true
lowercaseOutputLabelNames: true
whitelistObjectNames: [
  "org.apache.cassandra.metrics:type=ColumnFamily,name=RangeLatency,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=LiveSSTableCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SSTablesPerReadHistogram,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SpeculativeRetries,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOnHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableSwitchCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableLiveDataSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableColumnsCount,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOffHeapSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalsePositives,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalseRatio,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterOffHeapMemoryUsed,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize,*",
  "org.apache.cassandra.metrics:type=ColumnFamily,name=TotalDiskSpaceUsed,*",
  "org.apache.cassandra.metrics:type=CQL,name=RegularStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=CQL,name=PreparedStatementsExecuted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted,*",
  "org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Latency,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Unavailables,*",
  "org.apache.cassandra.metrics:type=ClientRequest,name=Timeouts,*",
  "org.apache.cassandra.metrics:type=Storage,name=Exceptions,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHints,*",
  "org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress,*",
  "org.apache.cassandra.metrics:type=Storage,name=Load,*",
  "org.apache.cassandra.metrics:type=Connection,name=TotalTimeouts,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CompletedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=PendingTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=ActiveTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=TotalBlockedTasks,*",
  "org.apache.cassandra.metrics:type=ThreadPools,name=CurrentlyBlockedTasks,*",
  "org.apache.cassandra.metrics:type=DroppedMessage,name=Dropped,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries,*",
  "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients,*",
  "org.apache.cassandra.metrics:type=Client,name=connectedThriftClients,*",
  "org.apache.cassandra.metrics:type=Table,name=WriteLatency,*",
  "org.apache.cassandra.metrics:type=Table,name=ReadLatency,*",
  "org.apache.cassandra.net:type=FailureDetector,*",
  ]
rules:
  - pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
    name: cassandra_$1_$3
    labels:
      address: "$2"
  - pattern: org.apache.cassandra.metrics<type=(ColumnFamily), name=(RangeLatency)><>(Mean)
    name: cassandra_$1_$2_$3
  - pattern: org.apache.cassandra.net<type=(FailureDetector)><>(DownEndpointCount)
    name: cassandra_$1_$2
  - pattern: org.apache.cassandra.metrics<type=(Keyspace), keyspace=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$3_$4
    labels:
      "$1": "$2"
  - pattern: org.apache.cassandra.metrics<type=(Table), keyspace=(\S*), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$4_$5
    labels:
      "keyspace": "$2"
      "table": "$3"
  - pattern: org.apache.cassandra.metrics<type=(ClientRequest), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
    name: cassandra_$1_$3_$4
    labels:
      "type": "$2"
  - pattern: org.apache.cassandra.metrics<type=(\S*)(?:, ((?!scope)\S*)=(\S*))?(?:, scope=(\S*))?, name=(\S*)><>(Count|Value)
    name: cassandra_$1_$5
    labels:
      "$1": "$4"
      "$2": "$3"
[/code]
Step 3. Configure Cassandra
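Load the exporter as a Java agent on port 7070 (the port Prometheus scrapes), pointing it at the jar and the config file created above:
[code]
$ echo 'JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus/jmx_prometheus_javaagent-0.3.0.jar=7070:/opt/jmx_prometheus/cassandra.yml"' >> conf/cassandra-env.sh
[/code]
Run this from your Cassandra installation directory; with a package install, the file is usually /etc/cassandra/cassandra-env.sh instead.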
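Appending with echo is not idempotent: running this step twice leaves two -javaagent options in cassandra-env.sh. A small guard avoids that. This is a sketch; add_jmx_agent is a hypothetical helper, and the paths in the usage example are the ones used in this post:

```shell
#!/bin/sh
# Hypothetical helper: append the JMX exporter agent to cassandra-env.sh
# only if that exact line is not already there, so re-running is safe.
# Usage: add_jmx_agent ENV_FILE JAR PORT CONFIG
add_jmx_agent() {
  local env_file="$1" jar="$2" port="$3" config="$4"
  # \$JVM_OPTS stays literal so cassandra-env.sh expands it at startup.
  local line="JVM_OPTS=\"\$JVM_OPTS -javaagent:${jar}=${port}:${config}\""
  # -F: fixed string, -x: the whole line must match
  if ! grep -qxF -e "$line" "$env_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$env_file"
  fi
}
```

For example: add_jmx_agent conf/cassandra-env.sh /opt/jmx_prometheus/jmx_prometheus_javaagent-0.3.0.jar 7070 /opt/jmx_prometheus/cassandra.yml. Point the first argument at wherever your installation keeps cassandra-env.sh.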
Step 4. Restart Cassandra
[code]
$ nodetool flush
$ nodetool drain
$ sudo service cassandra restart
[/code]
Restart one node at a time, so the rest of the cluster stays up while each node comes back.
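Before looking at Prometheus, you can check each exporter directly: the agent serves the Prometheus text format over HTTP on port 7070, under /metrics. A sketch with two hypothetical helpers; check_exporters assumes curl is installed on the Monitor VM and that the node hostnames resolve from it:

```shell
#!/bin/sh
# Count the Cassandra series in a Prometheus text-format scrape read from
# stdin. Comment lines (# HELP / # TYPE) never start with "cassandra_".
count_cassandra_metrics() {
  # grep -c still prints 0 on no match; "|| true" hides its non-zero exit.
  grep -c '^cassandra_' || true
}

# Hypothetical helper: print how many cassandra_* series each node returns.
# Usage: check_exporters PORT HOST [HOST...]
check_exporters() {
  local port="$1" host
  shift
  for host in "$@"; do
    printf '%s: ' "$host"
    curl -s "http://${host}:${port}/metrics" | count_cassandra_metrics
  done
}
```

For example, check_exporters 7070 cassandra01 cassandra02 cassandra03 prints one count per node. A count of 0 means no cassandra_ series came back: the agent is not loaded, the port is wrong, or the whitelist matched nothing.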
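And now, if you have no errors (and you shouldn’t!), Prometheus is ingesting your Cassandra metrics! You can confirm it in the Prometheus web UI under Status > Targets, where the cassandra targets should show as UP.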
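To check the same thing from a script, you can ask Prometheus through its HTTP query API (/api/v1/query). A minimal sketch, assuming Prometheus listens on its default port 9090 on the Monitor VM; the function names are hypothetical, and the sed parsing only handles this single-value response shape (use jq for anything more complex):

```shell
#!/bin/sh
# Extract the sample value from a /api/v1/query response holding a
# single-element vector, e.g. the result of count(...).
# Prints 0 when the result vector is empty (no matching series).
query_scalar_value() {
  local value
  value=$(sed -n 's/.*"value":\[[^,]*,"\([^"]*\)"].*/\1/p')
  printf '%s\n' "${value:-0}"
}

# Hypothetical helper: how many Cassandra targets Prometheus sees as up.
# Usage: cassandra_targets_up [PROMETHEUS_URL]
cassandra_targets_up() {
  local prom_url="${1:-http://localhost:9090}"
  curl -sG "${prom_url}/api/v1/query" \
    --data-urlencode 'query=count(up{job="cassandra"} == 1)' |
    query_scalar_value
}
```

For the three-node cluster above, cassandra_targets_up should print 3 once every node is being scraped successfully.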
Stay tuned for the next blog post, where I will guide you through a good Grafana configuration!