Cassandra Monitoring Technical Reference

General information

Overview

Cassandra monitoring is a Gateway configuration file that enables monitoring of Cassandra through a set of samplers with customised JMX plug-in settings.

Apache Cassandra is a free and open-source distributed NoSQL database management system that provides scalability and high availability.

Some of Cassandra's key attributes are:

  • Fault tolerant - Data is automatically replicated to multiple nodes for fault-tolerance.
  • Decentralized - There are no single points of failure.
  • Elastic - Read and write throughput increase linearly as new machines are added, with no downtime or interruption to applications.

It is important to monitor Cassandra performance to identify database slowdowns, interruptions, or pressing resource limitations, and to take quick and appropriate action to correct them.

This technical reference provides information on the metrics and dataviews for the samplers available through the Cassandra integration. If you are setting up the Cassandra integration for the first time, see the Cassandra Monitoring User Guide.

Metrics and dataviews

Cassandra disk usage

This dataview displays disk usage metrics. Monitoring these node-level metrics is critical for determining whether additional nodes are needed:

Row                          Description
Compaction CompletedTasks    Number of compactions completed since the server (re)start.
Compaction PendingTasks      Estimated number of compactions remaining to perform.
Storage Load                 Size, in bytes, of the on-disk data this node manages.

MBeans for Cassandra-DiskUsage

  • org.apache.cassandra.metrics:type=Compaction,name=CompletedTasks
  • org.apache.cassandra.metrics:type=Compaction,name=PendingTasks
  • org.apache.cassandra.metrics:type=Storage,name=Load

Cassandra errors

This dataview displays the count of specific errors and exceptions encountered by a Cassandra node. These metrics are helpful in identifying problematic nodes:

Column               Description
StorageExceptions    Number of internal exceptions caught. Under normal conditions, this should be zero.
ReadTimeouts         Number of read timeouts encountered.
WriteTimeouts        Number of write timeouts encountered.
ReadUnavailables     Number of read unavailable exceptions encountered.
WriteUnavailables    Number of write unavailable exceptions encountered.

MBeans for Cassandra-Errors

  • org.apache.cassandra.metrics:type=Storage,name=Exceptions
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Timeouts
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Timeouts
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Unavailables
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Unavailables
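
When deciding whether a node is currently problematic, the change in each count between two samples is often more telling than the raw value. The following is a minimal sketch, assuming the counts are cumulative since the node started. The function name `error_deltas` and the sample values are hypothetical and are not part of the integration.

```python
from typing import Dict


def error_deltas(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Return the increase in each error count between two samples.

    If a count goes down, the node is assumed to have restarted, and the
    current value is reported as the increase.
    """
    deltas = {}
    for column, now in current.items():
        before = previous.get(column, 0)
        deltas[column] = now - before if now >= before else now
    return deltas


# Hypothetical sample values, for illustration only.
earlier = {"StorageExceptions": 0, "ReadTimeouts": 12, "WriteTimeouts": 3}
later = {"StorageExceptions": 0, "ReadTimeouts": 15, "WriteTimeouts": 3}

increases = error_deltas(earlier, later)
# Columns whose count grew during the interval.
noisy = sorted(name for name, delta in increases.items() if delta > 0)
print(noisy)
```

A rule built on the deltas stays quiet on a node that had errors long ago but is healthy now.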

Cassandra GC

This dataview displays selected JVM garbage collector metrics. Cassandra is a Java-based system, so it relies on Java garbage collection (GC) processes to free up memory. Any significant increase in GC latency will impact Cassandra’s performance:

Column                                 Description
ConcurrentMarkSweep CollectionCount    Total number of CMS collections that have occurred.
ConcurrentMarkSweep CollectionTime     Approximate accumulated CMS collection elapsed time in milliseconds.
ConcurrentMarkSweep LastGCDuration     Elapsed time of the last CMS GC in milliseconds.
ParNew CollectionCount                 Total number of ParNew collections that have occurred.
ParNew CollectionTime                  Approximate accumulated ParNew collection elapsed time in milliseconds.
ParNew LastGCDuration                  Elapsed time of the last ParNew GC in milliseconds.

MBeans for Cassandra-GC

  • java.lang:type=GarbageCollector,name=ConcurrentMarkSweep,*
  • java.lang:type=GarbageCollector,name=ParNew,*
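
Because CollectionCount and CollectionTime are accumulated totals, one way to watch for rising GC latency is to compute the average pause per collection over a sampling interval. The sketch below is illustrative only. The function name `average_pause_ms` and the sample values are assumptions, not something the sampler provides.

```python
from typing import Optional


def average_pause_ms(count_before: int, time_before_ms: int,
                     count_after: int, time_after_ms: int) -> Optional[float]:
    """Average GC pause in milliseconds between two samples.

    Returns None when no collection ran in the interval (or the counters
    went backwards, for example after a restart).
    """
    collections = count_after - count_before
    if collections <= 0:
        return None
    return (time_after_ms - time_before_ms) / collections


# Hypothetical ParNew samples: 4 collections took 120 ms in total.
pause = average_pause_ms(100, 2500, 104, 2620)
print(pause)  # 30.0
```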

Cassandra latency

This dataview displays node-level latency metrics. It gives a view of Cassandra's performance and can help identify potential network issues or bottlenecks:

Column            Description
Operation         Type of operation (Read or Write).
Events            Number of operation events.
TotalLatency      Accumulated latency in microseconds.
AverageLatency    Average latency in microseconds (TotalLatency divided by Events).

MBeans for Cassandra-Latency

  • org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=TotalLatency
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=TotalLatency
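
The AverageLatency column is defined as TotalLatency divided by Events. The sketch below works through that arithmetic with hypothetical values. Returning 0.0 when there are no events is an assumption made here to avoid a division by zero. The integration may present that case differently.

```python
def average_latency_us(total_latency_us: int, events: int) -> float:
    """Average latency in microseconds: TotalLatency divided by Events."""
    if events == 0:
        # Assumption: report 0.0 when no operations have been recorded.
        return 0.0
    return total_latency_us / events


# Hypothetical rows, for illustration only.
rows = [
    {"Operation": "Read", "Events": 2000, "TotalLatency": 1_500_000},
    {"Operation": "Write", "Events": 0, "TotalLatency": 0},
]
for row in rows:
    row["AverageLatency"] = average_latency_us(row["TotalLatency"], row["Events"])

print(rows[0]["AverageLatency"])  # 750.0
```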

 

Cassandra tasks

This dataview displays the count of pending and blocked tasks in various stages. It identifies bottlenecks and potential problems:

Column                  Description
Status                  Task status (Blocked or Pending).
CounterMutationStage    Number of tasks in the Counter Mutation stage.
MutationStage           Number of tasks in the Mutation stage.
ReadRepairStage         Number of tasks in the Read Repair stage.
ReadStage               Number of tasks in the Read stage.
RequestResponseStage    Number of tasks in the Request Response stage.

MBeans for Cassandra-Tasks

  • org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=<CurrentlyBlockedTasks | PendingTasks>
  • org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=<CurrentlyBlockedTasks | PendingTasks>
  • org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadRepairStage,name=<CurrentlyBlockedTasks | PendingTasks>
  • org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=<CurrentlyBlockedTasks | PendingTasks>
  • org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=RequestResponseStage,name=<CurrentlyBlockedTasks | PendingTasks>
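
Each entry in this list stands for two MBeans, one per task status, which gives ten object names in total. The sketch below expands the pattern into concrete names. The mapping of the Blocked status to CurrentlyBlockedTasks and the Pending status to PendingTasks is inferred from the column descriptions above, and the helper names are hypothetical.

```python
from typing import Dict, Tuple

STAGES = [
    "CounterMutationStage",
    "MutationStage",
    "ReadRepairStage",
    "ReadStage",
    "RequestResponseStage",
]

# Assumed mapping from the dataview's Status value to the MBean name.
METRICS = {"Blocked": "CurrentlyBlockedTasks", "Pending": "PendingTasks"}


def task_object_name(stage: str, metric: str) -> str:
    """Build the object name for one stage and one task metric."""
    return (
        "org.apache.cassandra.metrics:type=ThreadPools,"
        f"path=request,scope={stage},name={metric}"
    )


def all_task_object_names() -> Dict[Tuple[str, str], str]:
    """Map (status, stage) to the object name that supplies that cell."""
    return {
        (status, stage): task_object_name(stage, metric)
        for status, metric in METRICS.items()
        for stage in STAGES
    }


names = all_task_object_names()
print(len(names))  # 10
print(names[("Pending", "ReadStage")])
```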

Cassandra throughput

This dataview displays node-level throughput metrics. It gives a high-level view of the node’s activity levels and is important in understanding how, and how heavily, the node is being used:

Column             Description
TimePeriod         Time period (1 or 5 minutes).
ReadThroughput     Read events per second during the last time period.
WriteThroughput    Write events per second during the last time period.

MBeans for Cassandra-Throughput

  • org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
  • org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency
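
The throughput values come from the same Latency MBeans used for the latency dataview. In Cassandra, these MBeans expose rate attributes such as OneMinuteRate and FiveMinuteRate alongside the event count. The sketch below shows how the two rows of this dataview could be assembled, assuming those two attributes are the ones the sampler reads. The function name and the sample values are hypothetical.

```python
from typing import Dict, List

# Assumed mapping from the TimePeriod value to the MBean attribute.
PERIODS = [("1 minute", "OneMinuteRate"), ("5 minutes", "FiveMinuteRate")]


def throughput_rows(read_attrs: Dict[str, float],
                    write_attrs: Dict[str, float]) -> List[Dict[str, object]]:
    """Build one row per time period from the Read and Write attributes."""
    return [
        {
            "TimePeriod": label,
            "ReadThroughput": read_attrs[attribute],
            "WriteThroughput": write_attrs[attribute],
        }
        for label, attribute in PERIODS
    ]


# Hypothetical attribute values in events per second.
read_latency = {"OneMinuteRate": 120.5, "FiveMinuteRate": 98.0}
write_latency = {"OneMinuteRate": 40.0, "FiveMinuteRate": 42.5}

table = throughput_rows(read_latency, write_latency)
for row in table:
    print(row)
```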