Elasticsearch Monitoring Technical Reference

General information

Overview

Elasticsearch monitoring is a Gateway configuration file that enables monitoring of an Elasticsearch cluster through the Toolkit plug-in.

Elasticsearch is a distributed search and analytics engine. It scales horizontally: you can add more nodes to the cluster, which allows it to search and analyze large volumes of data.

The elements that make Elasticsearch work are defined as follows:

  • A node is a running instance of Elasticsearch that can locate documents in the cluster.
  • A cluster consists of one or more nodes with the same cluster name that share their data and load.

Track the following key areas when using Elasticsearch monitoring:

Key Area | Description
Search performance | Determine how search performs over time by monitoring query operations, load, latency, the fielddata cache, and evictions.
Indexing performance | Each shard in an index is updated through the refresh and flush processes. A shard is a container for data and is either a primary or a replica shard. Shards are how Elasticsearch distributes data across the cluster.
  • Index refresh - creates a new in-memory segment, which makes newly indexed documents searchable.
  • Index flush - commits the in-memory segments to disk and clears the transaction log.
Cluster health and node availability | Monitors the current state of the cluster and its nodes.
Resource utilisation | Provides information on thread pool queues and rejections for bulk, index, and merge operations.
System and network metrics | Shows information about every node in the cluster, resource and memory usage, and active connections opened over time.
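
The refresh and flush processes described in the table can be pictured with a small toy model. This is only an illustrative sketch, not Elasticsearch code: the ToyShard class and all of its attribute names are hypothetical, and the model ignores replicas, merging, and durability details.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Simplified, illustrative model of one shard (hypothetical names)."""
    buffer: list = field(default_factory=list)     # indexed, not yet searchable
    segments: list = field(default_factory=list)   # searchable, not yet committed
    committed: list = field(default_factory=list)  # searchable and committed
    translog: list = field(default_factory=list)   # operations since the last flush

    def index(self, doc):
        # A new document goes to the in-memory buffer and the transaction log.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Refresh: the buffer becomes a new in-memory segment, so the
        # documents in it become searchable.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Flush: in this toy model the buffer is written out first, then the
        # segments are committed and the transaction log is cleared.
        self.refresh()
        self.committed.extend(self.segments)
        self.segments.clear()
        self.translog.clear()

    def search(self):
        # Only documents that are part of a segment are visible to search.
        return [doc for seg in self.committed + self.segments for doc in seg]
```

In this model a document is invisible to search() until refresh() runs, and the transaction log keeps growing until flush() runs, which is why the refresh and flush counters and timings are tracked as indexing performance metrics.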
   

In this Elasticsearch monitoring template, you will see these metrics in your dataview:

  • Cluster health
  • Indexing performance
  • Search performance
  • Node and resource information
  • Thread pool

This technical reference provides information on the metrics and dataviews for the samplers available through the Elasticsearch integration. If you are setting up the Elasticsearch integration for the first time, see the Elasticsearch Monitoring User Guide.

Metrics and dataviews

Elasticsearch cluster health

This dataview monitors the overall health of the cluster:

Column Name | Description
cluster | Name of the cluster.
status | Health status of the cluster:
  • Green - all primary and replica shards are active.
  • Yellow - at least one replica shard is missing or not properly allocated.
  • Red - at least one primary shard is missing, which can cause data loss.
nodeTotal | Total number of nodes in the cluster.
nodeData | Total number of nodes in the cluster that can store data.
shardsTotal | Total number of shards.
shardsInitializing | Number of initializing shards.
shardsUnassigned | Number of unassigned shards.
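
As a minimal sketch of how a row of this dataview might be interpreted, assume the row is available as a Python dict keyed by the column names above. The function name, the severity labels, and the dict representation are assumptions made for illustration; they are not part of the integration.

```python
# Hypothetical severity labels; the status values come from the table above.
SEVERITY = {"green": "ok", "yellow": "warning", "red": "critical"}


def assess_cluster(row):
    """Return (severity, reasons) for one row of the cluster health dataview.

    `row` is assumed to be a dict keyed by the column names in the table.
    """
    status = str(row["status"]).lower()
    severity = SEVERITY.get(status, "unknown")
    reasons = []
    if int(row["shardsUnassigned"]) > 0:
        reasons.append(f"{row['shardsUnassigned']} unassigned shard(s)")
    if int(row["shardsInitializing"]) > 0:
        reasons.append(f"{row['shardsInitializing']} initializing shard(s)")
    return severity, reasons


example = {"cluster": "demo", "status": "Yellow", "nodeTotal": 3, "nodeData": 3,
           "shardsTotal": 10, "shardsInitializing": 0, "shardsUnassigned": 2}
print(assess_cluster(example))
```

The shard counters give context to the status colour: a yellow cluster with a shrinking shardsUnassigned value is usually recovering, while a constant value points to an allocation problem.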
   

 

Elasticsearch indexingPerf-ByIndex

This dataview monitors indexing performance by index. Data is grouped per index:

Column Name | Description
index | Name of the index.
indexingIndexTotal | Total number of indexing operations.
indexingIndexTime | Time spent in indexing. Unit: milliseconds (ms)
indexingIndexCurrent | Number of current indexing operations.
refreshTotal | Total number of refreshes.
refreshTime | Time spent in refresh operations. Unit: milliseconds (ms)
flushTotal | Total number of flushes.
flushTotalTime | Time spent in flushes. Unit: milliseconds (ms)
averageIndexingLatency | Average time spent in indexing. This is computed from indexingIndexTime / indexingIndexTotal. Unit: milliseconds (ms) per indexing operation
averageRefreshLatency | Average time spent in refresh operations. This is computed from refreshTime / refreshTotal. Unit: milliseconds (ms) per refresh
averageFlushLatency | Average time spent in flush operations. This is computed from flushTotalTime / flushTotal. Unit: milliseconds (ms) per flush
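
The three average latency columns are plain ratios of a time column to a count column. The following sketch reproduces that arithmetic, assuming a row is available as a Python dict keyed by the column names above. The function names are hypothetical, and returning None for an index with no operations is a choice made here; how the sampler itself handles that case is not stated in this reference.

```python
def average_latency(total_time_ms, total_ops):
    """Return milliseconds per operation, or None when nothing has run yet."""
    if total_ops == 0:
        return None  # avoid dividing by zero for an idle index
    return total_time_ms / total_ops


def indexing_latencies(row):
    """Compute the three derived latency columns for one dataview row."""
    return {
        "averageIndexingLatency": average_latency(
            row["indexingIndexTime"], row["indexingIndexTotal"]),
        "averageRefreshLatency": average_latency(
            row["refreshTime"], row["refreshTotal"]),
        "averageFlushLatency": average_latency(
            row["flushTotalTime"], row["flushTotal"]),
    }


example = {"indexingIndexTime": 1200, "indexingIndexTotal": 400,
           "refreshTime": 500, "refreshTotal": 100,
           "flushTotalTime": 0, "flushTotal": 0}
print(indexing_latencies(example))
```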
   

 

Elasticsearch indexingPerf-ByNode

This dataview monitors indexing performance by node. Data is grouped per node:

Column Name | Description
nodeID | Unique node ID.
name | Name of the node.
indexingIndexTotal | Total number of indexing operations.
indexingIndexTime | Time spent in indexing. Unit: milliseconds (ms)
indexingIndexCurrent | Number of current indexing operations.
refreshTotal | Total number of refreshes.
refreshTime | Time spent in refresh operations. Unit: milliseconds (ms)
flushTotal | Total number of flushes.
flushTotalTime | Time spent in flushes. Unit: milliseconds (ms)
averageIndexingLatency | Average time spent in indexing. This is computed from indexingIndexTime / indexingIndexTotal. Unit: milliseconds (ms) per indexing operation
averageRefreshLatency | Average time spent in refresh operations. This is computed from refreshTime / refreshTotal. Unit: milliseconds (ms) per refresh
averageFlushLatency | Average time spent in flush operations. This is computed from flushTotalTime / flushTotal. Unit: milliseconds (ms) per flush
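
Because the averages above divide one total by another, they describe the whole period covered by the counters. To see how a node behaves between two samples, the same ratio can be applied to the differences. The sketch below assumes that the total and time columns are cumulative counters and that two consecutive rows for the same node are available as Python dicts; both points, and the function name, are assumptions made for illustration.

```python
def interval_latency(previous, current, time_key, total_key):
    """Milliseconds per operation between two samples of the same node.

    Returns None when no operation completed in the interval or when a
    counter went backwards (for example after a node restart).
    """
    ops = current[total_key] - previous[total_key]
    elapsed_ms = current[time_key] - previous[time_key]
    if ops <= 0 or elapsed_ms < 0:
        return None
    return elapsed_ms / ops


earlier = {"indexingIndexTime": 1000, "indexingIndexTotal": 500}
later = {"indexingIndexTime": 1600, "indexingIndexTotal": 600}
print(interval_latency(earlier, later, "indexingIndexTime", "indexingIndexTotal"))
```

In the example, the ratio of the totals in the later sample is about 2.7 ms per operation, while the operations completed between the two samples took 6 ms each, so a recent slowdown is visible only in the interval figure.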
   

 

Elasticsearch nodeInfo

This dataview displays information about the nodes in the cluster:

Column Name | Description
nodeID | Unique node ID.
name | Name of the node.
IP | IP address.
port | Bound transport port.
http | Bound HTTP address and port.
version | Elasticsearch version.
build | Elasticsearch build hash.
jdk | JDK version.
nodeRole | Role of the node. This can have more than one value:
  • m - master-eligible node.
  • d - data node.
  • i - ingest node.
master | Indicates whether the node is the current master of the cluster:
  • * (asterisk) - current master.
  • - (hyphen) - not the master.
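
The nodeRole and master codes can be expanded into readable values with a small lookup. This sketch assumes that a node with several roles reports them as concatenated letters (for example mdi) and that a row is available as a Python dict keyed by the column names above; both are assumptions, and the function name is hypothetical.

```python
# Role letters as listed in the table above.
ROLE_NAMES = {"m": "master-eligible", "d": "data", "i": "ingest"}


def describe_node(row):
    """Expand the nodeRole letters and the master marker of one row."""
    roles = [ROLE_NAMES.get(letter, f"unknown ({letter})")
             for letter in row["nodeRole"]]
    return {
        "name": row["name"],
        "roles": roles,
        "is_current_master": row["master"] == "*",
    }


print(describe_node({"name": "node-1", "nodeRole": "mdi", "master": "*"}))
```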
   

 

Elasticsearch resource

This dataview monitors the resources of each node in the cluster:

Column Name | Description
nodeID | Unique node ID.
name | Name of the node.
cpu | CPU usage as a percentage (%).
heapCurrent | Current heap usage. Unit: bytes
heapPercent | Percentage of heap used.
ramCurrent | Current RAM usage. Unit: bytes
ramPercent | Percentage of RAM used.
diskUsed | Used disk space. Unit: bytes
diskAvail | Available disk space.
diskUsedPercent | Percentage of disk space used.
fileDescriptorCurrent | Number of used file descriptors.
fileDescriptorPercent | Percentage of file descriptors used.
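
The percentage columns lend themselves to simple threshold checks. The following is a sketch only: the threshold values are hypothetical examples rather than recommendations from this reference, and the row is assumed to be a Python dict keyed by the column names above.

```python
# Hypothetical warning thresholds, in percent; tune them for your environment.
THRESHOLDS = {
    "cpu": 90.0,
    "heapPercent": 85.0,
    "ramPercent": 95.0,
    "diskUsedPercent": 80.0,
    "fileDescriptorPercent": 80.0,
}


def over_threshold(row, thresholds=THRESHOLDS):
    """Return the percentage columns of one node row that exceed their limit."""
    return {
        column: float(row[column])
        for column, limit in thresholds.items()
        if column in row and float(row[column]) > limit
    }


node = {"name": "node-1", "cpu": 35, "heapPercent": 91, "ramPercent": 70,
        "diskUsedPercent": 82.5, "fileDescriptorPercent": 3}
print(over_threshold(node))
```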
   

 

Elasticsearch SearchPerf-ByIndex

This dataview monitors search performance by index. Data is grouped per index:

Column Name | Description
index | Name of the index.
searchQueryTotal | Number of query phase operations.
searchQueryTime | Time spent in the query phase. Unit: milliseconds (ms)
searchQueryCurrent | Number of current query phase operations.
searchFetchTotal | Number of fetch phase operations.
searchFetchTime | Time spent in the fetch phase. Unit: milliseconds (ms)
searchFetchCurrent | Number of current fetch phase operations.
fielddataMemory | Memory used by the fielddata cache.
fielddataEvictions | Number of fielddata cache evictions.
averageQueryLatency | Average time spent in the query phase. This is computed from searchQueryTime / searchQueryTotal. Unit: milliseconds (ms) per query
averageFetchLatency | Average time spent in the fetch phase. This is computed from searchFetchTime / searchFetchTotal. Unit: milliseconds (ms) per fetch
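
One way to use the per-index figures is to rank indices by query latency. The sketch below recomputes searchQueryTime / searchQueryTotal for each row and sorts the result. It assumes the rows are available as a list of Python dicts keyed by the column names above; the function name and its top parameter are hypothetical.

```python
def slowest_indices(rows, top=3):
    """Return (index, ms per query) pairs, slowest first.

    Indices with no completed queries are skipped because their average
    latency is undefined.
    """
    ranked = [
        (row["index"], row["searchQueryTime"] / row["searchQueryTotal"])
        for row in rows
        if row["searchQueryTotal"] > 0
    ]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top]


rows = [
    {"index": "logs", "searchQueryTime": 9000, "searchQueryTotal": 3000},
    {"index": "metrics", "searchQueryTime": 500, "searchQueryTotal": 50},
    {"index": "idle", "searchQueryTime": 0, "searchQueryTotal": 0},
]
print(slowest_indices(rows, top=2))
```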
   

 

Elasticsearch searchPerf-ByNode

This dataview monitors search performance by node. Data is grouped per node:

Column Name | Description
nodeID | Unique node ID.
name | Name assigned to the node.
searchQueryTotal | Number of query phase operations.
searchQueryTime | Time spent in the query phase. Unit: milliseconds (ms)
searchQueryCurrent | Number of current query phase operations.
searchFetchTotal | Number of fetch phase operations.
searchFetchTime | Time spent in the fetch phase. Unit: milliseconds (ms)
searchFetchCurrent | Number of current fetch phase operations.
fielddataMemory | Memory used by the fielddata cache.
fielddataEvictions | Number of fielddata cache evictions.
averageQueryLatency | Average time spent in the query phase. This is computed from searchQueryTime / searchQueryTotal. Unit: milliseconds (ms) per query
averageFetchLatency | Average time spent in the fetch phase. This is computed from searchFetchTime / searchFetchTotal. Unit: milliseconds (ms) per fetch
   

 

Elasticsearch ThreadPool

This dataview monitors the bulk, index, and search thread pools of each node in the cluster:

Column Name | Description
node_id/name | Node ID/thread pool name.
node_name | Name of the node.
name | Thread pool name.
type | Thread pool type.
active | Number of active threads.
queue | Number of tasks currently in the queue.
rejected | Number of rejected tasks.
size | Number of threads.
queue_size | Size of the queue that holds pending requests for which no thread is available.
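
The queue, queue_size, and rejected columns can be combined into a simple pressure indicator for a thread pool. The sketch below assumes that rejected is a cumulative counter and that two consecutive rows for the same thread pool are available as Python dicts; both points, and the function name, are assumptions made for illustration.

```python
def pool_pressure(previous, current):
    """Compare two samples of the same thread pool row.

    Returns the number of tasks rejected since the previous sample and the
    fraction of the queue in use (None when queue_size is not positive).
    """
    new_rejections = max(current["rejected"] - previous["rejected"], 0)
    queue_size = current["queue_size"]
    queue_fill = current["queue"] / queue_size if queue_size > 0 else None
    return new_rejections, queue_fill


before = {"name": "search", "queue": 20, "queue_size": 200, "rejected": 4}
after = {"name": "search", "queue": 150, "queue_size": 200, "rejected": 9}
print(pool_pressure(before, after))
```

A queue that stays close to full together with a growing rejection count suggests that the node cannot keep up with the incoming requests for that pool.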