Hadoop Monitoring User Guide

General information

Overview

Hadoop monitoring is a Gateway configuration file that enables monitoring of a Hadoop cluster, its nodes, and its daemons through the JMX and Toolkit plug-ins.

This Hadoop integration template consists of the following components:

  • Hadoop Distributed File System (HDFS) 
  • Yet Another Resource Negotiator (YARN)

The Hadoop Distributed File System (HDFS) provides scalable data storage that can be deployed on commodity hardware and is optimised for operations on large datasets.

The other component, Yet Another Resource Negotiator (YARN), assigns the computational resources for executing applications:

  • YARN ResourceManager - takes inventory of available resources and allocates them to running applications.
  • YARN NodeManagers - monitor resource usage on individual nodes and communicate with the ResourceManager.

This guide discusses the steps to set up the Hadoop integration on a Gateway. Once the integration is set up, the samplers providing the dataviews become available to that Gateway.

To view the sample metrics and dataviews, see Hadoop Monitoring Technical Reference.
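Each Hadoop daemon publishes its metrics as JSON at the /jmx path of its web UI, and the samplers in this template read that output. The sketch below shows the parsing step in Python; the payload, bean name, and attribute values are illustrative samples rather than output from a live cluster, so check your own daemon's /jmx response for the exact bean and attribute names.

```python
import json

# Sample of the JSON returned by a NameNode web UI /jmx endpoint
# (e.g. http://<HADOOP_HOST_NAMENODE>:9870/jmx). The values below are
# made up for illustration.
sample_jmx = json.loads("""
{
  "beans": [
    {
      "name": "Hadoop:service=NameNode,name=FSNamesystem",
      "CapacityTotal": 1000000,
      "CapacityRemaining": 250000,
      "NumLiveDataNodes": 3
    }
  ]
}
""")

def find_bean(payload, name):
    """Return the first JMX bean whose name matches, or None if absent."""
    for bean in payload.get("beans", []):
        if bean.get("name") == name:
            return bean
    return None

bean = find_bean(sample_jmx, "Hadoop:service=NameNode,name=FSNamesystem")
print(bean["NumLiveDataNodes"])  # prints 3 for this sample payload
```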

User requirements

This monitoring template is a Gateway configuration file that can be included through the Gateway Setup Editor.

To use this template, your configuration must meet the following requirements:

  • HDFS set up with at least one DataNode.
  • The machine running the Netprobe must have access to the published Hadoop HTTP address and port.
  • An Active Console 2 connected to the Gateway.
  • Basic connectivity set up for the JMX or Toolkit plug-in.
  • Hadoop configuration files extracted to the Gateway setup directory.

System requirements

The following requirements must be met prior to the installation and setup of the template:

  • Template package: geneos-integration-hadoop-<version>.zip.
  • Managed entities utilising the samplers defined in include/HadoopMonitoring.xml.
  • Netprobe version 4.5.
  • Apache Hadoop 3.0.0 or higher.
  • Python 2.7 or higher.

Audience

This document is a reference guide for the templates and scripts built using the Geneos development toolkit plug-ins.

The template allows you to integrate specific applications and services to collect metrics without having to create a new plug-in.

  • Administrator - oversees the administration and installation of the server.
  • User - monitors the Hadoop metrics and reports collected via the JMX plug-in.
  • Application Team - provides the connection settings for the JMX plug-in and queries for the Toolkit plug-in.


Install and set up

Ensure that you have read and met the system requirements before installing and setting up this integration template.

  1. Download the config file (geneos-integration-hadoop-<version>.zip) from the ITRS Downloads site.
  2. Open Active Console 2.
  3. Extract the file into the Gateway Setup directory.
  4. In the Navigation panel, click Includes to create a new file.
  5. Enter the location of the file to include in the Location field. In this example, use include/HadoopMonitoring.xml.
  6. Expand the file location in the Includes section.
  7. Select Click to load...
  8. Click Yes to load the new Hadoop include file.
  9. Click Managed entities in the Navigation panel.
  10. Add the Hadoop-Cluster and Hadoop-Node types to the Managed Entity section that you will use to monitor Hadoop.
  11. Click the Validate button to check your configuration.

The Validate button allows you to check if there are any errors or warnings in your configuration set-up.

Once the Gateway configuration appears in the Includes section, you can add the samplers and other variables.

Set up the samplers

These are the pre-configured samplers available in HadoopMonitoring.xml. Configure the required fields of each sampler by referring to the list below:

  • Hadoop-HDFS-NamenodeInfo
  • Hadoop-HDFS-NamenodeCluster
  • Hadoop-HDFS-SecondaryNamenodeInfo
  • Hadoop-HDFS-DatanodesSummary
  • Hadoop-HDFS-DatanodeVolumeInfo
  • Hadoop-YARN-ResourceManager
  • Hadoop-YARN-NodeManagersSummary

Set up the variables

The HadoopMonitoring.xml template provides the variables that are set in the Environments section.

  • HADOOP_HOST_NAMENODE - IP address or hostname where the Namenode daemon is running.
  • HADOOP_HOST_SECONDARYNAMENODE - IP address or hostname where the Secondarynamenode daemon is running.
  • HADOOP_HOST_DATANODE - IP address or hostname where the specific Datanode daemon is running.
  • HADOOP_HOST_RESOURCEMANAGER - IP address or hostname where the ResourceManager is running.
  • HADOOP_PORT_JMX_NAMENODE - Namenode JMX port.
  • HADOOP_PORT_JMX_SECONDARYNAMENODE - Secondarynamenode JMX port.
  • HADOOP_PORT_WEBJMX_DATANODE - Datanode web UI port. Default: 9864
  • HADOOP_PORT_JMX_RESOURCEMANAGER - ResourceManager JMX port.
  • HADOOP_PORT_WEBJMX_NAMENODE - Namenode web UI port. Default: 9870
  • HADOOP_PORT_WEBJMX_RESOURCEMANAGER - ResourceManager web UI port. Default: 8088
  • PYTHON_EXECUTABLE_PATH - Path to the Python executable that runs the scripts.

After checking and saving the changes, the samplers you have set in the Gateway configuration display in Active Console 2.
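As a rough illustration of how these variables fit together, the sketch below builds a daemon's web UI /jmx URL from its host and port variables, falling back to the defaults listed above when a variable is unset. The helper names (`setting`, `jmx_url`) and the `localhost` fallback host are hypothetical, not part of the template:

```python
import os

# Port defaults are taken from the variables table above; the localhost
# fallback for the host is an assumption for this sketch only.
DEFAULTS = {
    "HADOOP_HOST_NAMENODE": "localhost",
    "HADOOP_PORT_WEBJMX_NAMENODE": "9870",
    "HADOOP_PORT_WEBJMX_DATANODE": "9864",
    "HADOOP_PORT_WEBJMX_RESOURCEMANAGER": "8088",
}

def setting(name, env=os.environ):
    """Read a variable from the environment, falling back to the default."""
    return env.get(name, DEFAULTS[name])

def jmx_url(host_var, port_var, env=os.environ):
    """Build the /jmx metrics URL for a daemon from its host and port variables."""
    return "http://%s:%s/jmx" % (setting(host_var, env), setting(port_var, env))

# Prints the NameNode metrics URL, e.g. http://localhost:9870/jmx
# when neither variable is set in the environment.
print(jmx_url("HADOOP_HOST_NAMENODE", "HADOOP_PORT_WEBJMX_NAMENODE"))
```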

Set up the rules

The HadoopMonitoring-SampleRules.xml template also provides a separate set of sample rules that you can use in the Gateway Setup Editor.

Your configuration rules must be set in the Includes section.

  1. Enter the location of the file to include in the Location field. In this example, use include/HadoopMonitoring-SampleRules.xml.
  2. Set the priority, which controls the importance of a file when merging. Sections in a higher priority file take precedence over sections in a lower priority file, including the main setup file.

  3. Expand the file location in the Includes section.
  4. Select Click to load...
  5. Click Yes to load the new Hadoop include rules file.
  6. Click Rules in the Navigation panel to create new rules.
  • Hadoop-NameNodeCluster-Disk-Remaining - Checks the remaining disk ratio of the entire Hadoop cluster.
  • Hadoop-DataNode-Disk-Remaining - Checks the remaining disk ratio of a single Datanode. Variable: HADOOP_RULE_DISK_REMAINING_THRESHOLD (possible values: 1.0 - 100).
  • Hadoop-Datanodes-In-Errors - Checks the number of Datanodes with errors. Variable: HADOOP_RULE_DATANODES_ERROR_THRESHOLD (integer value).
  • Hadoop-Blocks-In-Error - Checks the number of blocks with errors. Variable: HADOOP_RULE_BLOCKS_ERROR_THRESHOLD (integer value).
  • Hadoop-Nodemanager-In-Error - Checks the number of NodeManagers with errors. Variable: HADOOP_RULE_NODEMANAGER_ERROR_THRESHOLD (integer value).
  • Hadoop-Applications-In-Error - Checks the number of applications with errors. Variable: HADOOP_RULE_APPLICATION_ERROR_THRESHOLD (integer value).
  • Hadoop-SecondaryNamenode-Status - Checks the connection status of the JMX plug-in to the Secondarynamenode service.
  • Hadoop-NodeManager-State - Checks the state of each NodeManager. Variable: HADOOP_RULE_NODEMANAGER_UNHEALTHY. Default: UNHEALTHY

Note: The sample configuration file is verified to work with Apache Hadoop 3.0.0, Python 2.7, and Netprobe 4.5.

Once the sample rules configuration appears in the Includes section, you can set up the rules and alerts.
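As a rough sketch of the logic behind the disk-remaining rules above: the rule compares the percentage of remaining capacity against HADOOP_RULE_DISK_REMAINING_THRESHOLD. The warning/critical split below is an assumption for illustration; the actual severities and expressions are defined in HadoopMonitoring-SampleRules.xml.

```python
def disk_remaining_pct(capacity_total, capacity_remaining):
    """Percentage of cluster (or Datanode) capacity still free."""
    if capacity_total <= 0:
        return 0.0
    return 100.0 * capacity_remaining / capacity_total

def severity(remaining_pct, threshold):
    """Map remaining disk percentage to a Geneos-style severity.

    The half-threshold cutoff for critical is an illustrative
    assumption, not the exact logic of the sample rules file.
    """
    if remaining_pct < threshold / 2:
        return "critical"
    if remaining_pct < threshold:
        return "warning"
    return "ok"

# 30% of capacity free against a 25% threshold -> "ok"
print(severity(disk_remaining_pct(1000, 300), threshold=25.0))
```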