VMWare Monitoring Plug-in - Technical Reference

Introduction

VMWare delivers the world’s most trusted virtualisation and cloud infrastructure solutions that accelerate IT transformation by reducing complexity and enabling more flexible, agile service delivery.

While virtualisation has tremendous benefits, it adds new complexity when it comes to managing your network. Virtual Machines (VMs) and their host machines need performance and availability monitoring, just like their physical server counterparts. The Geneos VMWare plug-in monitors the VMWare ESXi server by querying its web services API. By collecting key parameters from the VMWare host, Geneos users can correlate both host and guest health with the rich application data they already collect.

Application Support teams need visibility of the health, performance and availability of their Virtual Machines (VMs) in order to be proactive and provide the best service to the business. When applications run on layered operating systems (a guest on top of a host), it is just as critical to collect and analyse metrics from both guest and host to ensure the best possible performance for end users.

The VMWare Monitoring plug-in provides Application Support teams with a view of the entire VMWare environment, drill-down details to identify root causes, and out-of-the-box alerts so they can act and fix problems in time, all within the Geneos solution.

The VMware monitoring plug-in has two basic types of dataviews:

  • Info View — views based on managed objects, such as the virtual machine or the host. This view shows the summary or status information.
  • Monitor View — views based on performance counters. The real-time sampling period is 20 seconds.

Technology

The Geneos VMWare plug-in is a Java process that uses the VMWare vSphere API to continuously monitor a VMWare host and its virtual machines, delivering real-time monitoring data into the Geneos framework through the XML-RPC interface.

Architecture

The VMWare plug-in integrates with your existing architecture. You can connect the plug-in to existing gateways, which lets you correlate VMWare monitoring information with monitoring of the applications that run on this virtual infrastructure.

Prerequisites

The following are required to set up the VMWare Monitoring Solution:

  • Geneos XML-RPC API token
  • Java 1.6+
  • VMWare Solution package with dependent libs (these are included in the lib subdirectory)
  • VMWare plug-in licence: VMWareMonitor.lic

Installation

Sampler

Set up a sampler as an API plug-in.

  1. Set the name to “Cluster”. If you wish to change the name, make sure that this value is used in the VMWareMonitor.properties file.
  2. Set the plugin type to API.
<sampler name="Cluster">
<plugin>
<api></api>
</plugin>
</sampler>

Netprobe

Select a netprobe, preferably on the machine where you will be running the plug-in code.

Managed Entity

Set up a managed entity that joins the probe and the sampler.

  1. Set the name to “VMWare”. If you wish to change the name, make sure that this value is used in the VMWareMonitor.properties file.
  2. Set Options to probe, and select the probe you set up in Netprobe.
  3. Reference the sampler you set up in Sampler.
<managedEntity name="VMWare">
<probe ref="VMWare probe"></probe>
<sampler ref="Cluster"></sampler>
</managedEntity>

VMWare Permissions

Using the vSphere client, ensure that the user account the plug-in connects with has full administration permissions.

VMWare Solution with Dependent Libs

Create a directory on the server that runs the netprobe you want to use to monitor VMWare, and extract the contents of the tar file to this location.

VMWareMonitor/
    VMWareMonitor.jar
    lib/
        log4j-1.2.16.jar
        vim25.jar
        ws-commons-util-1.0.2.jar
        xmlrpc-client-3.1.3.jar
        xmlrpc-common-3.1.3.jar
        xmlrpc-server-3.1.3.jar

Plug-in Configuration

By default, the plug-in uses a config file called VMWareMonitor.properties. If no config file is found, the plug-in generates a default one the first time it runs. Confirm that the VMWareMonitor.properties file has the correct settings, especially:

netprobeServer=localhost
netprobePort=7036
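A quick way to confirm these settings before starting the plug-in is a small script. The following Python sketch is not part of the plug-in: it reads VMWareMonitor.properties assuming simple key=value lines (it does not implement the escaping and line-continuation rules of the full Java properties format) and checks that netprobeServer is set and that netprobePort is a valid TCP port. The function names are hypothetical.

```python
from pathlib import Path


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are ignored in this simplified sketch
            props[key.strip()] = value.strip()
    return props


def check_netprobe_settings(props: dict[str, str]) -> list[str]:
    """Return a list of problems found in the netprobe connection settings."""
    problems = []
    if not props.get("netprobeServer"):
        problems.append("netprobeServer is missing or empty")
    port = props.get("netprobePort", "")
    # isdigit() is checked first, so int() is never called on a bad value
    if not port.isdigit() or not 1 <= int(port) <= 65535:
        problems.append(f"netprobePort is not a valid port: {port!r}")
    return problems


if __name__ == "__main__":
    path = Path("VMWareMonitor.properties")
    if path.exists():
        issues = check_netprobe_settings(parse_properties(path.read_text()))
        print("\n".join(issues) or "Netprobe settings look valid")
```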

vSphere Connection Details

You will need to supply the URL of the webservice, which is usually https://<IPADDRESSOFSERVER>/sdk, and the username and password.

The VMware Monitoring plug-in can support only one host at a time. If you want to connect to several hosts, set up as many VMware samplers as needed. See Sampler.

Logging Configuration

The logging is configured using log4j. By default, it is configured to log to the console and a log file (VMwareMonitor.log) that will roll twice a day (AM and PM).

Initialisation

To run the VMWareMonitor.jar file:

java -jar VMWareMonitor.jar

Upgrade Instructions

You can unpack the tarball directly over an existing installation. A new default.properties file is created, but the plug-in continues to use your existing config file. You may need to restart your netprobe and reset your gateway connection to clear out any obsolete views or columns.

Virtual Machine Views

For each web service that the plug-in connects to, there is a set of Virtual Machine dataviews.

VM Info

Name Description
Boot Time

The timestamp when the virtual machine was most recently powered on.

This property is updated when the virtual machine is powered on from the poweredOff state, and is cleared when the virtual machine is powered off. This property is not updated when a virtual machine is resumed from a suspended state.

Connection State Indicates whether or not the virtual machine is available for management.
dasVmProtection

The vSphere HA protection state for a virtual machine. Property is unset if vSphere HA is not enabled.

Since vSphere API 5.0

faultToleranceState

The fault tolerance state of the virtual machine.

Since vSphere API 4.0

host The host that is responsible for running a virtual machine. This property is null if the virtual machine is not running and is not assigned to run on a particular host.
guestMemoryUsage Guest memory utilisation statistics, in MB. This is also known as active guest memory. The number can be between 0 and the configured memory size of the virtual machine. Valid while the virtual machine is running.
hostMemoryUsage Host memory utilisation statistics, in MB. This is also known as consumed host memory. This is between 0 and the configured resource limit. Valid while the virtual machine is running. This includes the overhead memory of the VM.
overallCpuDemand

Basic CPU performance statistics, in MHz. Valid while the virtual machine is running.

Since vSphere API 4.0

overallCpuUsage Basic CPU performance statistics, in MHz. Valid while the virtual machine is running.
powerState The current power state of the virtual machine.
question The current question, if any, that is blocking the virtual machine’s execution.
recordReplayState

Record / replay state of this virtual machine.

Since vSphere API 4.0

suspendInterval The total time the virtual machine has been suspended since it was initially powered on. This time excludes the current period, if the virtual machine is currently suspended. This property is updated when the virtual machine resumes, and is reset to zero when the virtual machine is powered off.
suspendTime The timestamp when the virtual machine was most recently suspended. This property is updated every time the virtual machine is suspended.
toolsInstallerMounted Flag to indicate whether or not the VMWare Tools installer is mounted as a CD-ROM.
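The update rules for Boot Time, suspendTime and suspendInterval interact in ways that are easy to misread. The following Python sketch models them as a small state machine. It is a hypothetical illustration of the rules described in the table, not vSphere or plug-in code: the class and method names are invented, timestamps are plain numbers of seconds, and resuming is modelled as powering on from the suspended state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmTimes:
    """Hypothetical model of the documented VM Info timing properties."""
    power_state: str = "poweredOff"
    boot_time: Optional[float] = None
    suspend_time: Optional[float] = None
    suspend_interval: float = 0.0

    def power_on(self, now: float) -> None:
        if self.power_state == "poweredOff":
            # Boot Time is set only when powering on from poweredOff.
            self.boot_time = now
        elif self.power_state == "suspended" and self.suspend_time is not None:
            # Resuming: Boot Time is unchanged, suspendInterval is updated.
            self.suspend_interval += now - self.suspend_time
        self.power_state = "poweredOn"

    def suspend(self, now: float) -> None:
        # suspendTime is updated every time the VM is suspended.
        self.suspend_time = now
        self.power_state = "suspended"

    def power_off(self) -> None:
        # Powering off clears Boot Time and resets suspendInterval to zero.
        self.boot_time = None
        self.suspend_interval = 0.0
        self.power_state = "poweredOff"
```

While a VM is suspended, suspendInterval does not yet include the current suspension; it only grows when the VM resumes.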

Disk Monitor

Name Description
Name Name of the VM.
usage Aggregated disk I/O rate. For hosts, this metric includes the rates for all virtual machines running on the host during the collection interval.
Read rate

Average number of kilobytes read from the disk each second during the collection interval.

  • VM - Rate at which data is read from each virtual disk on the virtual machine.
  • Host - Rate at which data is read from each LUN on the host.

read rate = # blocksRead per second x blockSize.

Write rate

Rate at which data is written to each virtual disk on the virtual machine.

write rate = # blocksWritten per second x blockSize

Commands issued Number of SCSI commands issued during the collection interval.
Commands Aborted Number of SCSI commands aborted during the collection interval.
Bus Resets Number of SCSI-bus reset commands issued during the collection interval.
physical device Read Latency Average amount of time, in milliseconds, to complete read from the physical device.
Kernel Read Latency Average amount of time, in milliseconds, spent by VMKernel processing each SCSI read command.
physical device Write Latency Average amount of time, in milliseconds, to complete a write to the physical device.
Kernel Write Latency Average amount of time, in milliseconds, spent by VMKernel processing each SCSI write command.
Write Latency Average amount of time taken during the collection interval to process a SCSI write command issued by the Guest OS to the virtual machine. The sum of kernelWriteLatency and deviceWriteLatency.
Queue Write Latency Average amount of time taken during the collection interval per SCSI write command in the VMKernel queue.
Highest Latency Highest latency value across all disks used by the host. Latency measures the time taken to process a SCSI command issued by the guest OS to the virtual machine. The kernel latency is the time VMkernel takes to process an IO request. The device latency is the time it takes the hardware to handle the request.
Average Read request per second

Number of disk reads during the collection interval.

  • VM - Number of times data was read from each virtual disk on the virtual machine.
  • Host - Number of times data was read from each LUN on the host.
Average Write request per second

Number of disk writes during the collection interval.

  • VM - Number of times data was written to each virtual disk on the virtual machine.
  • Host - Number of times data was written to each LUN on the host.
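As a worked example of the read rate and write rate formulas, the following Python sketch computes a rate in kilobytes per second. It assumes blockSize is expressed in bytes and divides by 1024 to match the kilobytes-per-second unit of the Read rate column; the function name and sample figures are hypothetical.

```python
def disk_rate_kbps(blocks_per_second: float, block_size_bytes: int) -> float:
    """Data rate in KB/s: blocks transferred per second x block size.

    Pass blocksRead per second for the read rate and blocksWritten per
    second for the write rate. block_size_bytes is assumed to be in bytes.
    """
    return blocks_per_second * block_size_bytes / 1024


# Hypothetical sample: 200 blocks read per second with 512-byte blocks.
read_rate = disk_rate_kbps(200, 512)
# Hypothetical sample: 50 blocks written per second with 4096-byte blocks.
write_rate = disk_rate_kbps(50, 4096)
print(f"read {read_rate:.1f} KB/s, write {write_rate:.1f} KB/s")
```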

Memory Monitor

Name Description
Name Inventory path to Guest machine (e.g., Datacenter1/vm/myvm).
Usage

Amount of machine memory or “physical” memory, as follows:

Virtual machine - Guest “physical” memory that is mapped to machine memory. Includes shared memory amount. Does not include overhead.

active

Amount of memory that is actively used, as estimated by VMkernel based on recently touched memory pages.

Virtual machine - Amount of guest “physical” memory actively used.
shared Amount of guest “physical” memory shared with other virtual machines (through the VMkernel’s transparent page-sharing mechanism, a RAM de-duplication technique). Includes amount of zero memory area.
consumed Virtual machine: Amount of guest physical memory consumed by the virtual machine for guest memory. Consumed memory does not include overhead memory. It includes shared memory and memory that might be reserved, but not actually used. Use this metric for charge-back purposes.
Shared common

Amount of machine memory that is shared by all powered-on virtual machines and vSphere services on the host. Subtract this metric from the shared metric to gauge how much machine memory is saved due to sharing:

shared - sharedcommon = machine memory (host memory) savings (KB)

Swapped used

Current amount of guest physical memory swapped out to the virtual machine’s swap file by the VMkernel. Swapped memory stays on disk until the virtual machine needs it. This statistic refers to VMkernel swapping and not to guest OS swapping.

swapped = swapin + swapout

heap

VMkernel virtual address space dedicated to VMkernel main heap and related data.

Note: For informational purposes only, not useful for performance monitoring.

Heap free

Free address space in the VMkernel’s main heap. Varies based on number of physical devices and configuration options. There is no direct way for the user to increase or decrease this statistic.

Note: For informational purposes only, not useful for performance monitoring.

state

Free-memory state of the host, derived from the amount of free machine memory. VMkernel has four free-memory thresholds that affect memory reclamation:

  • 0 (high) Free memory >= 6% of machine memory minus Service Console memory.
  • 1 (soft) 4%
  • 2 (hard) 2%
  • 3 (low) 1%
  • 0 (high) and 1 (soft): Ballooning is favored over swapping.
  • 2 (hard) and 3 (low): Swapping is favored over ballooning.
overhead Amount of machine memory used by the VMkernel to run the virtual machine.
Swap target

Amount of memory available for swapping. Target size for virtual machine swap file, as calculated by the VMkernel. The VMkernel uses values for this metric with the swap metric to stop and start swapping, as follows:

  • If swaptarget > swapped, the VMkernel can start swapping when necessary.
  • If swaptarget < swapped, the VMkernel stops swapping memory.

Since swapped memory stays swapped until the virtual machine accesses it, swapped memory can be greater than the memory swap target, possibly for a prolonged period of time. This simply means that the swapped memory is not currently needed by the virtual machine and is not a cause for concern.

Swap in Total amount of data that has been read into machine memory from the swap file since the virtual machine was powered on.
Swap out Total amount of data that the VMkernel has written to the virtual machine’s swap file from machine memory. This statistic refers to VMkernel swapping and not to guest OS swapping.
Swap in Rate Rate at which memory is swapped from disk into active memory during the interval. This counter applies to virtual machines and is generally more useful than the swapin counter to determine if the virtual machine is running slow due to swapping, especially when looking at real-time statistics.
Swap out Rate Rate at which memory is being swapped from active memory to disk during the current interval. This counter applies to virtual machines and is generally more useful than the swapout counter to determine if the virtual machine is running slow due to swapping, especially when looking at real-time statistics.
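The shared memory savings formula and the swap target rule can be expressed as two small functions. This Python sketch is illustrative only: the function names and sample values (in KB) are hypothetical, and treating swaptarget equal to swapped as "do not start swapping" is an assumption, since the description above covers only the strict inequalities.

```python
def sharing_savings_kb(shared_kb: int, shared_common_kb: int) -> int:
    """Machine memory saved by page sharing: shared - sharedcommon (KB)."""
    return shared_kb - shared_common_kb


def vmkernel_may_swap(swap_target_kb: int, swapped_kb: int) -> bool:
    """True if swaptarget > swapped, so the VMkernel can start swapping.

    If swaptarget < swapped, the VMkernel stops swapping. Swapped memory
    above the target is not in itself a concern, because swapped pages stay
    on disk until the virtual machine accesses them again.
    """
    return swap_target_kb > swapped_kb


# Hypothetical sample: 3 GB shared, of which 1 GB is shared common memory.
savings = sharing_savings_kb(3_145_728, 1_048_576)
print(f"page sharing saves {savings} KB of machine memory")
```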

DataStore Info

Name Description
name Name of the datastore.
type Type of file system volume, such as VMFS or NFS.
uncommitted Total additional storage space, in bytes, potentially used by all virtual machines on this datastore. The server periodically updates this value. It can be explicitly refreshed with the RefreshDatastoreStorageInfo operation. This property is valid only if accessible is true.

Since vSphere API 4.0

url The unique locator for the datastore. This property is guaranteed to be valid only if accessible is true.
accessible The connectivity status of this datastore. If this is set to false, meaning the datastore is not accessible, this datastore’s capacity and freespace properties cannot be validated. Furthermore, if this property is set to false, some of the properties in this summary and in DatastoreInfo should not be used. Refer to the vSphere API documentation for the property you are interested in. For datastores accessed from multiple hosts, vCenter Server reports accessible as an aggregated value of the properties reported in MountInfo. For instance, if a datastore is accessible through a subset of hosts, then the value of accessible will be reported as true by vCenter Server, and the reason for a datastore being inaccessible from a host will be reported in inaccessibleReason.
capacity Maximum capacity of this datastore, in bytes. This value is updated periodically by the server. It can be explicitly refreshed with the Refresh operation. This property is guaranteed to be valid only if accessible is true.
freeSpace Available space of this datastore, in bytes. The server periodically updates this value. It can be explicitly refreshed with the Refresh operation. This property is guaranteed to be valid only if accessible is true.
maintenanceMode

The current maintenance mode state of the datastore. The set of possible values is described in DatastoreSummaryMaintenanceModeState.

Since vSphere API 5.0
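A common derived figure is the percentage of datastore capacity in use. It is not one of the columns above, and because capacity and freeSpace are only guaranteed valid when accessible is true, the following Python sketch returns no value for an inaccessible datastore. The function name is hypothetical.

```python
from typing import Optional


def datastore_used_percent(capacity: int, free_space: int,
                           accessible: bool) -> Optional[float]:
    """Percentage of datastore capacity in use, or None if not trustworthy.

    capacity and freeSpace (both in bytes) are only guaranteed valid when
    accessible is true, so no figure is computed otherwise.
    """
    if not accessible or capacity <= 0:
        return None
    return 100.0 * (capacity - free_space) / capacity


GIB = 2 ** 30
# Hypothetical sample: a 500 GiB datastore with 125 GiB free.
print(datastore_used_percent(500 * GIB, 125 * GIB, accessible=True))
```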

Network Monitor

Name Description
Name Name of the VM.
usage Average network utilisation (combined transmit and receive rates) during the interval.
Packets Received Number of packets received by each vNIC (virtual network interface controller) on the virtual machine.
Packets Transmitted Number of packets transmitted by each vNIC on the virtual machine.
Data received rate The rate at which data is received across the virtual machine’s vNIC (virtual network interface controller).
Data Transmitted rate The rate at which data is transmitted across the virtual machine’s vNIC (virtual network interface controller). This represents the bandwidth of the network.
Received packets dropped Number of receive packets dropped during the collection interval.
Transmitted Packets dropped Number of transmit packets dropped during the collection interval.

CPU Monitor

Name Description
Name Name of the Guest machine.
usage

CPU usage as a percentage (in units of 1/100th of a percent) during the interval.

VM - Amount of actively used virtual CPU, as a percentage of total available CPU. This is the host’s view of the CPU usage, not the guest operating system view. It is the average CPU utilisation over all available virtual CPUs in the virtual machine. For example, if a virtual machine with one virtual CPU is running on a host that has four physical CPUs and the CPU usage is 100%, the virtual machine is using one physical CPU completely.

virtual CPU usage = usagemhz / (# of virtual CPUs x core frequency)

Usage mhz

CPU usage, as measured in megahertz, during the interval.

VM - Amount of actively used virtual CPU. This is the host’s view of the CPU usage, not the guest operating system view.

wait Total CPU time spent in wait state.
ready Percentage of time (in units of 1/100th of a percent) that the virtual machine was ready, but could not get scheduled to run on the physical CPU. CPU ready time is dependent on the number of virtual machines on the host and their CPU loads.
used Total CPU usage.
idle Total time that the CPU spent in an idle state (meaning that a virtual machine is not runnable).
system Amount of time spent on system processes on each virtual CPU in the virtual machine. This is the host view of the CPU usage, not the guest operating system view.
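The virtual CPU usage formula and the 1/100th-of-a-percent unit can be checked with the following Python sketch. The function names and sample values are hypothetical; frequencies are in MHz.

```python
def vm_cpu_usage_percent(usage_mhz: float, num_vcpus: int,
                         core_mhz: float) -> float:
    """virtual CPU usage = usagemhz / (# of virtual CPUs x core frequency)."""
    return 100.0 * usage_mhz / (num_vcpus * core_mhz)


def raw_to_percent(raw: int) -> float:
    """Convert a value in units of 1/100th of a percent to a percentage."""
    return raw / 100


# One vCPU on a 2000 MHz core using 2000 MHz is fully busy.
print(vm_cpu_usage_percent(2000, 1, 2000))
# Two vCPUs on 2000 MHz cores using 1000 MHz in total.
print(vm_cpu_usage_percent(1000, 2, 2000))
```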

VMWare Monitor Views

Admin

Host Memory Monitor

Name Description
Name The machine name of the host.
Usage

Amount of machine memory used on the host. Consumed memory includes memory used by the Service Console, the VMkernel, vSphere services, plus the total consumed metrics for all running virtual machines.

host consumed memory = total host memory - free host memory

active Amount of memory that is actively used, as estimated by VMkernel based on recently touched memory pages. This is a sum of all active metrics for all powered-on virtual machines plus vSphere services (such as COS, vpxa) on the host.
shared Sum of all shared metrics for all powered-on virtual machines, plus amount for vSphere services on the host. The host’s shared memory may be larger than the amount of machine memory if memory is overcommitted (the aggregate virtual machine configured memory is much greater than machine memory). The value of this statistic reflects how effective transparent page sharing and memory overcommitment are for saving machine memory.
Shared common

Amount of machine memory that is shared by all powered-on virtual machines and vSphere services on the host. Subtract this metric from the shared metric to gauge how much machine memory is saved due to sharing.

shared - sharedcommon = machine memory (host memory) savings (KB)

Swapped used

Current amount of guest physical memory swapped out to the virtual machine’s swap file by the VMkernel. Swapped memory stays on disk until the virtual machine needs it. This statistic refers to VMkernel swapping and not to guest OS swapping.

swapped = swapin + swapout

heap

VMkernel virtual address space dedicated to VMkernel main heap and related data.

Note: For informational purposes only, not useful for performance monitoring.

state

Free-memory state of the host, derived from the amount of free machine memory. VMkernel has four free-memory thresholds that affect memory reclamation:

  • 0 (high) Free memory >= 6% of machine memory minus Service Console memory.
  • 1 (soft) 4%
  • 2 (hard) 2%
  • 3 (low) 1%
  • 0 (high) and 1 (soft): Ballooning is favored over swapping.
  • 2 (hard) and 3 (low): Swapping is favored over ballooning.
overhead Amount of machine memory used by the VMkernel to run the virtual machine.
Swap target Amount of memory available for swapping. Target size for virtual machine swap file, as calculated by the VMkernel. The VMkernel uses values for this metric with the swap metric to stop and start swapping, as follows:
  • If swaptarget > swapped, the VMkernel can start swapping when necessary.
  • If swaptarget < swapped, the VMkernel stops swapping memory.
Since swapped memory stays swapped until the virtual machine accesses it, swapped memory can be greater than the memory swap target, possibly for a prolonged period of time. This simply means that the swapped memory is not currently needed by the virtual machine and is not a cause for concern.
Swap in Total amount of data that has been read into machine memory from the swap file since the virtual machine was powered on.
Swap out Total amount of data that the VMkernel has written to the virtual machine’s swap file from machine memory. This statistic refers to VMkernel swapping and not to guest OS swapping.
Swap in Rate Rate at which memory is swapped from disk into active memory during the interval. This counter applies to virtual machines and is generally more useful than the swapin counter to determine if the virtual machine is running slow due to swapping, especially when looking at real-time statistics.
Swap out Rate Rate at which memory is being swapped from active memory to disk during the current interval. This counter applies to virtual machines and is generally more useful than the swapout counter to determine if the virtual machine is running slow due to swapping, especially when looking at real-time statistics.
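The host consumed memory formula and the four free-memory thresholds of the state column are sketched below in Python. This is a simplified, hypothetical model: the function names are invented, it ignores the Service Console adjustment, and it treats any free-memory percentage below the hard (2%) threshold as low (3), which is an assumption.

```python
def host_consumed_kb(total_kb: int, free_kb: int) -> int:
    """host consumed memory = total host memory - free host memory."""
    return total_kb - free_kb


def memory_state(free_percent: float) -> int:
    """Map free memory (% of machine memory) to a VMkernel state level.

    Simplified: the 6%, 4% and 2% thresholds separate high (0), soft (1),
    hard (2) and low (3).
    """
    if free_percent >= 6:
        return 0  # high
    if free_percent >= 4:
        return 1  # soft: ballooning favored over swapping
    if free_percent >= 2:
        return 2  # hard: swapping favored over ballooning
    return 3  # low


# Hypothetical host: 64 GB of memory with 16 GB free.
consumed = host_consumed_kb(64 * 1024 * 1024, 16 * 1024 * 1024)
print(consumed, memory_state(25.0))
```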

Host Disk Monitor

Name Description
Name Name of the Host.
usage Aggregated disk I/O rate. For hosts, this metric includes the rates for all virtual machines running on the host during the collection interval.
Read rate

Average number of kilobytes read from the disk each second during the collection interval.

Host - Rate at which data is read from each LUN on the host.

read rate = # blocksRead per second x blockSize

Write rate

Average number of kilobytes written to the disk each second during the collection interval.

Host - Rate at which data is written to each LUN on the host.

write rate = # blocksWritten per second x blockSize

Commands issued Number of SCSI commands issued during the collection interval.
Commands Aborted Number of SCSI commands aborted during the collection interval.
Bus Resets Number of SCSI-bus reset commands issued during the collection interval.
physical device Read Latency Average amount of time, in milliseconds, to complete read from the physical device.
Kernel Read Latency Average amount of time, in milliseconds, spent by VMKernel processing each SCSI read command.
physical device Write Latency Average amount of time, in milliseconds, to complete a write to the physical device.
Kernel Write Latency Average amount of time, in milliseconds, spent by VMKernel processing each SCSI write command.
Write Latency Average amount of time taken during the collection interval to process a SCSI write command issued by the Guest OS to the virtual machine. The sum of kernelWriteLatency and deviceWriteLatency.
Queue Write Latency Average amount of time taken during the collection interval per SCSI write command in the VMKernel queue.
Highest Latency Highest latency value across all disks used by the host. Latency measures the time taken to process a SCSI command issued by the guest OS to the virtual machine. The kernel latency is the time VMkernel takes to process an IO request. The device latency is the time it takes the hardware to handle the request.
Average Read request per second Host - Number of times data was read from each LUN on the host during the collection interval.
Average write request per second Host - Number of times data was written to each LUN on the host during the collection interval.
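The following Python sketch shows how the Write Latency and Highest Latency values relate to their components. The function names and disk identifiers are hypothetical, and returning 0.0 when no disks are reported is an assumption.

```python
def write_latency_ms(kernel_ms: float, device_ms: float) -> float:
    """Write Latency = kernelWriteLatency + deviceWriteLatency."""
    return kernel_ms + device_ms


def highest_latency_ms(per_disk_ms: dict[str, float]) -> float:
    """Highest latency value across all disks used by the host."""
    return max(per_disk_ms.values(), default=0.0)


# Hypothetical per-disk latencies, in milliseconds.
latencies = {"disk-a": 3.0, "disk-b": 12.0}
print(write_latency_ms(0.5, 4.5), highest_latency_ms(latencies))
```

A high device component with a low kernel component points at the storage hardware rather than at the VMkernel.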

Host CPU Monitor

Name Description
Name Name of the Host.
usage

CPU usage as a percentage (in units of 1/100th of a percent) during the interval. Actively used CPU of the host, as a percentage of the total available CPU. Active CPU is approximately equal to the ratio of the used CPU to the available CPU.

available CPU = # of physical CPUs x clock rate

100% represents all CPUs on the host. For example, if a four-CPU host is running a virtual machine with two CPUs, and the usage is 50%, the host is using two CPUs completely.
Usage mhz CPU usage, as measured in megahertz, during the interval. Sum of the actively used CPU of all powered on virtual machines on a host. The maximum possible value is the frequency of the processors multiplied by the number of processors. For example, if you have a host with four 2GHz CPUs running a virtual machine that is using 4000MHz, the host is using two CPUs completely.

4000 / (4 x 2000) = 0.50

wait Total CPU time spent in wait state.
ready Percentage (in units of 1/100th of a percent) of time that the virtual machine was ready, but could not get scheduled to run on the physical CPU. CPU ready time is dependent on the number of virtual machines on the host and their CPU loads.
used Total CPU usage.
Latency

Latency is affected by three things:

  • CPU ready - Percentage of time that the virtual machine was ready, but could not get scheduled to run on the physical CPU. CPU ready time is dependent on the number of virtual machines on the host and their CPU loads.
  • CPU swap wait - Time the virtual machine is waiting for swap page-ins. CPU Swap Wait is included in CPU Wait.
  • Power settings - This is configured in the BIOS of the server. On an HP server, change the power setting to “OS Control” to get the most effective balance.
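The Usage mhz example above can be reproduced with the following Python sketch, which applies available CPU = # of physical CPUs x clock rate. The function name is hypothetical; frequencies are in MHz.

```python
def host_cpu_fraction(usage_mhz: float, num_cpus: int,
                      clock_mhz: float) -> float:
    """Fraction of available host CPU in use.

    available CPU = # of physical CPUs x clock rate
    """
    return usage_mhz / (num_cpus * clock_mhz)


# The example from the table: four 2 GHz CPUs with 4000 MHz in use.
fraction = host_cpu_fraction(4000, 4, 2000)
cpus_fully_used = fraction * 4
print(f"{fraction:.2f} of host CPU, equivalent to {cpus_fully_used:.0f} CPUs")
```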

Host Info

Name Description
Name Name of the Host machine.
apiType Indicates whether or not the service instance represents a standalone host. If the service instance represents a standalone host, then the physical inventory for that service instance is fixed to that single host. VirtualCenter server provides additional features over single hosts. For example, VirtualCenter offers multi-host management. Examples of values are:
  • “VirtualCenter” - For a VirtualCenter instance.
  • “HostAgent” - For host agent on an ESX Server or VMWare Server host.
apiVersion The version of the API as a dot-separated string. For example, “1.0.0”
build Build string for the server on which this call is made. For example, x.y.z-num. This string does not apply to the API.
fullName The complete product name, including the version information.
instanceUuid A globally unique identifier associated with this service instance.

Since vSphere API 4.0

licenseProductName The licence product name.

Since vSphere API 4.0

licenseProductVersion The licence product version.

Since vSphere API 4.0

localeBuild Build number for the current session’s locale. Typically, this is a small number reflecting a localisation change from the normal product build.
localeVersion Version of the message catalog for the current session’s locale.
osType

Operating system type and architecture. Examples of values are:

  • win32-x86 - For x86-based Windows systems.
  • linux-x86 - For x86-based Linux systems.
  • vmnix-x86 - For the x86 ESX Server microkernel.
productLineId The product ID is a unique identifier for a product line. Examples of values are:
  • gsx - For the VMWare Server product.
  • esx - For the ESX product.
  • embeddedEsx - For the ESXi product.
  • vpx - For the VirtualCenter product.
vendor Name of the vendor of this product.
version Dot-separated version string. For example, “1.2”.
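The apiType and productLineId values listed above can be combined to describe the endpoint the plug-in is connected to. The following Python sketch covers only the example values given in this table; the mapping, function name and output wording are hypothetical.

```python
# Example productLineId values from the Host Info table.
PRODUCT_LINES = {
    "gsx": "VMWare Server",
    "esx": "ESX",
    "embeddedEsx": "ESXi",
    "vpx": "VirtualCenter",
}


def describe_endpoint(api_type: str, product_line_id: str) -> str:
    """Summarise the connected endpoint from Host Info apiType/productLineId."""
    product = PRODUCT_LINES.get(product_line_id, f"unknown ({product_line_id})")
    if api_type == "HostAgent":
        scope = "standalone host"
    elif api_type == "VirtualCenter":
        scope = "VirtualCenter instance"
    else:
        scope = f"unknown apiType {api_type}"
    return f"{product} ({scope})"


print(describe_endpoint("HostAgent", "embeddedEsx"))
```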