Operating Environment

Overview

The Gateway Operating environment top-level section contains settings that affect the Gateway as a whole and do not belong to any other section. Almost all of the settings in this section are optional.

Operation

The only mandatory setting in this section is operatingEnvironment > gatewayName. This name is displayed to all users that connect to the Gateway, and is also used in name lookup and database logging. It is strongly recommended that this name be unique for each Gateway on a particular site.

The Gateway listen ports can also be set in the operating environment (operatingEnvironment > listenPorts). The listen ports are used by components connecting to gateway (such as ActiveConsole 2 or WebSlinger) to request monitoring data for display to users.

Note: This does not include Netprobe connections, as the configuration for these is contained within Probes. If not configured, the gateway listen port defaults to port 7039 for the insecure channel and 7038 for the secure channel.

Data quality options

Settings in Operating environment control the data quality algorithm that Gateways use to maintain a consistent level of service under excessive load. This algorithm runs throughout the lifetime of a gateway (unless operatingEnvironment > dataQueues > disableChecks is set) and operates as follows:

Probe suspension may additionally be controlled by the suspend probe and unsuspend probe commands. See the commands appendix for details.

For more information regarding Data quality, see Data Quality User Guide.

Memory protection

Settings for memory protection are found in the Data Quality tab.

When data quality is disabled, or in extreme situations when it cannot suspend sufficient probes to prevent the gateway becoming overloaded, the gateway throttles the reading of TCP data to prevent unbounded growth of the backlogged data-queues. This is necessary, but less desirable than a managed data-quality suspension: if it continues without recovery, netprobe connections either flow-control or time out, and netprobes are dropped at random.

There are two threshold levels:

When the low-priority threshold is breached, the gateway throttles reads from all importing (netprobe) connections but remains responsive to downstream components, such as Active Console. In the unlikely event that this fails to prevent memory growth and the high-priority threshold is breached, the gateway throttles reads from all connections and becomes unresponsive until it recovers.
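The two-level behaviour described above can be sketched as a simple decision function. This is an illustration only, not the Gateway's implementation: the function name and return values are hypothetical, and it assumes that a threshold counts as breached once the backlog of unprocessed EMF messages is strictly greater than it.

```python
def throttled_connections(backlog_mb, low_priority_threshold_mb, high_priority_threshold_mb):
    """Return which connections have their reads throttled for a given backlog.

    Hypothetical helper for illustration; thresholds are passed in explicitly.
    """
    if low_priority_threshold_mb > high_priority_threshold_mb:
        raise ValueError("low-priority threshold must not exceed high-priority threshold")
    # Assumption: "breached" means the backlog is strictly greater than the threshold.
    if backlog_mb > high_priority_threshold_mb:
        return "all"        # reads throttled on every connection; gateway unresponsive
    if backlog_mb > low_priority_threshold_mb:
        return "netprobe"   # only importing (netprobe) connections are throttled
    return "none"           # normal operation


# Example with arbitrary thresholds of 100 MB and 200 MB
for backlog in (50, 150, 250):
    print(backlog, throttled_connections(backlog, 100, 200))
```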

The default for the low-priority threshold is 250 MB. This is calculated to be enough to buffer 77 seconds (see operatingEnvironment > dataQueues > maxDataAgeMs) worth of data on a high bandwidth gateway (approximately 30,000 cell updates per second). This is in order to give the data-quality algorithm time to step in and save the situation before the threshold is reached.

Note: These thresholds govern memory usage by unprocessed EMF messages only. The gateway memory footprint as a whole is typically far more influenced by other factors and could potentially exceed these thresholds without issue. It is unusual for unprocessed EMF messages to account for more than a few megabytes in a normally operating gateway.

Conflation

Settings for conflation are found in the Data Quality tab.

Conflation is an optional and less drastic method of coping with an overloaded gateway than a data quality suspension. When the data queues (containing incoming sampler updates from netprobes) become backlogged due to the gateway being unable to process them as fast as they arrive, conflation allows the gateway to discard out-of-date cell updates and only process and publish the latest cell values. As this could potentially result in the gateway discarding important updates, or missing short-lived events, conflation is disabled by default and should only be used with care.

Conflation examples

Rapid cell updates

When a Netprobe has published several updates to the same cell before the gateway has processed the first update:

  • Update cell from 1 to 2
  • Update cell from 2 to 3
  • Update cell from 3 to 4
  • Update cell from 4 to 5

With conflation active, gateway only publishes the latest value:

  • Update cell from 1 to 5

Updates to a recently created row

When a netprobe updates values in a recently created row before the gateway has processed the create:

  • Create row newRow with three cells: 100,200,300.
  • Update first cell in newRow from 100 to 111.
  • Update second cell in newRow from 200 to 222.

With conflation active, gateway adds the row to the dataview with the latest values:

  • Create row newRow with three cells: 111,222,300.

Updates to a row that is then removed

When a netprobe updates values in a row and then removes the row before the gateway has processed the updates:

  • Update first cell in row1 from 100 to 111.
  • Update second cell in row1 from 200 to 222.
  • Remove row row1.

With conflation active, gateway discards the updates and only processes the row-removal:

  • Remove row row1.

Short lived rows

When a netprobe creates a row and then removes it again before the gateway has processed the create:

  • Create row newRow with four cells: 100,200,300,400.
  • Update first cell in newRow from 100 to 111.
  • Remove row newRow.

With conflation active, gateway conflates away the update as normal but does not conflate away the entire row:

  • Create row newRow with four cells: 111,200,300,400.
  • Remove row newRow.
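
The four examples above can be modelled with a small queue-conflation sketch. It is an illustration of the described outcomes only, not the Gateway's actual algorithm: the event tuples ("create", "update", "remove") and the conflate function are hypothetical names chosen for this example.

```python
def conflate(events):
    """Conflate a backlog of dataview events, keeping only the latest cell values.

    Each event is one of (hypothetical format):
      ("create", row, [cell values])
      ("update", row, cell_index, new_value)
      ("remove", row)
    """
    out = []              # conflated events in order; None marks a discarded event
    pending_create = {}   # row -> index in `out` of its unprocessed create
    pending_update = {}   # (row, cell_index) -> index in `out` of its unprocessed update
    for event in events:
        kind, row = event[0], event[1]
        if kind == "create":
            pending_create[row] = len(out)
            out.append(["create", row, list(event[2])])
        elif kind == "update":
            cell, value = event[2], event[3]
            if row in pending_create:
                # Fold the update into the create that has not been processed yet.
                out[pending_create[row]][2][cell] = value
            elif (row, cell) in pending_update:
                # Overwrite the earlier, now out-of-date, update to the same cell.
                out[pending_update[(row, cell)]][3] = value
            else:
                pending_update[(row, cell)] = len(out)
                out.append(["update", row, cell, value])
        elif kind == "remove":
            # Updates to a row that is about to be removed are discarded.
            for key in [k for k in pending_update if k[0] == row]:
                out[pending_update.pop(key)] = None
            # The create itself is kept, so short-lived rows are still published.
            pending_create.pop(row, None)
            out.append(["remove", row])
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return [tuple(e) for e in out if e is not None]


# "Short lived rows" example
backlog = [
    ("create", "newRow", [100, 200, 300, 400]),
    ("update", "newRow", 0, 111),
    ("remove", "newRow"),
]
print(conflate(backlog))
```

The sketch keeps the create and remove of a short-lived row as separate events, matching the last example, while folding intermediate cell updates away.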

Potential Issues

Conflation can prevent a gateway from becoming overloaded, and ensure that published values are always up-to-date, but there are a number of potential issues which you should be aware of.

Lost Spikes

A dataview cell that updates from 32% to 34% to 33% is unlikely to cause issues by having the intermediate update conflated away, but one that updates from 32% to 99% to 33% may miss an important spike.

Similarly, a cell that goes from OK to ERROR to OK again, could cause an alert to be missed if conflation is enabled.

This might also affect compute-engine rules that use statistical functions such as maximum or minimum.

Rate Function

The rate function triggers off the time an update is processed, rather than the time it is generated, and therefore its general performance is likely to be improved by conflation.

Note: Spikes in the rate-of-change in a cell may be conflated away.

Database Logging

Cell updates that are discarded by conflation will not be logged.

sampleTime and logNetprobeSampleTimeForDataItems

If logNetprobeSampleTimeForDataItems is configured, cell updates may be logged with sample-times later than the times at which they were actually produced. This is because the sample-time is published by the netprobe along with the sample-data and is conflated to the latest value.

E.g. A series of updates produced at twenty second intervals by the netprobe:

  • Update cell1 @ 09:25:02
  • Update cell2 @ 09:25:22
  • Update cell3 @ 09:25:42

Might be conflated into a single update with the latest sample-time:

  • Update cell1, cell2, and cell3 @ 09:25:42

The updates to cell1 and cell2 may be logged with this later sample-time. Similarly, rules that reference the sample-time may only see the later value.

Configuration

Basic tab

These settings are found under the Basic tab.

operatingEnvironment

Gateway-wide options are configured here.

operatingEnvironment > gatewayName

A short name identifying the Gateway.

When using database logging functionality, this name is also logged to the database and used to identify records for this Gateway.

Mandatory: Yes

operatingEnvironment > licensingGroup

Group that the Gateway requests licences from on the Licence Daemon.

Mandatory: No

operatingEnvironment > listenPorts

The gateway listen ports for incoming connections.

See Secure Communications for more details.

The listen port can also be specified as a command-line argument to gateway. If this is done, then the command-line value is used for the lifetime of the gateway process - it cannot be overridden or altered by editing the gateway setup file. An example of using this command-line option is shown below:

gateway2 -port <12345>

Mandatory: No
Default: The gateway listens insecurely on port 7039

operatingEnvironment > listenPorts > secure

This specifies that the gateway should listen securely. In order to listen securely, an SSL certificate needs to be provided using either the -ssl-certificate or -ssl-certificate-key command line option. By default, if configured to be secure, the gateway listens on port 7038. This can be overridden by using the child setting operatingEnvironment > listenPorts > secure > listenPort.

Mandatory: No
Default: The gateway will not listen securely on any port

operatingEnvironment > listenPorts > secure > listenPort

This value overrides the default secure listenPort. Specify an integer in the range 1-65535.

Mandatory: No
Default: 7038

operatingEnvironment > listenPorts > insecure

This specifies that the gateway should listen insecurely. By default, if configured to allow insecure connections, the gateway listens on port 7039. This can be overridden by using the child setting operatingEnvironment > listenPorts > insecure > listenPort.

Mandatory: No

operatingEnvironment > listenPorts > insecure > listenPort

This value overrides the default insecure listenPort. Specify an integer in the range 1-65535.

Mandatory: No
Default: 7039
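
The way the listen port defaults combine can be sketched as follows. This is a hedged illustration, not the Gateway's code: the dictionary layout and function name are hypothetical, the command-line -port option is not modelled, and the sketch assumes that once listenPorts is configured, the gateway listens only on the channels that are explicitly present.

```python
DEFAULT_PORTS = {"secure": 7038, "insecure": 7039}


def effective_listen_ports(listen_ports=None):
    """Resolve the ports to listen on from a hypothetical listenPorts mapping.

    listen_ports: None (not configured) or a dict that may contain the keys
    "secure" and/or "insecure", each mapping to a dict with an optional
    "listenPort" entry.
    """
    if not listen_ports:
        # Nothing configured: the gateway listens insecurely on port 7039.
        return {"insecure": DEFAULT_PORTS["insecure"]}
    resolved = {}
    for channel, default_port in DEFAULT_PORTS.items():
        if channel not in listen_ports:
            continue  # assumption: an absent channel is not listened on
        port = (listen_ports[channel] or {}).get("listenPort", default_port)
        if not 1 <= port <= 65535:
            raise ValueError(f"{channel} listenPort out of range 1-65535: {port}")
        resolved[channel] = port
    return resolved


print(effective_listen_ports())
print(effective_listen_ports({"secure": {"listenPort": 8443}, "insecure": {}}))
```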

operatingEnvironment > var

List of user environment variable definitions. See User Variables and Environments for details on how to configure environment variables.

Advanced tab

These settings are found under the Advanced tab.

operatingEnvironment > description

An optional description of the gateway.

Mandatory: No

operatingEnvironment > maxLogFileSizeMb

Maximum size in Megabytes of the Gateway log file before it rolls that log file over.

Valid values are 1-2047 inclusive for 32-bit Gateways.

Mandatory: No
Default: 10

operatingEnvironment > logArchiveScript

The name of a batch file or shell script that should be executed when the log file is rolled over.

Note: Using operatingEnvironment > logArchiveScript overrides LOG_ARCHIVE_SCRIPT (if set).

Mandatory: No

operatingEnvironment > timezone

The time zone the Gateway runs in. This allows a Gateway in one country to monitor Netprobes in another country whilst keeping the time zones the same.

The time zone is specified in the format:

std[+/-offset]

Where std represents one of the standard time zones. Available valid time zones can be found by examining the system time zone database, found in:

  • /usr/share/lib/zoneinfo on Solaris.
  • /usr/share/zoneinfo on Linux.

If you specify an offset explicitly, it overrides the definition in the system time database, including the rules to automatically adjust for daylight savings time. It is interpreted as the number of hours necessary to add or subtract to get Coordinated Universal Time (UTC).

For example, if you want your Gateway to run in US Eastern Standard Time, you can:

  • (Recommended) Specify the timezone as America/New_York.
  • Specify the timezone as EST.
  • (Not recommended) Specify the timezone as EST+5EDT,M3.2.0/2,M11.1.0/2 (see POSIX documentation for the TZ variable).
  • (Not recommended) Specify the timezone as EST+5 when DST is not in effect and change it to EST+4 when DST is in effect.
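
The sign convention of an explicit offset is easy to get wrong, so the following sketch shows how a simple std[+/-offset] value maps to a UTC offset. It is an illustration only: parse_simple_tz is a hypothetical helper, it handles whole-hour offsets without DST rules, and it does not reflect how the Gateway itself parses the setting.

```python
import re
from datetime import timedelta

# A zone name, optionally followed by a signed whole-hour offset.
_TZ_PATTERN = re.compile(r"^([A-Za-z][A-Za-z_/]*)([+-]\d{1,2})?$")


def parse_simple_tz(spec):
    """Split a simple std[+/-offset] string into (name, UTC offset or None)."""
    match = _TZ_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"unsupported timezone specification: {spec!r}")
    name, offset = match.groups()
    if offset is None:
        # No explicit offset: the system time zone database defines the zone.
        return name, None
    # The offset is the number of hours to add to local time to reach UTC,
    # so local time is UTC minus the offset.
    return name, timedelta(hours=-int(offset))


print(parse_simple_tz("EST+5"))             # local time is five hours behind UTC
print(parse_simple_tz("America/New_York"))  # resolved from the zone database
```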

The Gateway attempts to validate your choice of timezone against the zoneinfo directories, and issues a warning if the timezone cannot be verified.

Mandatory: No
Default: Gateway uses the value of the TZ environment variable. If TZ is not set, Gateway uses the local time of its host machine.

operatingEnvironment > timezoneabbreviation

A list of time zone abbreviations and their default timezone regions. This is used to override the timezone abbreviation interpretations in Rules timezone parsing/printing.

operatingEnvironment > internalQueueSizeLimit

Controls the maximum length of the internal update queue. Updates to data-items (e.g. a severity change as the result of running a rule) are placed in the queue temporarily between data updates.

The default maximum limit should be adequate for normal gateway operation. If a pair (or more) of rules are configured such that an update caused by rule A makes rule B fire and update, then this can cause the internal queue to fill faster than it is processed. If the queue is completely filled an error message is logged, and gateway performance is likely to be affected. The solution is to write rules A and B to be more selective, so that they do not fire each other.

Certain compute engine rules (typically involving wildcarded paths) can also fill the processing queue during gateway startup. The queue limit can be increased to prevent warning messages if required, but this should only be done if it is known that this situation is a "one off".

Mandatory: No
Default: 4000

operatingEnvironment > numRuleEvaluationThreads

Specifies the maximum number of rule evaluation threads the gateway can run. These threads are used to execute rules on data changes, and can be enabled if rule execution is becoming a bottleneck on a busy gateway.

It is recommended that this is not set too high as doubling the number of threads does not double throughput. It should not be set higher than the number of CPU cores available.

This setting specifies a maximum number of threads only. The actual number of threads used is the minimum of this value and the number of available processors (as specified by taskset on Linux or psrset on Solaris). The number of rule threads used is recorded in the gateway log. If the available number of processors is changed while the gateway is running then the number of threads to use is re-evaluated at the next setup change.

For more detailed information about the optimum value to use, see Gateway Performance Tuning.

A hard limit can also be placed on the number of rule threads by setting the environment variable MAX_RULE_THREADS to a positive integer.
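
As a rough sketch of how these limits interact, the function below computes the number of rule threads from the configured maximum and the number of available processors. The function and parameter names are hypothetical, and the treatment of MAX_RULE_THREADS as a further cap applied with a minimum is an assumption, not something this section states precisely.

```python
def effective_rule_threads(configured_max, available_processors, hard_limit=None):
    """Estimate the number of rule evaluation threads actually used.

    configured_max:       value of numRuleEvaluationThreads (0 means no threads)
    available_processors: processors available to the gateway process
    hard_limit:           value of MAX_RULE_THREADS, if set (assumed to act as a cap)
    """
    if configured_max < 0 or available_processors < 1:
        raise ValueError("invalid thread or processor count")
    # The setting is a maximum only: never more threads than processors.
    threads = min(configured_max, available_processors)
    if hard_limit is not None:
        if hard_limit < 1:
            raise ValueError("hard limit must be a positive integer")
        threads = min(threads, hard_limit)
    return threads


print(effective_rule_threads(8, 4))                 # limited by processors
print(effective_rule_threads(8, 4, hard_limit=2))   # limited by the hard limit
```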

Mandatory: No
Default: 0 (no threads used)

operatingEnvironment > historyFiles

The maximum number of history files that the gateway is allowed to create when receiving set-up changes.

Valid values are 0-9999 inclusive.

To suppress history file creation altogether, set this to zero.

Mandatory: No
Default: 10

operatingEnvironment > dataDirectory

Allows you to specify where temporary files which the gateway may produce while running should be stored.

If not set, files are stored in the current working directory. If the directory specified already contains any of these temporary files, they are over-written.

The Gateway must have read, write, and execute permissions on the data directory, as it needs to be able to read, write, and search within it.

Mandatory: No
Default: Current working directory

operatingEnvironment > duplicateRowAlerts

When duplicate rows in a single dataview are detected, gateway alerts the user of this fact as it indicates a configuration error. These alerts can be adjusted using this setting.

Value               Effect
NONE                No alerts regarding duplicate rows are produced.
STATUS              The dataview samplingStatus headline is updated with an error message warning about duplicate rows.
TICKER_AND_STATUS   The samplingStatus headline is updated as above, and additionally an event ticker event is created regarding the duplicate rows.

Mandatory: No
Default: TICKER_AND_STATUS

operatingEnvironment > insecurePasswordLevel

In a number of places throughout the Gateway configuration, passwords have to be specified.

Examples of this might be plugins that require logins to systems to retrieve the data they need, the gateway's connection to the database, or the configuration of users. In most of these places it is possible to enter the password in a number of different formats (depending on context), from a cleartext format, to more secure formats such as AES (for two way), and crypt (for one way).

While it may be useful to use a cleartext format in a UAT or testing environment, you may prefer to ensure that a secure format is used when in a production environment. This setting helps locate these, by causing each insecure password to generate an issue at the specified level. This is shown when validating or saving the setup.

Note: We have deemed standard encoded passwords (std) to be insecure since they are encoded rather than encrypted, and these will be flagged in the same way as cleartext passwords.

The setting has the following effects:

  • None — no checks are performed on the security of the passwords and no issues are reported.
  • Critical — the setup cannot be saved and the Gateway cannot be started with any insecure passwords present.

With Warning or Error set, the ability to save the setup with insecure passwords present depends on whether the -max-severity command line parameter is set. See the Gateway Installation Guide.

The Gateway Data Plug-In reports the level of this setting.

Mandatory: No
Default: None

operatingEnvironment > allowComputeEngine

Specifies whether the Gateway compute engine feature is available to add additional data to existing dataviews.

It is allowed by default, but administrators and users can use this setting to disallow compute engine features. See Compute Engine.

Mandatory: No
Default: true

operatingEnvironment > writeStatsToFile

The "write stats to file" section contains settings controlling how load monitoring statistics are written out from the gateway.

These statistics can then be read by the Load Monitoring plugin. See also Gateway Performance Tuning for more information.

Mandatory: No
Default: No statistics written.

operatingEnvironment > writeStatsToFile > filename

The file to write statistics to.

Mandatory: Yes

operatingEnvironment > writeStatsToFile > enablePeriodicWrite

Specifies whether to write data to file periodically.

If false, statistics are only written when the "write statistics" command is executed.

Mandatory: No
Default: true

operatingEnvironment > writeStatsToFile > writeInterval

Interval in seconds at which statistics are written to file if periodic writes are enabled.

Mandatory: Yes
Default: 20

Connections tab

These settings are found under the Connections tab.

operatingEnvironment > heartbeatInterval

Number of seconds before a Gateway sends a heartbeat message to a connected component if it does not receive any communication from the component.

Gateway expects a reply within the number of seconds specified by the connectWait setting. If the reply is not received within this time, the connection is terminated and re-established.

The valid range for the heartbeat interval is 20-300 seconds inclusive.

Mandatory: No
Default: 75 (seconds)

operatingEnvironment > connectWait

Time in seconds to wait for a connection to Netprobe to be established. That is, the maximum duration the gateway waits after sending the initial TCP SYN segment for a SYN/ACK reply from the Netprobe.

The valid range is 1-300 seconds inclusive.

Mandatory: No
Default: 30 (seconds)

operatingEnvironment > dnsCacheExpiryTime

Time in minutes that the Gateway caches the result of resolving a hostname to an IP address.

Valid values are 0-2880 inclusive.

If set to 0, hostnames are cached indefinitely.

Mandatory: No
Default: 720 (12 hours)

operatingEnvironment > clientConnectionRequirements > minimumComponentVersion > minimumForAllComponents

This instructs the Gateway to reject connections from every component with versions older than the specified version.

You can specify the minimum version using the:

  • Version number. For example, GA4.7.0, or GA2011.2.1.
  • Version number with the build date. For example, GA4.7.0-180529.

Mandatory: No

operatingEnvironment > clientConnectionRequirements > minimumComponentVersion > components > component > name

Name of a Geneos component type. The drop-down list has the following options:

  • Active Console
  • Gateway
  • Licence Daemon
  • Netprobe
  • Web Dashboard
  • Webslinger

Mandatory: No

operatingEnvironment > clientConnectionRequirements > minimumComponentVersion > components > component > version

The minimum version of the component selected in operatingEnvironment > clientConnectionRequirements > minimumComponentVersion > components > component > name that the Gateway accepts connections from.

You can specify the minimum version using the:

  • Version number. For example, GA4.7.0, or GA2011.2.1.
  • Version number with the build date. For example, GA4.7.0-180529.

Mandatory: No

operatingEnvironment > clientConnectionRequirements > requireCertificates

This allows the gateway to require certificates when connections are made to the gateway for certain connection types. This can be enabled/disabled for each supported connection type. The following connection types are supported:

  • Netprobe: Incoming connections from netprobes (this will include Floating Netprobes and Self-announcing Netprobes).
  • Importing Gateways: Incoming Gateway connections from Importing Gateways (to which this gateway exports data).
  • Secondary Gateways: Incoming Gateway connections from the secondary Gateway in a Hot Standby configuration.

Mandatory: No
Default: Certificates are not required for any incoming connections.

operatingEnvironment > httpConnectionRequirements

This group of settings allows HTTP requests made to the Gateway (e.g. from a web browser) to be restricted.

operatingEnvironment > httpConnectionRequirements > internalData

The internal data web pages provide low level information about various parts of the system.

They may be requested by ITRS support when debugging issues. They do not form part of the normal operation of the Gateway so can safely be restricted to Geneos administrators.

The internal data pages available, and the information available on each page, can vary by version. However, the connection requirements cover all internal data pages, so will secure any pages added in future versions.

operatingEnvironment > httpConnectionRequirements > internalData > acceptHosts

This allows the internal data web pages, used for debugging issues, to be viewed only from particular locations. The available settings are:

  • All — allow access from any host.
  • Local — allow access only from the local loopback interface where the Gateway is running (127.0.0.1).
  • None — prevent access completely.
  • Specific — a list of locations may be entered. Each item in the list can be specified as a hostname (if a reverse DNS entry is available for the remote host) or as an IP address. The source of any HTTP requests must match at least one item in the list otherwise they are rejected. If no items are specified, access is prevented completely.

The remote hostname and IP address are written to the Gateway log file, along with the URL requested, for any attempts that are blocked. This can be useful to see if the Gateway host is able to access a reverse DNS entry for the remote host and therefore what would need to be added to the 'specific' list for the request to be accepted. If a hostname is not available then the IP address is seen instead of the name in the log file, so will appear twice.
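
The four acceptHosts options can be summarised with a small matching sketch. It is illustrative only: the function name and arguments are hypothetical, matching is done by exact string comparison, and the real Gateway may normalise hostnames or addresses differently.

```python
def internal_data_allowed(mode, remote_ip, remote_hostname=None, specific=()):
    """Decide whether a request for an internal data page is accepted.

    mode:            "all", "local", "none" or "specific" (case-insensitive)
    remote_ip:       source IP address of the HTTP request
    remote_hostname: reverse DNS name of the source, if one is available
    specific:        hostnames or IP addresses used by the "specific" mode
    """
    mode = mode.lower()
    if mode == "all":
        return True
    if mode == "none":
        return False
    if mode == "local":
        return remote_ip == "127.0.0.1"   # local loopback interface only
    if mode == "specific":
        candidates = {remote_ip}
        if remote_hostname:
            candidates.add(remote_hostname)
        # An empty list matches nothing, so access is prevented completely.
        return any(item in candidates for item in specific)
    raise ValueError(f"unknown acceptHosts mode: {mode!r}")


print(internal_data_allowed("local", "127.0.0.1"))
print(internal_data_allowed("specific", "10.1.2.3", "admin-host", ["admin-host"]))
print(internal_data_allowed("specific", "10.1.2.3", specific=[]))
```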

Mandatory: No
Default: local

operatingEnvironment > DNS > maxAcceptableDNSLookupTime

The maximum time in seconds that the Gateway allows a reverse DNS lookup to take.

If this time is exceeded, reverse DNS lookups are disabled for the IP address for the number of units of time specified in operatingEnvironment > DNS > DNSReverseLookupDisableTime > value.

For non-Gateway components, this setting defaults to the time specified in the environment variable $HR_TIMEOUT.

Mandatory: No
Default: 1

operatingEnvironment > DNS > DNSReverseLookupDisableTime > value

Number of units of time that reverse DNS lookups are disabled for after exceeding operatingEnvironment > DNS > maxAcceptableDNSLookupTime.

The unit of time is specified using operatingEnvironment > DNS > DNSReverseLookupDisableTime > units.

Once the time has elapsed, reverse DNS lookups are re-enabled for the IP address.

For non-Gateway components, DNSReverseLookupDisableTime can be specified using the environment variable $HR_REVERSE_LOOKUP_DISABLE_TIME.

Mandatory: No
Default: 5

operatingEnvironment > DNS > DNSReverseLookupDisableTime > units

The unit of time used to determine how long DNS lookups are disabled for after exceeding operatingEnvironment > DNS > maxAcceptableDNSLookupTime.

There are two options:

  • minutes
  • seconds

Mandatory: No
Default: minutes
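
The interaction of the three DNS settings can be sketched as a per-address gate. This is a minimal model under stated assumptions, not the Gateway's implementation: the class and method names are hypothetical, the defaults mirror the values documented above (1 second, 5 minutes), and the clock is injectable so the behaviour can be exercised without waiting.

```python
import time


class ReverseLookupGate:
    """Track which IP addresses currently have reverse DNS lookups disabled."""

    def __init__(self, max_lookup_seconds=1, disable_value=5,
                 disable_units="minutes", clock=time.monotonic):
        if disable_units not in ("minutes", "seconds"):
            raise ValueError("units must be 'minutes' or 'seconds'")
        self.max_lookup_seconds = max_lookup_seconds
        self.disable_seconds = disable_value * (60 if disable_units == "minutes" else 1)
        self.clock = clock
        self._disabled_until = {}   # ip -> time at which lookups are re-enabled

    def is_enabled(self, ip):
        """Return True if a reverse lookup may be attempted for this address."""
        until = self._disabled_until.get(ip)
        if until is None:
            return True
        if self.clock() >= until:
            del self._disabled_until[ip]   # disable period has elapsed
            return True
        return False

    def record_lookup(self, ip, elapsed_seconds):
        """Record how long a lookup took; disable lookups if it was too slow."""
        if elapsed_seconds > self.max_lookup_seconds:
            self._disabled_until[ip] = self.clock() + self.disable_seconds


gate = ReverseLookupGate()
gate.record_lookup("10.0.0.1", elapsed_seconds=2.5)
print(gate.is_enabled("10.0.0.1"))   # disabled for the next five minutes
print(gate.is_enabled("10.0.0.2"))   # other addresses are unaffected
```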

Debug tab

These settings are found under the Debug tab.

operatingEnvironment > debug

A list of gateway debug settings. These settings are only intended for debugging error conditions and should be enabled with care.

Caution: Use of these settings is likely to adversely impact the performance of the Gateway. They should only be enabled when debugging a particular configuration and in coordination with ITRS support staff.

Data Quality tab

These settings are found under the Data Quality tab.

operatingEnvironment > dataQueues > disableChecks

Enable this setting to disable the data quality checking algorithm.

Mandatory: No
Default: false (algorithm is run)

operatingEnvironment > dataQueues > maxDataAgeMs

Time in milliseconds of the maximum acceptable age for a pending update to a dataview. The limit is inclusive, so an update must be older than the set value to cause a connection to be suspended.

For more details on how this setting is used, see Data quality options.

Note: The default value is set to approximate the behaviour of Gateway versions prior to the introduction of the data quality feature.

Mandatory: No
Default: 77000 (77 seconds)
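
Because the limit is inclusive, an update whose age exactly equals the setting is still acceptable. A minimal sketch of that comparison follows; the function name is hypothetical and the rest of the data quality algorithm is not modelled.

```python
def exceeds_max_data_age(update_age_ms, max_data_age_ms=77000):
    """Return True only if a pending update is older than the configured limit."""
    # Inclusive limit: an age equal to the limit does not trigger a suspension.
    return update_age_ms > max_data_age_ms


print(exceeds_max_data_age(77000))   # exactly at the limit
print(exceeds_max_data_age(77001))   # one millisecond over the limit
```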

operatingEnvironment > dataQueues > connectionSuspensionDuration

Time in seconds that a connection (to a Netprobe) is suspended before the gateway reconnects.

For more details on how this setting is used, see Data quality options.

Mandatory: No
Default: 300 (5 minutes)

operatingEnvironment > dataQueues > suspendGracePeriod

Time in seconds specifying how long the gateway waits after suspending a connection before allowing further connections to be suspended.

For more details on how this setting is used, see Data quality options.

Mandatory: No
Default: 60 (1 minute)

operatingEnvironment > dataQueues > setupGracePeriod

Time in seconds specifying how long the gateway suspends the data quality algorithm for after a setup change.

For more details on how this setting is used, see Data quality options.

Mandatory: No
Default: 60 (1 minute)

operatingEnvironment > dataQueues > memoryProtection

Allows overriding the default data-queue memory protection settings.

For more details on how this setting is used, see Memory protection.

Mandatory: No

operatingEnvironment > dataQueues > memoryProtection > lowPriorityThresholdMB

Threshold size in MB for backlogged EMF messages at which the gateway throttles read-data from low-priority connections.

Low priority connections are importing EMF connections (Netprobe connections only). All other gateway connections continue to operate normally.

For more details on how this setting is used, see Memory protection.

Mandatory: Yes
Default: 500

operatingEnvironment > dataQueues > memoryProtection > highPriorityThresholdMB

Threshold size in MB for backlogged EMF messages at which the gateway throttles read-data from all connections.

In practice it is very unlikely even for a heavily overloaded gateway to hit this threshold, as the low-priority threshold is hit first.

For more details on how this setting is used, see Memory protection.

Mandatory: Yes
Default: 750

operatingEnvironment > conflation

Settings to control conflation of incoming monitoring data.

For more details on how this setting is used, see Conflation.

Mandatory: No

operatingEnvironment > conflation > enabled

Whether conflation is enabled.

Conflation can significantly aid an overloaded gateway and ensure that all published data is as up-to-date as possible. However, it does this by discarding out-of-date cell updates and should not be enabled if this is unacceptable.

For more details on how this setting is used, see Conflation.

Mandatory: Yes

operatingEnvironment > conflation > strategy

Specify the strategy for controlling gateway conflation, so that conflation is only enabled when required and no updates are unnecessarily discarded.

Mandatory: Yes

operatingEnvironment > conflation > strategy > maxDataAgeThreshold

Under this strategy the gateway does not enable conflation unless the maximum age of backlogged updates (as displayed by the Probe Data plugin) exceeds a certain threshold.

Conflation works best when it prevents stale data from building up rather than when it clears large backlogs: it has fewer backlogged messages to process, and fewer updates are conflated away. For this reason, the threshold should not be set too high.

An ideal value for the threshold is the minimum samplers > sampler > sampleInterval used in the setup. For this reason it defaults to the default sampleInterval of twenty seconds.

Mandatory: No

operatingEnvironment > conflation > strategy > maxDataAgeThreshold > threshold

Time in milliseconds for the threshold maxDataAge above which conflation is enabled. An ideal value for this setting is the minimum samplers > sampler > sampleInterval used in the setup.

Mandatory: No
Default: 20000

operatingEnvironment > conflation > strategy > alwaysOn

Under this strategy the gateway permanently enables conflation.

This strategy always results in the most up-to-date data being published, but may cause updates to be discarded when it is not strictly necessary.

Mandatory: No
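
The two strategies can be compared with a short sketch. It is a stateless illustration under stated assumptions: the function name and the strategy strings are hypothetical, and this section does not describe how or when the gateway switches conflation off again once the backlog clears, so that is not modelled.

```python
def conflation_active(enabled, strategy, max_data_age_ms=0, threshold_ms=20000):
    """Decide whether conflation applies for the current backlog age.

    enabled:         value of conflation > enabled
    strategy:        "alwaysOn" or "maxDataAgeThreshold"
    max_data_age_ms: current maximum age of backlogged updates
    threshold_ms:    maxDataAgeThreshold > threshold (defaults to 20000)
    """
    if not enabled:
        return False
    if strategy == "alwaysOn":
        return True
    if strategy == "maxDataAgeThreshold":
        # Conflation is only used once the backlog age exceeds the threshold.
        return max_data_age_ms > threshold_ms
    raise ValueError(f"unknown strategy: {strategy!r}")


print(conflation_active(True, "maxDataAgeThreshold", max_data_age_ms=5000))
print(conflation_active(True, "maxDataAgeThreshold", max_data_age_ms=25000))
print(conflation_active(True, "alwaysOn"))
```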