
Monitoring Configuration

The Monitoring Configuration is the Metrics Engine configuration object that controls how much monitoring data the server retains, how frequently it collects new data from monitored servers, how frequently it records its own internal metrics, and how much metric data is kept in memory to speed up metric queries.


Properties

The properties supported by this managed object are as follows:


Basic Properties:
  monitored-server
  second-resolution-sample-retention-duration
  minute-resolution-sample-retention-duration
  hour-resolution-sample-retention-duration
  day-resolution-sample-retention-duration
  sample-poll-frequency
  require-api-authentication
  dbms-cluster-resolution

Advanced Properties:
  alert-poll-frequency
  sample-poll-max-duration
  num-concurrent-polling-threads
  sample-cache-idle-series-timeout
  sample-cache-prefetch-frequency
  sample-cache-max-cached-series
  sample-cache-track-statistics
  threshold-poll-frequency
  omit-error-message-details
  slow-query-threshold-ms
  max-qualifiers-per-query
  cache-warmer-max-thread-count

Basic Properties

monitored-server

Description
Specifies the set of servers that are actively polled for samples.
Default Value
No servers are currently monitored.
Allowed Values
The DN of any LDAP External Server.
Multi-Valued
Yes
Required
No
Admin Action Required
None. Modification requires no further action.

second-resolution-sample-retention-duration

Description
Length of time samples with 1-second resolution are retained. The Metrics Engine receives samples with up to 1-second resolution from the monitored servers. Over time, it aggregates samples with 1-second resolution (or higher) into samples with 1-minute resolution, keeping a fixed amount of 1-second history. This property determines how much 1-second history is retained in the DBMS. Increasing this value means a metric query can return high-resolution data over a larger time range, but it also increases the disk space used by the DBMS.
Default Value
4 hours
Allowed Values
A duration. Lower limit is 1 hour. Upper limit is 47 hours.
Multi-Valued
No
Required
Yes
Admin Action Required
The Metrics Engine must be restarted manually for changes to this setting to take effect.

minute-resolution-sample-retention-duration

Description
Length of time samples with 1-minute resolution are retained. The Metrics Engine aggregates samples from one time resolution into the next coarser resolution, allowing it to keep summaries of data for an extended period of time. This property determines how much 1-minute history is retained in the DBMS. Increasing this value means a metric query can return 1-minute data over a larger time range, but it also increases the disk space used by the DBMS and the time and disk space used by backups.
Default Value
7 days
Allowed Values
A duration. Lower limit is 1 day. Upper limit is 34 days.
Multi-Valued
No
Required
Yes
Admin Action Required
The Metrics Engine must be restarted manually for changes to this setting to take effect.

hour-resolution-sample-retention-duration

Description
Length of time samples with 1-hour resolution are retained. The Metrics Engine aggregates samples from one time resolution into the next coarser resolution, allowing it to keep summaries of data for an extended period of time. This property determines how much 1-hour history is retained in the DBMS. Increasing this value means a metric query can return 1-hour data over a larger time range, but it also increases the disk space used by the DBMS and the time and disk space used by backups.
Default Value
52 weeks
Allowed Values
A duration. Lower limit is 4 weeks. Upper limit is 256 weeks.
Multi-Valued
No
Required
Yes
Admin Action Required
The Metrics Engine must be restarted manually for changes to this setting to take effect.

day-resolution-sample-retention-duration

Description
Length of time samples with 1-day resolution are retained. The Metrics Engine aggregates samples from one time resolution into the next coarser resolution, allowing it to keep summaries of data for an extended period of time. This property determines how much 1-day history is retained in the DBMS. Increasing this value means a metric query can return 1-day data over a larger time range, but it also increases the disk space used by the DBMS and the time and disk space used by backups.
Default Value
520 weeks
Allowed Values
A duration. Lower limit is 52 weeks. Upper limit is 1040 weeks.
Multi-Valued
No
Required
Yes
Admin Action Required
The Metrics Engine must be restarted manually for changes to this setting to take effect.
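The four retention settings translate directly into an upper bound on how many samples of one metric series are kept at each resolution. The sketch below is illustrative arithmetic only: it assumes exactly one sample per resolution interval per series, copies the defaults and limits from the descriptions above, and uses hypothetical names that are not part of the Metrics Engine.

```python
from datetime import timedelta

# resolution: (sample width, default retention, lower limit, upper limit),
# copied from the property descriptions above.
RETENTION = {
    "second": (timedelta(seconds=1), timedelta(hours=4), timedelta(hours=1), timedelta(hours=47)),
    "minute": (timedelta(minutes=1), timedelta(days=7), timedelta(days=1), timedelta(days=34)),
    "hour": (timedelta(hours=1), timedelta(weeks=52), timedelta(weeks=4), timedelta(weeks=256)),
    "day": (timedelta(days=1), timedelta(weeks=520), timedelta(weeks=52), timedelta(weeks=1040)),
}


def samples_per_series(resolution, retention=None):
    """Upper bound on samples kept for one series at one resolution,
    assuming exactly one sample per resolution interval."""
    width, default, lower, upper = RETENTION[resolution]
    retention = default if retention is None else retention
    if not lower <= retention <= upper:
        raise ValueError(f"{resolution} retention {retention} is outside [{lower}, {upper}]")
    # timedelta // timedelta yields an int: the number of whole intervals.
    return retention // width


if __name__ == "__main__":
    for name in RETENTION:
        print(name, samples_per_series(name))
```

With the defaults this gives 14,400 one-second, 10,080 one-minute, 8,736 one-hour, and 3,640 one-day samples per series, which shows why the 1-second retention window dominates disk usage despite being the shortest.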

sample-poll-frequency

Description
Length of time between polls requesting the newest samples from a monitored server. The Metrics Engine polls all monitored servers at a fixed interval to fetch new samples created on the monitored server. Monitored servers produce 'blocks' of samples at a fixed interval. Polling faster than that fixed interval provides no benefit, as the poll will not have any new data to fetch. Polling slower than that interval increases the latency between when a sample is produced by a monitored server and when it is available to a Metrics Engine client.
Default Value
30 seconds
Allowed Values
A duration. Lower limit is 10 seconds. Upper limit is 300 seconds.
Multi-Valued
No
Required
No
Admin Action Required
None. Modification requires no further action.
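To see why polling faster than a monitored server's block interval gains nothing, while polling slower adds latency, consider a toy simulation. It is a sketch under simplifying assumptions (blocks and polls are aligned at time zero, a poll takes no time, and each poll fetches every block produced so far), and the function name is hypothetical.

```python
def simulate_polling(block_interval_s, poll_interval_s, horizon_s=3600):
    """Toy model of one monitored server and one poller (integer seconds).

    Blocks of samples appear every block_interval_s seconds; the poller runs
    every poll_interval_s seconds and fetches every block produced so far.
    Returns (empty_polls, max_latency_s) over horizon_s seconds."""
    blocks = list(range(block_interval_s, horizon_s + 1, block_interval_s))
    next_block = 0
    empty_polls = 0
    max_latency_s = 0
    for poll_time in range(poll_interval_s, horizon_s + 1, poll_interval_s):
        fetched = 0
        while next_block < len(blocks) and blocks[next_block] <= poll_time:
            # Latency: how long this block waited between creation and fetch.
            max_latency_s = max(max_latency_s, poll_time - blocks[next_block])
            next_block += 1
            fetched += 1
        if fetched == 0:
            empty_polls += 1  # nothing new: the poll was wasted work
    return empty_polls, max_latency_s
```

For example, with 30-second blocks, polling every 10 seconds leaves 240 of 360 polls in an hour empty, while polling every 60 seconds delays some blocks by 30 seconds.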

require-api-authentication

Description
Require authentication when accessing the REST API. The Metrics Engine includes a REST API that provides access to metric definitions and samples. If the REST API is configured to require authentication, the api-users Backend can be populated with user entries to authenticate against. API authentication is not restricted to the api-users Backend, so it is also possible to authenticate as a Root User. However, api-users entries have no other access to the Metrics Engine, so using them instead of Root Users for API authentication is more secure.
The api-users LDIF Backend contains user entries that may be used to authenticate API calls. These entries are intentionally minimal and can be created with the ldapmodify utility using an entry of the following form:


dn: cn=app-user1,cn=api-users
changeType: add
objectClass: inetOrgPerson
objectClass: person
objectClass: top
cn: app-user1
uid: a1
sn: User1
userpassword: api1
ds-pwp-password-policy-dn: cn=Default Password Policy,cn=Password Policies,cn=config
Default Value
false
Allowed Values
true
false
Multi-Valued
No
Required
No
Admin Action Required
None. Modification requires no further action.
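When many API users are needed, the api-users entry form shown above can be generated rather than typed by hand. This hypothetical helper only reproduces that form; it assumes simple values that need no LDIF base64 encoding or DN escaping, and it rejects a few obvious cases that would.

```python
def api_user_ldif(cn, uid, sn, password,
                  policy_dn="cn=Default Password Policy,cn=Password Policies,cn=config"):
    """Return one LDIF add record in the api-users form shown above.

    Assumes plain values: no DN special characters in cn and nothing that
    would require base64 encoding in LDIF."""
    if any(ch in cn for ch in ',+"\\<>;='):
        raise ValueError("cn needs DN escaping, which this sketch does not do")
    for value in (cn, uid, sn, password):
        if not value or "\n" in value or value != value.strip() or value[0] in ":<":
            raise ValueError(f"value {value!r} would need LDIF encoding")
    lines = [
        f"dn: cn={cn},cn=api-users",
        "changeType: add",
        "objectClass: inetOrgPerson",
        "objectClass: person",
        "objectClass: top",
        f"cn: {cn}",
        f"uid: {uid}",
        f"sn: {sn}",
        f"userpassword: {password}",
        f"ds-pwp-password-policy-dn: {policy_dn}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two records separated by a blank line, as ldapmodify input expects.
    records = (api_user_ldif(f"app-user{i}", f"a{i}", f"User{i}", f"api{i}") for i in (1, 2))
    print("\n".join(records), end="")
```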

dbms-cluster-resolution

Description
Sample resolutions to be clustered by the DBMS. Samples are stored in DBMS tables for later queries. Queries follow an access pattern that has an optimal storage layout in the DBMS tables, but the initial insert pattern does not produce that layout. Because the samples are stored in time-based partitions, at a certain point in time a partition is considered 'stable', meaning it should not receive many new records. At this point the table is a candidate for a 'CLUSTER' operation, which reorders the samples in the table for optimal query performance. Query performance improvements of 20x are common for large queries across partitions that have been CLUSTERed. The drawback of this technique is that the CLUSTER operation takes between 30 and 60 seconds for each partition, and during the operation the table is unavailable for reads or writes, effectively stalling any queries that need data at that resolution. Partitions for each resolution in the list are CLUSTERed as soon as the partition is 'stable'.
Default Value
second
minute
hour
Allowed Values
second - Indicates that the data with 1-second resolution should be clustered as soon as a partition is no longer active. This will occur at 2 and 32 minutes past the hour every hour of the day, and will result in poor query performance during the cluster operation (typically 30-60 seconds).

minute - Indicates that the data with 1-minute resolution should be clustered as soon as a partition is no longer active. This will occur at 2 minutes past midnight and noon every day and will result in poor query performance during the cluster operation (typically 30-60 seconds).

hour - Indicates that the data with 1-hour resolution should be clustered as soon as a partition is no longer active. This will occur at 1 hour past midnight on the first day of each month and will result in poor query performance during the cluster operation (typically 30-60 seconds).
Multi-Valued
Yes
Required
No
Admin Action Required
None. Modification requires no further action.
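Because a CLUSTER operation stalls queries at its resolution for roughly 30 to 60 seconds, it can help to know when the next one is due, for example to schedule large reporting queries around it. The helper below is a sketch that encodes only the times listed above; it assumes those times are in the server's local time zone, ignores daylight-saving transitions, and is not an API of the Metrics Engine.

```python
from datetime import datetime, timedelta


def next_cluster_time(resolution, now):
    """Return the first scheduled CLUSTER time strictly after `now`.

    Encodes only the schedule listed above, interpreted as naive local time."""
    if resolution == "second":
        # 2 and 32 minutes past every hour.
        top_of_hour = now.replace(minute=0, second=0, microsecond=0)
        candidates = [top_of_hour + timedelta(hours=h, minutes=m)
                      for h in (0, 1) for m in (2, 32)]
    elif resolution == "minute":
        # 2 minutes past midnight and past noon, every day.
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        candidates = [midnight + timedelta(days=d, hours=h, minutes=2)
                      for d in (0, 1) for h in (0, 12)]
    elif resolution == "hour":
        # 1 hour past midnight on the first day of each month.
        this_month = now.replace(day=1, hour=1, minute=0, second=0, microsecond=0)
        if now.month == 12:
            next_month = this_month.replace(year=now.year + 1, month=1)
        else:
            next_month = this_month.replace(month=now.month + 1)
        candidates = [this_month, next_month]
    else:
        raise ValueError(f"unknown resolution: {resolution!r}")
    return min(c for c in candidates if c > now)


if __name__ == "__main__":
    print(next_cluster_time("second", datetime(2024, 3, 5, 10, 40)))
```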


Advanced Properties

alert-poll-frequency (Advanced Property)

Description
Length of time between polls requesting the newest alerts from a monitored server. The Metrics Engine polls all monitored servers at a fixed interval to fetch new alerts created on the monitored server. Increasing this value increases the latency between when an alert is created by a monitored server and when the alert is accessible to clients of the Metrics Engine.
Default Value
30 seconds
Allowed Values
A duration. Lower limit is 10 seconds. Upper limit is 300 seconds.
Multi-Valued
No
Required
No
Admin Action Required
None. Modification requires no further action.

sample-poll-max-duration (Advanced Property)

Description
Maximum time spent fetching data from a single server during one poll cycle. The Metrics Engine polls all monitored servers using a small set of threads. If one server has a lot of sample data that needs to be fetched, a polling thread may be busy with that single server for a long period of time. This value sets the maximum amount of time to spend on a single server in one poll cycle.
Default Value
10 seconds
Allowed Values
A duration. Lower limit is 10 seconds. Upper limit is 300 seconds.
Multi-Valued
No
Required
No
Admin Action Required
None. Modification requires no further action.
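The per-server time budget can be pictured as a loop that keeps fetching pending sample blocks until either the server has nothing left or the budget is used up, leaving the remainder for the next cycle. This is a hypothetical sketch, not the Metrics Engine's polling code: `fetch_next_block` stands in for whatever retrieves one block, and the budget is checked only between blocks, so a single slow fetch can still overrun it.

```python
import time


def poll_one_server(fetch_next_block, max_duration_s, clock=time.monotonic):
    """Fetch sample blocks from one server until none remain or the time
    budget for this poll cycle (max_duration_s) is used up.

    fetch_next_block() is a hypothetical callable returning the next pending
    block, or None when the server has nothing more to send.
    Returns (blocks, finished); finished is False when the budget ran out
    and the remaining data must wait for the next poll cycle."""
    deadline = clock() + max_duration_s
    blocks = []
    while clock() < deadline:
        block = fetch_next_block()
        if block is None:
            return blocks, True
        blocks.append(block)
    return blocks, False
```

The injectable `clock` makes the cut-off easy to exercise with a fake clock instead of real waiting.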

num-concurrent-polling-threads (Advanced Property)

Description
Number of threads used to poll for alerts and data from monitored servers. The Metrics Engine polls all monitored servers using a small set of threads. If the monitored servers are reached over network links with lower throughput or higher latency (e.g. a WAN link), more polling threads may be needed to maintain the desired polling cycle, since each server poll over the slow link may take longer to complete.
Default Value
1
Allowed Values
An integer value. Lower limit is 1. Upper limit is 20.
Multi-Valued
No
Required
No
Admin Action Required
The Metrics Engine must be restarted manually for changes to this setting to take effect.
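A rough way to size this property is to add up how long one poll of each monitored server takes (alerts and samples, since the same threads handle both) and divide by the poll cycle. The function below is an assumption-laden rule of thumb, not the engine's own sizing logic: it presumes polls are independent and threads stay fully busy, so leave headroom above the result.

```python
import math


def estimate_polling_threads(poll_durations_s, poll_cycle_s, upper_limit=20):
    """Rough lower bound on polling threads needed so that one full poll of
    every monitored server fits within one poll cycle.

    poll_durations_s: measured or expected seconds to poll each server once.
    The result is clamped to the allowed range of 1..upper_limit."""
    if poll_cycle_s <= 0:
        raise ValueError("poll cycle must be positive")
    needed = math.ceil(sum(poll_durations_s) / poll_cycle_s)
    return max(1, min(needed, upper_limit))
```

More threads cannot help a single server whose own poll takes longer than the cycle; in that case the per-server budget (sample-poll-max-duration) or the network link is the limiting factor.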

sample-cache-idle-series-timeout (Advanced Property)

Description
Length of time before cached sample series expire. The Metrics Engine stores a large volume of samples in its DBMS, typically spanning many gigabytes of disk. To achieve faster sample query times, results are stored in a cache and reused when possible to reduce the total amount of disk I/O needed to resolve a query. Every sample query result is put into the sample cache, so that similar queries may be resolved entirely from the cache without accessing the DBMS. To ensure that infrequent queries do not occupy too much space in the cache, cache elements have a timeout after which they are removed from the cache.
Default Value
10 minutes
Allowed Values
A duration. Lower limit is 1 minute. Upper limit is 60 minutes.
Multi-Valued
No
Required
No
Admin Action Required
The Metrics Engine must be restarted manually for changes to this setting to take effect.
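The idle timeout behaves like a cache in which every entry records when it was last used. The class below is a minimal sketch of that idea, assuming (as the property name suggests) that reading an entry resets its timer; the class and method names are hypothetical, and the clock is injectable so the behavior can be checked without waiting.

```python
import time


class IdleExpiringCache:
    """Entries expire after idle_timeout_s seconds without being read or
    written, similar in spirit to sample-cache-idle-series-timeout."""

    def __init__(self, idle_timeout_s, clock=time.monotonic):
        self._timeout = idle_timeout_s
        self._clock = clock
        self._entries = {}  # key -> (value, last_access_time)

    def put(self, key, value):
        self._entries[key] = (value, self._clock())

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        value, last_access = item
        now = self._clock()
        if now - last_access > self._timeout:
            del self._entries[key]  # idle too long: treat as a miss
            return None
        self._entries[key] = (value, now)  # reading refreshes the idle timer
        return value

    def purge_idle(self):
        """Drop every entry idle longer than the timeout; return how many."""
        now = self._clock()
        stale = [k for k, (_, t) in self._entries.items() if now - t > self._timeout]
        for k in stale:
            del self._entries[k]
        return len(stale)
```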

sample-cache-prefetch-frequency (Advanced Property)

Description
Length of time between executions of cache prefetch queries. The Metrics Engine DBMS maintains a large number of samples spanning tens of gigabytes on disk. Some queries may require fetching millions of samples, which may take tens of seconds. When such queries are known in advance, they can be added to the Prefetched Metric Query list; these queries are executed periodically so that their results are already fully (or at least mostly) in the cache, resulting in shorter query times. This interval should not be longer than sample-cache-idle-series-timeout, or it may have no effect.
Default Value
5 minutes
Allowed Values
A duration. Lower limit is 1 minute. Upper limit is 60 minutes.
Multi-Valued
No
Required
No
Admin Action Required
None. Modification requires no further action.
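The warning that a prefetch interval longer than sample-cache-idle-series-timeout may have no effect can be quantified with a simple model. Assuming a prefetched series is touched only by the prefetcher and expires exactly one idle timeout after each prefetch, this hypothetical function returns the fraction of each prefetch cycle during which the series is no longer cached.

```python
from datetime import timedelta


def prefetch_miss_fraction(prefetch_frequency, idle_series_timeout):
    """Fraction of each prefetch cycle during which a series touched only by
    the prefetcher has already expired from the cache (simple model)."""
    period = prefetch_frequency.total_seconds()
    timeout = idle_series_timeout.total_seconds()
    if period <= 0:
        raise ValueError("prefetch frequency must be positive")
    if period <= timeout:
        return 0.0  # refreshed before it can expire
    return (period - timeout) / period


if __name__ == "__main__":
    print(prefetch_miss_fraction(timedelta(minutes=5), timedelta(minutes=10)))
    print(prefetch_miss_fraction(timedelta(minutes=20), timedelta(minutes=10)))
```

With the defaults (5 minutes against 10 minutes) the fraction is 0; a 20-minute prefetch interval against a 10-minute timeout leaves the series uncached half of the time.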

sample-cache-max-cached-series (Advanced Property)

Description
Number of sample data series that can be held in the cache concurrently. To improve sample query performance, the results of all queries are put into a fixed-size LRU cache. This allows similar queries to complete faster, since most of their result can come directly from memory. The sample cache consumes JVM heap memory, so a larger cache (when full) requires a larger JVM heap. When the number of distinct recent results exceeds the size of the cache, the least recently used elements are removed.
Default Value
50000
Allowed Values
An integer value. Lower limit is 1000. Upper limit is 500000.
Multi-Valued
No
Required
No
Admin Action Required
The Metrics Engine must be restarted manually for changes to this setting to take effect.
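A fixed-size LRU cache like the one described here can be modeled with Python's `collections.OrderedDict`. This sketch is not the Metrics Engine's cache, which runs inside the JVM; it only illustrates the eviction rule, and the class name is hypothetical.

```python
from collections import OrderedDict


class LruSeriesCache:
    """Fixed-size LRU cache: when full, inserting a new series evicts the
    least recently used one."""

    def __init__(self, max_cached_series):
        self._max = max_cached_series
        self._data = OrderedDict()
        self.evictions = 0

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, series):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = series
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # drop the least recently used
            self.evictions += 1

    def __len__(self):
        return len(self._data)
```

In this model memory grows roughly with max_cached_series times the average series size, which is why raising the property can call for a larger JVM heap.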

sample-cache-track-statistics (Advanced Property)

Description
Enable sample cache statistics collection. The sample cache can produce statistics that report the number of entries in the cache, the eviction count, the put, miss, and hit counts, the time to get an entry, and the expired entry count. If there is any concern about the behavior of the sample cache, collecting these statistics produces samples that can be analyzed later.
Default Value
true
Allowed Values
true
false
Multi-Valued
No
Required
No
Admin Action Required
The Metrics Engine must be restarted manually for changes to this setting to take effect.
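Once statistics are collected, two simple ratios often summarize them: the hit ratio (hits divided by lookups) and, approximately, how often a put leads to an eviction. The dataclass below uses hypothetical field names for a few of the counters listed above; a persistently high eviction ratio may suggest that sample-cache-max-cached-series is too small for the workload.

```python
from dataclasses import dataclass


@dataclass
class SampleCacheStats:
    """A few of the sample cache counters; field names are hypothetical."""
    hits: int
    misses: int
    evictions: int
    puts: int

    def hit_ratio(self):
        lookups = self.hits + self.misses
        return self.hits / lookups if lookups else 0.0

    def eviction_ratio(self):
        # Approximation: evictions per put over the same interval.
        return self.evictions / self.puts if self.puts else 0.0
```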

threshold-poll-frequency (Advanced Property)

Description
Control threshold evaluation period. The Metrics Engine can create a set of Threshold objects and feed the raw metric sample input stream to these objects, allowing the objects to track the current and recent average values of different metrics. These Threshold objects are periodically polled to determine whether the most recent data exceeded configured limits. If the limits were exceeded, the Threshold enters an alerted state and additional processing may occur. The Metrics Engine performs thresholding by default, and this property controls the polling period. A value of zero disables thresholding.
Default Value
30 seconds
Allowed Values
A duration. Lower limit is 0 seconds. Upper limit is 300 seconds.
Multi-Valued
No
Required
No
Admin Action Required
The Metrics Engine must be restarted manually for changes to this setting to take effect.
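The Threshold behavior described above (a running view of recent values, checked once per polling period) can be sketched as follows. The class, the fixed-size averaging window, and the "strictly greater than the limit" rule are all assumptions for illustration; the real Threshold objects and their configuration are not described here. A threshold-poll-frequency of zero would correspond to never calling `poll`.

```python
from collections import deque


class Threshold:
    """Tracks a recent average of one metric and is polled periodically to
    check whether that average exceeds a configured limit."""

    def __init__(self, limit, window_size):
        self.limit = limit
        self.recent = deque(maxlen=window_size)  # most recent raw samples
        self.alerted = False

    def feed(self, value):
        """Called for every raw sample in the input stream."""
        self.recent.append(value)

    def poll(self):
        """Called once per polling period; returns True only when the
        Threshold has just entered the alerted state."""
        if not self.recent:
            return False
        average = sum(self.recent) / len(self.recent)
        was_alerted = self.alerted
        self.alerted = average > self.limit
        return self.alerted and not was_alerted
```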

omit-error-message-details (Advanced Property)

Description
Specifies that API error messages for invalid queries, unknown resources, unavailable services, and internal server errors are generic in nature. Detailed error messages can help diagnose application errors, but in production they may reveal information that is useful to a malicious attacker.

Although enabling this property may make the data more secure, it may also degrade the experience of client applications that present detailed error messages to users. Enable this property only after careful consideration, and only when there is no ongoing development of applications that use the API.

Default Value
false
Allowed Values
true
false
Multi-Valued
No
Required
No
Admin Action Required
None. Modification requires no further action.
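A minimal sketch of the effect of this property: the same error yields either a generic message alone or a generic message followed by details. The HTTP status codes, message texts, and function name are hypothetical, chosen only to line up with the four error categories named above.

```python
# Hypothetical mapping of error categories to generic, client-safe messages.
GENERIC_MESSAGES = {
    400: "Invalid query.",
    404: "Resource not found.",
    500: "Internal server error.",
    503: "Service unavailable.",
}


def api_error_message(status, detail, omit_error_message_details):
    """Return the message to send to an API client for an error response."""
    generic = GENERIC_MESSAGES.get(status, "Request failed.")
    if omit_error_message_details:
        return generic  # never expose internal details
    return f"{generic} {detail}"
```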

slow-query-threshold-ms (Advanced Property)

Description
Enable logging of slow metric query details. If set, this property controls which metric queries are logged to the error log. The value has the form "max-time[:metric-name]". All queries that take longer than max-time milliseconds are logged; if metric-name is given, only queries for that metric that take longer than max-time are logged. Set this property to diagnose slow metric query performance.
Default Value
None
Allowed Values
Limit in milliseconds above which a metric query should be logged as slow, optionally followed by a metric-name that will restrict logged queries to only the specified metric.
Multi-Valued
No
Required
No
Admin Action Required
None. Modification requires no further action.
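The "max-time[:metric-name]" rule can be expressed as a small parser plus a predicate. This is a hypothetical sketch of the rule as described, not the server's code; it treats everything after the first colon as the metric name, and any metric names used with it are illustrative.

```python
def parse_slow_query_threshold(value):
    """Parse 'max-time[:metric-name]' into (max_time_ms, metric_name or None)."""
    max_time, sep, metric = value.partition(":")
    max_time_ms = int(max_time)
    if max_time_ms < 0:
        raise ValueError("max-time must be non-negative")
    return max_time_ms, (metric if sep and metric else None)


def should_log(setting, metric_name, elapsed_ms):
    """Apply the logging rule to one completed query."""
    if setting is None:  # property not set: nothing is logged
        return False
    max_time_ms, only_metric = parse_slow_query_threshold(setting)
    if only_metric is not None and metric_name != only_metric:
        return False
    return elapsed_ms > max_time_ms
```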

max-qualifiers-per-query (Advanced Property)

Description
Maximum number of metric qualifiers allowed per query. Some metric queries include many metric qualifiers, which are realized as an SQL 'WHERE X IN (...)' clause. Requesting too many elements in this clause affects memory use and query performance. This property sets that upper limit.
Default Value
500
Allowed Values
An integer value. Lower limit is 100. Upper limit is 5000.
Multi-Valued
No
Required
No
Admin Action Required
None. Modification requires no further action.
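The qualifier limit bounds the size of an SQL IN list. The hypothetical helper below builds a parameterized IN clause and refuses requests above the limit, using the default of 500. It assumes a DB-API driver that uses `%s` placeholders and a trusted column name; it is not the Metrics Engine's query builder.

```python
def build_in_clause(column, qualifiers, max_qualifiers=500):
    """Build a parameterized 'column IN (...)' clause, rejecting requests
    with more qualifiers than the configured limit.

    column must be a trusted identifier; qualifier values are passed as
    parameters, never interpolated into the SQL text."""
    if not qualifiers:
        raise ValueError("at least one qualifier is required")
    if len(qualifiers) > max_qualifiers:
        raise ValueError(
            f"{len(qualifiers)} qualifiers exceeds max-qualifiers-per-query={max_qualifiers}")
    placeholders = ", ".join(["%s"] * len(qualifiers))
    return f"{column} IN ({placeholders})", list(qualifiers)
```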

cache-warmer-max-thread-count (Advanced Property)

Description
Number of threads used by cache warmer. The cache warmer executes a set of queries to keep the data in the server cache up to date. The optimal number of threads depends on the available CPU, disk and memory resources.
Default Value
2
Allowed Values
An integer value. Lower limit is 1. Upper limit is 20.
Multi-Valued
No
Required
No
Admin Action Required
None. Modification requires no further action.
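Running a fixed set of warm-up queries with a bounded number of threads maps naturally onto a thread pool. In this sketch, `run_query` is a hypothetical callable that executes one query and populates the cache; `ThreadPoolExecutor(max_workers=...)` caps concurrency in the way cache-warmer-max-thread-count is described as doing.

```python
from concurrent.futures import ThreadPoolExecutor


def warm_cache(queries, run_query, max_thread_count=2):
    """Run every warm-up query using at most max_thread_count threads.

    Returns the results in the same order as `queries`."""
    with ThreadPoolExecutor(max_workers=max_thread_count) as pool:
        return list(pool.map(run_query, queries))
```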


dsconfig Usage

To view the Monitoring Configuration properties:

dsconfig get-monitoring-configuration-prop
     [--tab-delimited]
     [--script-friendly]
     [--property {propertyName}] ...

To update the Monitoring Configuration properties:

dsconfig set-monitoring-configuration-prop
     (--set|--add|--remove) {propertyName}:{propertyValue}
     [(--set|--add|--remove) {propertyName}:{propertyValue}] ...
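When the configuration is changed from a script, the update synopsis can be turned into an argument list, for example to pass to `subprocess.run`. This hypothetical helper uses only the `--set`, `--add`, and `--remove` forms shown in the synopsis; any connection or authentication options your environment requires are deliberately left out and must be added.

```python
import shlex


def dsconfig_set_args(changes):
    """Build argv for 'dsconfig set-monitoring-configuration-prop'.

    changes: list of (operation, property_name, value) tuples, where
    operation is 'set', 'add' or 'remove' as in the synopsis."""
    args = ["dsconfig", "set-monitoring-configuration-prop"]
    for operation, name, value in changes:
        if operation not in ("set", "add", "remove"):
            raise ValueError(f"unsupported operation: {operation}")
        args += [f"--{operation}", f"{name}:{value}"]
    return args


if __name__ == "__main__":
    argv = dsconfig_set_args([
        ("set", "cache-warmer-max-thread-count", "4"),
        ("set", "require-api-authentication", "true"),
    ])
    print(shlex.join(argv))  # shell-quoted form, for review before running
```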