Time Over Threshold (TOT) is configured on metric-producing probes such as vmware and snmpcollector, but it is implemented by the alarm_enrichment and nas probes. If a metric exceeds a threshold limit for longer than the specified Time Over Threshold <TOT> within the specified sliding time window <TW>, an alarm is triggered.
How does time_over_threshold work in the UIM environment?
The alarm_enrichment (AE) probe is where the actual time-over-threshold logic resides. An alarm is not truly an alarm until it arrives at the NAS probe, so with TOT enabled, AE holds alarms back until they meet the configured time over threshold. When a user sets the TOT parameters, the monitoring probe puts TOT rule data messages into the tot_rule_config queue, and alarm_enrichment uses these messages to set up its TOT configuration.
A TOT rule consists of the following items:
1. Key: The key of the rule, which typically follows the “met_id:et_id” format.
2. Active: A boolean value that indicates whether the rule is active.
3. Time: The amount of time (in seconds) that the metric must be over threshold before an alarm is fired.
4. Window: The window during which the time condition must be met to fire an alarm.
5. AutoClear: 0 or 1, indicating whether there is an auto-clear timer.
6. ClearTime: The duration of the auto-clear timer, which closes the alarm if the conditions are no longer met.
These rules are stored in the file: Nimsoft/probes/service/nas/alarm_enrichment/rule_config.xml and can be seen there in the XML format or queried via the list_tot_rules callback. Single rules can also be queried using the get_tot_rule callback. Note that in later versions of the nas, e.g., v8.56, the rules are stored in the file Nimsoft/probes/service/nas/alarm_enrichment/rule_config.json.
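As an illustration only, the six rule fields listed above can be modeled as a small Python data class. This is a sketch for reasoning about a rule, not the on-disk schema of rule_config.xml or rule_config.json. The class name TotRule, the snake_case field names, the assumption that Window is expressed in seconds like Time, and the sanity check are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TotRule:
    """Hypothetical in-memory model of one TOT rule (not the real file schema)."""

    key: str          # rule key, typically "met_id:et_id"
    active: bool      # whether the rule is active
    time: int         # seconds the metric must be over threshold
    window: int       # sliding window length (assumed to be in seconds)
    auto_clear: bool  # AutoClear flag (0 or 1 in the rule)
    clear_time: int   # auto-clear timer duration (assumed to be in seconds)

    def __post_init__(self):
        # Assumed sanity check: a rule whose required time exceeds its
        # window could never be satisfied.
        if self.time > self.window:
            raise ValueError("time must not exceed window")


# Placeholder key; real keys are generated by the monitoring probe.
example_rule = TotRule(
    key="met_id:et_id",
    active=True,
    time=300,
    window=600,
    auto_clear=True,
    clear_time=900,
)
```

The example values correspond to a rule of 5 minutes over threshold within a 10 minute window.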
When there is a TOT configuration, alarm_enrichment watches alarms arriving from the originating monitoring probe. Each alarm represents a certain amount of time over threshold, based on the polling frequency. If a particular metric has a polling frequency of 1 minute and a rule of 5 minutes over threshold in a 10 minute window, AE keeps track of alarms by looking back 10 minutes in its history to see how many alarms it received. If it received 5 or more alarms (5 alarms x 1 minute each = 5 minutes over threshold), it forwards the alarm to the NAS.
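The counting logic described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the alarm_enrichment source: the class TotWindow and its method names are hypothetical, timestamps are plain seconds, and an alarm that is exactly one window length old is treated as having fallen out of the window.

```python
from collections import deque


class TotWindow:
    """Hypothetical sliding-window check for one TOT rule."""

    def __init__(self, time_s, window_s, poll_interval_s):
        self.time_s = time_s                    # required time over threshold
        self.window_s = window_s                # sliding window length
        self.poll_interval_s = poll_interval_s  # time represented by one alarm
        self._arrivals = deque()                # timestamps of alarms in the window

    def on_alarm(self, now_s):
        """Record an alarm; return True if it should be forwarded to the NAS."""
        self._arrivals.append(now_s)
        # Drop alarms that are no longer inside the sliding window.
        cutoff = now_s - self.window_s
        while self._arrivals and self._arrivals[0] <= cutoff:
            self._arrivals.popleft()
        # Each alarm counts as one polling interval spent over threshold.
        time_over_s = len(self._arrivals) * self.poll_interval_s
        return time_over_s >= self.time_s


# 1 minute polling, 5 minutes over threshold within a 10 minute window.
tot = TotWindow(time_s=300, window_s=600, poll_interval_s=60)
decisions = [tot.on_alarm(t) for t in (0, 60, 120, 180, 240)]
```

With these values the first four alarms are held back and the fifth satisfies the rule, matching the 5 alarms x 1 minute example. Alarms do not need to be consecutive; they only need to fall within the same window.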
If autoClear is enabled, a timer is set that expires after the clearTime and sends a close alarm to NAS. This timer is reset if the condition continues to be met with subsequent alarms.
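The reset behavior of the auto-clear timer can be pictured with a small Python sketch. It models the timer as a deadline that is pushed forward on each qualifying alarm, using caller-supplied timestamps in seconds rather than real timers. The class AutoClearTimer and its methods are hypothetical names, and the real probe may implement the timer differently.

```python
class AutoClearTimer:
    """Hypothetical model of the auto-clear timer for one TOT rule."""

    def __init__(self, clear_time_s):
        self.clear_time_s = clear_time_s
        self.deadline_s = None  # None means no timer is currently running

    def on_condition_met(self, now_s):
        # Start the timer, or reset it if it is already running.
        self.deadline_s = now_s + self.clear_time_s

    def tick(self, now_s):
        """Return True once when the timer expires (a close alarm would be sent)."""
        if self.deadline_s is not None and now_s >= self.deadline_s:
            self.deadline_s = None
            return True
        return False


timer = AutoClearTimer(clear_time_s=300)
timer.on_condition_met(0)          # condition met: timer expires at t=300
timer.on_condition_met(250)        # condition met again: expiry moves to t=550
expired_early = timer.tick(400)    # original deadline passed, but timer was reset
expired_late = timer.tick(550)     # no further alarms arrived, so the timer expires
```

In this model the close is sent only if no further qualifying alarm arrives for the full clear time.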
When the monitoring probe UI is opened, the ppm probe pulls the TOT information from alarm_enrichment to populate the appropriate portions of the UI. If alarm_enrichment is disabled, the TOT configuration in the monitoring probe UI may be missing or greyed out.