Please refer to the “alert definition” screenshot.
My interpretation of this alert configuration is:
The Resolution value of 6 minutes is the length of one period; therefore, a caution alert will be issued if the metric value has been below the caution threshold of 30% for 30 consecutive minutes (5 periods) within a 90-minute rolling window (15 periods).
Is that understanding of this alert definition correct? If this understanding is not correct, please share how we should interpret the alert.
If the understanding is correct, please see the screenshot, “recorded metrics”.
We received a Caution alert at 7:36am:
2021-07-23T13:36:00+0000
Message: The alert Capacity Kubernetes CPU Request Deviation - Namespaces has breached the MAJOR threshold of 30
Metric Name: SuperDomain|HOSTNAME|ClusterDeployment|Infrastructure Agent|Kubernetes|<example>:CPU Request Deviation
Metric Value: 8
Severity: major
Alarm type: Application
Host: HOSTNAME
Alarm ID: <alarmid>
Product: Application Performance Management
Product version: 21.6.0.26(Build 990026)
The preceding 30 minutes show the metric value moving above and below the threshold. If our understanding as explained above is correct, we do not understand why an alert was issued.
Release: DX APM SaaS
An alert is issued if the metric value has been below the 30% threshold in any 5 of the last 15 periods (one period is 6 minutes long). The 5 periods do not need to be consecutive.
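The rule above can be illustrated with a minimal sketch. It is not DX APM code: the function name `alert_states`, its parameters, and the sample values are hypothetical, and it assumes that a breach means the metric value is below the threshold, as in this alert.

```python
from collections import deque


def alert_states(values, threshold, periods_over, observed_periods):
    """Return one boolean per period: True when the alert condition holds.

    Illustrative only. Assumes a period counts as breaching when its
    value is below the threshold, and that the condition holds when at
    least `periods_over` of the last `observed_periods` periods breached.
    """
    window = deque(maxlen=observed_periods)  # rolling window of breach flags
    states = []
    for value in values:
        window.append(value < threshold)
        states.append(sum(window) >= periods_over)
    return states


# Hypothetical 6-minute samples that move above and below the threshold of 30.
samples = [45, 12, 50, 8, 40, 20, 35, 10, 42, 8]

# 5 breaching periods out of the last 15 observed periods.
any_5_of_15 = alert_states(samples, threshold=30, periods_over=5, observed_periods=15)

# For comparison: 5 breaching periods out of the last 5 (breaches must be consecutive).
five_of_5 = alert_states(samples, threshold=30, periods_over=5, observed_periods=5)

print(any_5_of_15)
print(five_of_5)
```

With these sample values, the 5-of-15 check becomes true on the last sample, when the fifth non-consecutive breach falls inside the window. The 5-of-5 check never becomes true, because no 5 consecutive periods are all below the threshold.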
If the goal is to get an alert only when the threshold has been breached for 30 consecutive minutes, set both "Periods Over Threshold" and "Observed Periods" to 5.
For more information, review the Configure Simple Alert Settings section.
Step 6 explains the Periods Over Threshold and Observed Periods fields.