I have alarms whose status is marked as unknown (question mark). Why does this happen if the alarm does have metrics?
Release : SAAS
Component :
Every alarm is based on a metric group. When you assign a metric group to an alarm, all metrics inside it must be live and reporting; otherwise, the alarm is marked as unknown. For example:
- This is an alarm setup that is marked as unknown:
- Note that the metric group is "Kubernetes - API Server Connection". You need to verify which metrics it contains, either by opening the metric group or by entering the agent expression into the search box in the metric view:
- The key point is to understand which metrics in your metric group are reporting live data. If one of them is not online, the status will be unknown. A good way to see what is online and what is not is to change the historical view directly from your alarms screen:
Now select 3 months:
As you can see, this metric group has 4 metrics, but only 2 have live metrics being reported; that is why this alarm shows as unknown. To fix it, adjust the metric group by removing the non-working metrics, or check the agent logs to understand why those metrics are not live.
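The rule described above can be illustrated with a small Python sketch. It is only a toy model of the behavior, not APM's actual evaluation logic: the function name, the 5-minute "live" window, and the metric names are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def alarm_status(last_seen, now, live_window=timedelta(minutes=5)):
    """Toy model: an alarm is "unknown" if any metric in its group is not live.

    last_seen maps a metric name to the time of its last data point,
    or None if the metric has never reported. The live_window value is
    an assumption for this sketch, not an APM setting.
    """
    stale = sorted(
        name
        for name, ts in last_seen.items()
        if ts is None or now - ts > live_window
    )
    return ("unknown" if stale else "known"), stale


# Hypothetical metric group with 4 metrics, only 2 of them live.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "Connections Open": now - timedelta(seconds=15),
    "Requests Per Interval": now - timedelta(seconds=30),
    "Errors Per Interval": now - timedelta(days=40),
    "Average Response Time (ms)": None,
}

status, stale = alarm_status(last_seen, now)
print(status)  # status of the alarm in this toy model
print(stale)   # metrics to remove from the group or investigate in agent logs
```

The list of stale metrics corresponds to the ones you would either remove from the metric group or investigate in the agent logs.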
APM evaluates alerts and calculators just before storing data to NASS. That is why agent metrics have to go through APM. If a third-party product writes metrics directly to NASS, it is no surprise that the alert is not evaluated.
Using the "SuperDomain|" prefix in a non-APM component is wrong; only APM may use it. The metric prefix exists to distinguish APM metrics, and OI and other products have similar prefixes of their own. Writing metrics with the same prefix as APM can also result in unpredictable errors.
If you want APM to work with the metric, for example to view it in APM or to have APM alerts on it, send the data via APM. APM will check capacity during connection, etc.
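For a non-APM component that writes its own metrics, the prefix rule above could be enforced with a simple guard before anything is written. This is a hypothetical client-side check, not part of APM or NASS; the function name and the example metric paths are assumptions for illustration.

```python
# Prefix reserved for APM metrics, as described above.
RESERVED_PREFIX = "SuperDomain|"


def check_metric_name(name: str) -> str:
    """Return the name unchanged, or raise if it uses the APM-reserved prefix.

    Hypothetical helper for a non-APM writer; APM itself does not
    provide this function.
    """
    if name.startswith(RESERVED_PREFIX):
        raise ValueError(
            f"metric {name!r} uses the reserved APM prefix {RESERVED_PREFIX!r}; "
            "send it via APM instead of writing it directly"
        )
    return name


# Hypothetical metric paths.
print(check_metric_name("Custom|Queue Depth"))

try:
    check_metric_name("SuperDomain|host1|proc|agent|Queue Depth")
except ValueError as err:
    print(f"rejected: {err}")
```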