We have a service that generates alarms via Restmon. Because we do not know in advance what these alarms will be, we cannot close them via Restmon. We would like to close them in the UI, but if we do and the same alarm then comes in again, Restmon still considers it open, so it does not send an update to OI and nobody knows the issue is still happening.
We can update the Restmon schema to explicitly set the status to 'new' or 'updated' for every message, but OI then shows a brand-new alarm each time instead of updating the matching open one. During a service-wide issue, this could result in thousands of ServiceNow tickets per minute. We would like Restmon to send alarms and OI to recognize when they match alarms that are already open and update those. OI should open a new alarm only if the host/alarm_unique_id does not match an existing open alarm.
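The requested matching behavior can be sketched as an "update if open, otherwise create" lookup keyed on host and alarm_unique_id. This is only an illustrative model, not OI or Restmon code: the `Alarm` and `AlarmStore` names, their fields, and the in-memory dictionary are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    # Hypothetical alarm record; field names are illustrative only.
    host: str
    alarm_unique_id: str
    message: str
    status: str = "new"
    update_count: int = 0


class AlarmStore:
    """Toy model of the desired correlation: one open alarm per key."""

    def __init__(self):
        self._open = {}   # (host, alarm_unique_id) -> open Alarm
        self.closed = []  # alarms closed by an operator

    def ingest(self, host, alarm_unique_id, message):
        key = (host, alarm_unique_id)
        existing = self._open.get(key)
        if existing is not None:
            # A matching open alarm exists: update it, do not open another.
            existing.message = message
            existing.status = "updated"
            existing.update_count += 1
            return existing
        # No matching open alarm (never seen, or closed earlier): open one.
        alarm = Alarm(host, alarm_unique_id, message)
        self._open[key] = alarm
        return alarm

    def close(self, host, alarm_unique_id):
        # Models an operator closing the alarm in the UI.
        alarm = self._open.pop((host, alarm_unique_id), None)
        if alarm is not None:
            alarm.status = "closed"
            self.closed.append(alarm)
        return alarm

    def open_count(self):
        return len(self._open)
```

In this model, repeated messages for the same host/alarm_unique_id keep a single open alarm, and a message that arrives after the alarm was closed opens a fresh one, so a recurring issue is not silently dropped.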
Release : 20.2
Component : CA DOI ALARM ANALYTICS
By reducing the number of profiles running (from 19 to 8), we stopped multiple alarms from being triggered for the same event. The issue was:
- Every time Restmon restarts, it generates a new unique_alarm_id string for the same alarm, so the alarms are not correlated.
- With the number of profiles we had, something in Restmon was restarting every 2-3 minutes, so we saw many alarms with different IDs.
Now, when an alarm is raised (with no 'state' field, or with a 'new' or 'updated' state field), it always updates the same alarm in OI.
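The effect of the restarts can be illustrated with a toy model. How Restmon actually builds its alarm IDs is not documented here, so the functions below are purely hypothetical: one derives an ID that includes a per-restart token, the other derives it only from the host and check name. The point is simply that any ID that changes across restarts cannot be matched to the alarm that is already open.

```python
import hashlib
import uuid


def new_session_token():
    # Hypothetical stand-in for whatever changes when the process restarts.
    return uuid.uuid4().hex


def restart_scoped_id(host, check, session_token):
    # ID depends on the restart token, so it differs after every restart.
    raw = f"{session_token}|{host}|{check}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def stable_id(host, check):
    # ID depends only on what the alarm is about, so it survives restarts.
    raw = f"{host}|{check}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    before = new_session_token()
    after = new_session_token()  # simulated restart
    # Same host and check, different IDs: the alarms will not correlate.
    print(restart_scoped_id("host01", "cpu", before))
    print(restart_scoped_id("host01", "cpu", after))
    # Same host and check, same ID: the open alarm would be updated.
    print(stable_id("host01", "cpu") == stable_id("host01", "cpu"))
```

With restarts happening every 2-3 minutes, the first pattern would produce a new ID, and therefore a new alarm, at roughly that rate for every ongoing condition.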
The OI DEV team will handle the following two items as enhancements.
1- Improve the scalability of Restmon: too much data causes Restmon to publish alarms again.
2- Fix the OI and Restmon alarm status sync issue: alarm closure in OI is not communicated back to Restmon.