Self-Monitoring Failure alarms keep showing up. What do they mean, and how can they be prevented? For example:
Self-Monitoring Failures for '"queue name"': Monitor Correlation (X of X failed), Data Collection (X of X failed). See netapp_ontap.log for more details.
- UIM 9.X and earlier
- netapp_ontap 1.21 and earlier
This Self-Monitoring Failure alarm is generated when the named queue cannot fetch data, or when the data returned for a particular metric is null.
This is expected behavior. Many metrics are not applicable in every situation; for example, if a device is found in inventory but is turned off, this alarm is sent.
It simply means that, of the data points the probe attempted to collect, not all contained data.
You should not be concerned about this unless ALL of the data points are failing. In other words, an alarm that says "6 of 682 failed" or "12 of 1054 failed" is almost always nothing to worry about. An alarm that says "682 of 682 failed" would indicate a problem.
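The rule of thumb above can be sketched as a small Python check. This is only an illustration: the message pattern is inferred from the example alarm text in this article, and the names parse_failures and needs_attention are hypothetical helpers, not part of the netapp_ontap probe or UIM.

```python
import re

# Matches fragments such as "Data Collection (12 of 1054 failed)".
# Assumption: the pattern is inferred from the example alarm text,
# not taken from probe documentation.
_CATEGORY = re.compile(r"([A-Za-z][A-Za-z ]*?)\s*\((\d+)\s+of\s+(\d+)\s+failed\)")


def parse_failures(message):
    """Return a list of (category, failed, total) tuples found in the alarm text."""
    return [
        (name, int(failed), int(total))
        for name, failed, total in _CATEGORY.findall(message)
    ]


def needs_attention(message):
    """Return True when at least one category reports every data point as failed."""
    return any(
        total > 0 and failed == total
        for _, failed, total in parse_failures(message)
    )


if __name__ == "__main__":
    sample = (
        "Self-Monitoring Failures for 'example_queue': "
        "Monitor Correlation (6 of 682 failed), "
        "Data Collection (12 of 1054 failed). "
        "See netapp_ontap.log for more details."
    )
    print(parse_failures(sample))
    print(needs_attention(sample))
```

A partial failure such as "6 of 682" is treated as benign, while a category in which the failed count equals the total is flagged for follow-up in netapp_ontap.log.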
If you would like to disable these alarms, open the netapp_ontap probe in Raw Configure mode, navigate to the setup section, and add the following key: enable_self_monitoring_alarm = false
This will disable the self-monitoring alarms moving forward.
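For reference, once the key has been added, the setup section of the probe configuration would contain an entry like the one below. This is a sketch that assumes the usual section layout of UIM probe configuration files; any other keys already present in your setup section are omitted here and should be left unchanged.

```
<setup>
   enable_self_monitoring_alarm = false
</setup>
```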