Multiple alarms are being received for the same storage devices

Article ID: 438303

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

Multiple alarms were generated for the same storage device even though it was continuously down for more than 2 hours.

We noted that these alarms cleared even though the configured threshold was still being violated.

Environment

  • icmp probe
  • icmp_mcs_templates
  • mon_config_service
  • MCS Alarm policy
  • DX UIM 23.4 CU4

Cause

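  • Network metrics such as packet loss, latency, and jitter are inherently random and unpredictable: they flip-flop, starting and stopping. With a short Time Over Threshold (ToT) setting and auto-clear enabled, each fluctuation clears the alarm and then regenerates it.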

Resolution

  • The best approach is to extend the Time Over Threshold (ToT) value and its sliding window. Metrics such as packet loss, latency, and jitter fluctuate frequently, which is simply their nature, so a short time period is too sensitive for them. If the issue is intermittent (it flip-flops, starts and stops, comes and goes), auto-clear clears the alarms and they are then regenerated, which can be misleading.

  • As currently configured, the MCS alarm policy for packet loss alarms uses a ToT of 3 min out of 5 min. That is almost equivalent to immediate alarming and, with auto-clear enabled, it regenerates the alarms with new alarm IDs in many cases. The purpose of ToT is to reduce unwanted alarms and noise and to generate an alarm only when an actual problem persists.

  • ToT is the total amount of time packet loss must be above the threshold within the sliding window to trigger an alarm, e.g., 20 minutes out of a 30-minute sliding window. The auto-clear option should remain enabled. To make an informed decision, first gather some data by checking the latency behaviour over time using tracert/traceroute or a ping script (ping -a <ip>).

  • Try 14 out of 15 min, 19 out of 20 min, or 20 out of 30 min for the time frame. In any case, use at least 3 monitoring intervals, because the environment is fluctuating or there are underlying network issues. The final settings must be determined by the network admin/end user through data collection, assessment, and a decision on the monitoring requirements.
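
The effect of a wider ToT window can be illustrated with a small Python sketch. It is not the MCS implementation. It assumes a simplified model: one sample per minute, an alarm is raised when the number of violating samples in the sliding window reaches the ToT value, and auto-clear removes the alarm as soon as the count drops below it, so the next raise counts as a new alarm. The function names and the synthetic data (8 minutes of packet loss followed by a 3-minute recovery, repeated for 2 hours) are hypothetical and chosen only for illustration.

```python
from collections import deque


def count_alarms(samples, tot, window):
    """Count separate alarms raised by a simplified ToT rule with auto-clear.

    samples: iterable of booleans, one per minute (True = threshold violated).
    tot:     minutes over threshold required inside the sliding window.
    window:  sliding window length in minutes.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` samples
    active = False
    raised = 0
    for violated in samples:
        recent.append(violated)
        over = sum(recent) >= tot
        if over and not active:
            raised += 1        # assumption: every raise gets a new alarm ID
            active = True
        elif not over and active:
            active = False     # auto-clear
    return raised


def flapping_samples(minutes=120, down=8, up=3):
    """Hypothetical data: `down` violating minutes, then `up` healthy ones."""
    cycle = [True] * down + [False] * up
    repeats = minutes // len(cycle) + 1
    return (cycle * repeats)[:minutes]


if __name__ == "__main__":
    data = flapping_samples()
    for tot, window in [(3, 5), (20, 30)]:
        alarms = count_alarms(data, tot, window)
        print(f"ToT {tot} of {window} min: {alarms} alarm(s)")
```

On this synthetic series the 3 of 5 rule raises a fresh alarm in every cycle, while the 20 of 30 rule raises one alarm that stays open. Real behaviour depends on the monitoring interval and the collected data, so replace the synthetic series with samples from the environment before choosing a setting.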