How does NAS storm protection work?
How can I prevent alarm storms from affecting my DX UIM NAS and monitoring environment?
After a message flood affected the alarms in UIM, how can I prevent probes from sending a massive number of alarms to the NAS?
How does NAS Storm protection work:
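When the number of alarms with the same signature exceeds a threshold (storm_threshold) within a specified time window (storm_timewindow), succeeding alarms are quarantined by re-publishing the message to the configured message subject (storm_subject). The related message text and severity are configured under setup > storm_message and setup > storm_severity_level.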
storm_message supports variable expansion from the message header, e.g.,
Placing alarm(s) from $domain:$origin:$robot:$prid:suppkey=$supp_key, total:%d
would be represented as:
storm_severity_level
storm_severity_level = 5
This value sets the alarm severity to Critical.
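The variable expansion that storm_message supports can be illustrated with a short Python sketch. This is not NAS code: it only mimics the substitution with `string.Template`, and the header values (ExampleDomain, ExampleHub, server01, cdm, disk/C:) and the count of 1000 are hypothetical.

```python
from string import Template

# Template text taken from the example above; %d is the alarm-count placeholder.
STORM_MESSAGE = (
    "Placing alarm(s) from $domain:$origin:$robot:$prid:suppkey=$supp_key, total:%d"
)


def expand_storm_message(template: str, header: dict, total: int) -> str:
    """Substitute $name fields from a message header, then fill in the count."""
    text = Template(template).safe_substitute(header)
    # Replace only the first %d so '%' characters in header values are left alone.
    return text.replace("%d", str(total), 1)


# Hypothetical header values, for illustration only.
header = {
    "domain": "ExampleDomain",
    "origin": "ExampleHub",
    "robot": "server01",
    "prid": "cdm",
    "supp_key": "disk/C:",
}

expanded = expand_storm_message(STORM_MESSAGE, header, 1000)
print(expanded)
```

Using `safe_substitute` means a header field that is missing is left as its `$name` placeholder instead of raising an error. How the NAS itself treats missing fields is not covered here.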
The storm_protection value selects which message fields make up the alarm “signature” key:
0. disabled
1. source, domain, robot, probe-id and supp_key
2. source, domain, robot, probe-id
3. source, domain, robot
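To show how the storm_protection value changes the grouping, the following Python sketch builds a signature key from an alarm represented as a dictionary. The dictionary layout and the field names `prid` (for probe-id) and `supp_key` are assumptions made for illustration. The NAS's internal representation is not documented here.

```python
# Fields used per storm_protection value, following the list above.
SIGNATURE_FIELDS = {
    1: ("source", "domain", "robot", "prid", "supp_key"),
    2: ("source", "domain", "robot", "prid"),
    3: ("source", "domain", "robot"),
}


def storm_signature(alarm: dict, storm_protection: int):
    """Return the grouping key for an alarm, or None when protection is disabled."""
    fields = SIGNATURE_FIELDS.get(storm_protection)
    if fields is None:  # 0 (disabled) or an unknown value
        return None
    return tuple(alarm.get(name, "") for name in fields)


# Two hypothetical alarms from the same robot but different probes.
cpu_alarm = {"source": "10.0.0.5", "domain": "ExampleDomain", "robot": "server01",
             "prid": "cdm", "supp_key": "cpu/total"}
log_alarm = {"source": "10.0.0.5", "domain": "ExampleDomain", "robot": "server01",
             "prid": "logmon", "supp_key": "app.log"}

same_at_level_3 = storm_signature(cpu_alarm, 3) == storm_signature(log_alarm, 3)
same_at_level_2 = storm_signature(cpu_alarm, 2) == storm_signature(log_alarm, 2)
print(same_at_level_3, same_at_level_2)
```

In this sketch a higher storm_protection value produces a coarser key, so alarms from different probes on the same robot are counted together at level 3 but separately at levels 1 and 2.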
How to enable NAS Storm Protection:
Note:
The Storm capacity determines how many messages are retained in the transaction log and how many are discarded.
The NAS determines that the storm has died down using the same logic, e.g., 1000 msg/5 min: when this condition is no longer true, it returns to a normal state. Keep in mind that these times are asymmetric. If you had a storm of 2990 alarms in the first 10 seconds and 10 more alarms at 4 minutes 50 seconds, the storm will be over about 10 seconds after the end of the first 5-minute window, that is, about 5 minutes 10 seconds after it started. This is because the arrival times of the first batch were heavily biased toward the start of the storm.
That is the duration for quarantined messages to be published back to the message BUS (NimBUS). It is a sliding window.
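The 1000 msg/5 min example can be simulated with a simple sliding-window counter. The sketch below is an assumption about the general technique, not the NAS implementation: the `StormDetector` class and its method names are hypothetical, and the first batch of alarms is assumed to be spread evenly over the first 10 seconds.

```python
from collections import deque


class StormDetector:
    """Sliding-window counter: in storm while more than `threshold`
    alarms fall inside the last `timewindow` seconds."""

    def __init__(self, threshold: int, timewindow: float):
        self.threshold = threshold
        self.timewindow = timewindow
        self.arrivals = deque()  # arrival times in seconds, oldest first

    def _expire(self, now: float) -> None:
        # Drop arrivals that have slid out of the window.
        while self.arrivals and self.arrivals[0] <= now - self.timewindow:
            self.arrivals.popleft()

    def record(self, now: float) -> None:
        self._expire(now)
        self.arrivals.append(now)

    def in_storm(self, now: float) -> bool:
        # Must be called with non-decreasing times, since it expires old arrivals.
        self._expire(now)
        return len(self.arrivals) > self.threshold


detector = StormDetector(threshold=1000, timewindow=300)

# 2990 alarms spread evenly over the first 10 seconds...
for i in range(2990):
    detector.record(i * 10 / 2990)
# ...then 10 more at 4 minutes 50 seconds.
for _ in range(10):
    detector.record(290.0)

# Check the storm state at 4:55, 5:00 and 5:10.
states = {t: detector.in_storm(t) for t in (295, 300, 310)}
print(states)
```

Under these assumptions the detector is still in the storm state at the 5-minute mark and has returned to normal by 5 minutes 10 seconds, once the first batch has left the window.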