After upgrading to Aria Operations 8.16.x, "outbound alert queue is full" alerts appear in the Aria Operations UI.
Aria Operations 8.16.x
Non-HA cluster.
The issue can occur for several reasons, including alert storms caused by:
1. Outdated events/alarms received from vCenter Server
2. Events triggered from vCenter such as vSAN|VM Storage Policies|Compliance changes
Aria Operations analytics.log shows repeated errors:
2024-02-27T06:13:01,474+0000 ERROR [Threshold checker worker thread 12] com.integrien.analytics.plugins.alertplugins.AlertTransmissionWrapper.pushAlert - Outbound notification queue is full, discard notification for alert xxxxxx-xxxxxx-xxxxxx-xxxxx-xxxxx-xxxxx
2024-02-27T06:13:01,474+0000 ERROR [Threshold checker worker thread 10] com.integrien.analytics.plugins.alertplugins.AlertTransmissionWrapper.pushAlert - Outbound notification queue is full, discard notification for alert xxxxxx-xxxxxx-xxxxxx-xxxxx-xxxxx-xxxxx
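To gauge how often notifications are being discarded, the log lines above can be counted per minute. The following is a minimal sketch, not a supported tool: the function name `count_discards_per_minute` is hypothetical, the pattern is derived only from the two sample lines shown here, and the path to analytics.log must be supplied by the caller.

```python
import re
import sys
from collections import Counter

# Pattern based on the sample log lines above (an assumption about the format):
# it captures the timestamp down to the minute and the alert ID.
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2},\d{3}[+-]\d{4} ERROR .*"
    r"Outbound notification queue is full, discard notification for alert (\S+)"
)

def count_discards_per_minute(lines):
    """Return a Counter mapping 'YYYY-MM-DDTHH:MM' to the number of discards."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is passed on the command line; no default location is assumed.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for minute, n in sorted(count_discards_per_minute(fh).items()):
            print(minute, n)
```

A steady count over many minutes suggests a sustained alert storm rather than a short burst.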
This issue is resolved in Aria Operations 8.17.1.
Workaround:
If you need to stay on 8.16.x, change the alert "wait cycle" to 3. Setting the wait cycle to 3 should decrease the number of alerts considerably.
When alerts are triggered (and put into the alert queue) faster than the notification engine consumes them (empties the queue), the alertQueueSize limit may be reached, causing new alerts to be ignored by the notification engine.
Such dropped alerts are not relayed to external systems. The "Outbound alert queue is full" message (in the analytics log) indicates this condition.
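The behavior described above can be illustrated with a bounded queue. This is a sketch of the general mechanism only, not the Aria Operations implementation: the `simulate` function, the queue size of 3, and the per-cycle rates are all hypothetical values chosen for illustration.

```python
import queue

ALERT_QUEUE_SIZE = 3  # hypothetical limit; the real alertQueueSize value is not stated here

def simulate(alerts_per_cycle, sent_per_cycle, cycles, queue_size=ALERT_QUEUE_SIZE):
    """Produce and consume alerts in cycles; return (sent, dropped) alert IDs."""
    q = queue.Queue(maxsize=queue_size)
    sent, dropped = [], []
    next_id = 0
    for _ in range(cycles):
        # Producer side: alerts are triggered and pushed into the queue.
        for _ in range(alerts_per_cycle):
            try:
                q.put_nowait(next_id)
            except queue.Full:
                # Queue limit reached: the new alert is discarded, not delayed.
                dropped.append(next_id)
            next_id += 1
        # Consumer side: the notification engine drains a limited number per cycle.
        for _ in range(sent_per_cycle):
            if q.empty():
                break
            sent.append(q.get_nowait())
    return sent, dropped

if __name__ == "__main__":
    sent, dropped = simulate(alerts_per_cycle=4, sent_per_cycle=2, cycles=2)
    print("sent:", sent)
    print("dropped:", dropped)
```

When the trigger rate does not exceed the drain rate, nothing is dropped in this model, which is why reducing the number of triggered alerts (for example through a longer wait cycle) relieves the condition.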