All alert configurations, default or custom, stop working. This affects any configured notifications, such as syslog or email.
The Operations > Alerts > All Alerts page is empty.
In the alert notification logs on the leader controller node, the events received from event manager stop abruptly, so the most recent entries carry an old timestamp. In this example, the events stopped on 04.14.2025 and the logs were collected on 06.24.2025.
File: /var/lib/avi/log/alert_notifications_debug*log*
/var/lib/avi/log]
└─$ zgrep 'alert_evns_mgr._subscribe' alert_notifications_debug.log | tail -n 4
[2025-04-14 23:28:30,532] DEBUG [alert_evns_mgr._subscribe:84] Received: [report_timestamp: 7493314800064490512
[2025-04-14 23:28:38,595] DEBUG [alert_evns_mgr._subscribe:84] Received: [report_timestamp: 7493314834424271156
[2025-04-14 23:28:48,781] DEBUG [alert_evns_mgr._subscribe:84] Received: [report_timestamp: 7493314885964127964
[2025-04-14 23:28:52,611] DEBUG [alert_evns_mgr._subscribe:84] Received: [report_timestamp: 7493314898849120102
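The check above can be scripted to report how stale the last received event is. The following is a minimal sketch, not a supported tool: the function names last_subscribe_ts and days_since are hypothetical, and it assumes GNU date (for the -d option) and the bracketed timestamp format shown in the log lines above.

```shell
# Print the timestamp of the newest alert_evns_mgr._subscribe entry found in
# the given log files (plain or gzip-compressed). Prints nothing if no entry
# matches.
last_subscribe_ts() {
    zgrep -h 'alert_evns_mgr._subscribe' "$@" 2>/dev/null \
        | sed -n 's/^\[\([0-9-]* [0-9:]*\),[0-9]*].*/\1/p' \
        | sort | tail -n 1
}

# Print the number of whole days between a "YYYY-MM-DD HH:MM:SS" timestamp
# and a reference time (second argument, default: now). Both values are
# interpreted as UTC. Assumes GNU date.
days_since() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "${2:-now}" +%s) || return 1
    echo $(( (end - start) / 86400 ))
}

# Example usage on the leader controller node:
#   last=$(last_subscribe_ts /var/lib/avi/log/alert_notifications_debug*log*)
#   days_since "$last"
```

A result of more than a few days on a system that normally generates events is consistent with the symptom described here, but it is not conclusive on its own.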
You can also correlate these timestamps with the last alerts created for any alert configuration in the system.
File: /var/lib/avi/log/alert_notifications_debug*log*
/var/lib/avi/log]
└─$ zgrep -ih 'save' alert_notifications_debug*log* | grep -v "nosavealert" |sort | grep 'alert_manager' | tail -n 6
[2025-04-14 23:20:04,833] INFO [alert_manager.saveAlertToDb:963] Saved Alert to DB: System-Controller-Alert-00505698a202-1744672802.911488-1744672802-27791154 is created 1 obj alert-31a98435-da66-4739-a1f7-84dc11f2340d
[2025-04-14 23:20:04,836] INFO [alert_manager.raiseAlertTask:1069] saved alert System-Controller-Alert-00505698a202-1744672802.911488-1744672802-27791154 with uuid alert-31a98435-da66-4739-a1f7-84dc11f2340d
[2025-04-14 23:20:04,846] INFO [alert_manager.saveAlertToDb:963] Saved Alert to DB: Custom-Controller Alert-00505698a202-1744672802.911488-1744672802-13630715 is created 1 obj alert-005a2a4f-fbea-4968-9120-9b631cb3c0a0
[2025-04-14 23:20:04,849] INFO [alert_manager.raiseAlertTask:1069] saved alert Custom-Controller Alert-00505698a202-1744672802.911488-1744672802-13630715 with uuid alert-005a2a4f-fbea-4968-9120-9b631cb3c0a0
[2025-04-14 23:20:04,876] INFO [alert_manager.saveAlertToDb:963] Saved Alert to DB: System-Controller-Alert-00505698a202-1744672802.912353-1744672802-40134650 is created 1 obj alert-9ffc2268-1672-46fe-94d3-10660c89a6dd
[2025-04-14 23:20:04,877] INFO [alert_manager.raiseAlertTask:1069] saved alert System-Controller-Alert-00505698a202-1744672802.912353-1744672802-40134650 with uuid alert-9ffc2268-1672-46fe-94d3-10660c89a6dd
In the event manager logs (follower controller node), you will find a large number of events with the error "delayed by more than 128 seconds." The timestamps of these messages can be correlated with those in the alert notification logs and fall within the same timeframe.
File: /var/lib/avi/log/event_manager*INFO*
/var/lib/avi/log]
└─$ zgrep 'event_manager_streamer' event_manager.INFO | grep 'Event with' | tail -n 4
2025-04-14T23:27:12.069Z E 5095 eventmanager/event_manager_streamer.go:219 Event with ReportTimestamp 7493314258898989891, event_id CONTROLLER_SERVICE_FAILURE and obj type CLUSTER delayed by more than 128 seconds.
2025-04-14T23:27:12.069Z E 5095 eventmanager/event_manager_streamer.go:219 Event with ReportTimestamp 7493314215948661792, event_id CONTROLLER_SERVICE_FAILURE and obj type CLUSTER delayed by more than 128 seconds.
2025-04-14T23:27:12.170Z E 5130 eventmanager/event_manager_streamer.go:219 Event with ReportTimestamp 7493314344797716345, event_id CONTROLLER_SERVICE_FAILURE and obj type CLUSTER delayed by more than 128 seconds.
2025-04-14T23:27:12.170Z E 5130 eventmanager/event_manager_streamer.go:219 Event with ReportTimestamp 7493314387747578901, event_id CONTROLLER_SERVICE_FAILURE and obj type CLUSTER delayed by more than 128 seconds.
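To make the correlation easier, the delayed-event messages can be summarized as a count plus the first and last timestamps. This is a minimal sketch under stated assumptions: the function name delayed_event_window is hypothetical, and it assumes each matching line starts with an ISO-style timestamp as in the output above.

```shell
# Summarize "delayed by more than 128 seconds" messages in the given event
# manager logs (plain or gzip-compressed).
# Output: "<count> <first timestamp> <last timestamp>", or nothing if there
# are no matching lines.
delayed_event_window() {
    zgrep -h 'delayed by more than 128 seconds' "$@" 2>/dev/null \
        | awk '{ print $1 }' \
        | sort \
        | awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR) print NR, first, last }'
}

# Example usage on the follower controller node:
#   delayed_event_window /var/lib/avi/log/event_manager*INFO*
```

The reported window can then be compared with the time at which the alert notification log entries stop.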
Affects Version(s):
30.1.x, 30.2.1–30.2.3, 31.1.1
Event manager can enter a deadlock and stop streaming events to alert manager. Alert manager then stops raising alerts and does not recover.
Please upgrade the system to the fix version.
Bug ID: AV-242168
Fix Version: 30.2.4, 31.1.2, 31.2.1
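When checking several controllers, the affected-version list can be expressed as a small helper. This is only a sketch: the function name is_affected_version is hypothetical, the version string must be supplied by the caller, and any patch-level suffix (for example, text after a hyphen) is ignored, which is an assumption and may not reflect patch-specific fixes.

```shell
# Return 0 if the given version falls in the affected list from this article
# (30.1.x, 30.2.1 through 30.2.3, 31.1.1), otherwise return 1.
is_affected_version() {
    # Drop everything from the first character that is not a digit or a dot
    # (assumption: such suffixes do not change the affected status).
    base=${1%%[!0-9.]*}
    case "$base" in
        30.1.*|30.2.1|30.2.2|30.2.3|31.1.1) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage:
#   if is_affected_version "30.2.2"; then echo "affected by AV-242168"; fi
```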
Workaround(s):
Disable the "alert_manager_use_evms" knob in the controller properties to use Log Manager instead of Event Manager, then restart the avipythoncontroller service (systemctl restart avipythoncontroller) on the leader controller node.
SSH to the leader controller node as the admin user.
[admin:]: > show controller properties | grep alert
| alert_manager_use_evms | True |
[admin:]: > configure controller properties
[admin:]: controllerproperties> no alert_manager_use_evms
[admin:]: controllerproperties> save
[admin:]: > show controller properties | grep alert
| alert_manager_use_evms | False |
sudo systemctl restart avipythoncontroller.service
Note: