Alert manager fails to trigger Alerts for any events after Leader Node Failover

Article ID: 401093

Updated On:

Products

VMware Avi Load Balancer

Issue/Introduction

  • Following a leader node failover, the system generates critical events, specifically SE_UP (Service Engine Up) and CONTROLLER_NODE_JOINED. However, the alert_manager does not trigger the corresponding alerts for these events.
  • Transient events raised during the failover likewise do not generate their related alerts.

Environment

Any version below 22.1.7-2p7

Cause

  • The issue is caused by a timing mismatch between the alert-generating component (alert_manager) and the log collection service (log_mgr) after a node failover. The log_mgr requires over five minutes to initialize, while the alert_manager searches for past events within a window of at most 30 seconds. By the time the log_mgr is fully operational, the alert_manager's search window has already advanced past the initial, critical failover events, so they are missed.
  • This is a known issue on 22.1.7-2p7 or below.
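
The timing mismatch described above can be illustrated with a small model. This is only a sketch, not Avi Controller code: the function name event_visible, the constants, and the event offsets (5 and 20 seconds after failover) are hypothetical and chosen purely for illustration. The 300-second value is the lower bound of the "over five minutes" initialization time, and the 30-second value is the maximum search window stated above.

```python
# Illustrative model only; none of these names come from the Avi Controller.
LOG_MGR_INIT_SECONDS = 300        # assumed lower bound of "over five minutes"
ALERT_SEARCH_WINDOW_SECONDS = 30  # maximum look-back window of alert_manager


def event_visible(event_time: float, query_time: float,
                  window: float = ALERT_SEARCH_WINDOW_SECONDS) -> bool:
    """Return True if the event lies inside the look-back window ending at query_time."""
    return query_time - window <= event_time <= query_time


failover_time = 0.0

# Hypothetical timestamps (seconds after failover) for the events named in the article.
failover_events = {
    "CONTROLLER_NODE_JOINED": failover_time + 5,
    "SE_UP": failover_time + 20,
}

# Earliest moment at which a search can return results, i.e. once log_mgr is ready.
first_usable_query = failover_time + LOG_MGR_INIT_SECONDS

# Events that fall outside the window of that first usable search are never alerted on.
missed = sorted(
    name for name, t in failover_events.items()
    if not event_visible(t, first_usable_query)
)
print(missed)
```

Under these assumptions the first usable search covers only seconds 270 to 300 after the failover, so any event logged in the first 270 seconds falls outside the window.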

Resolution