Service Engines Stuck in Partition State After Controller Leader Change

Article ID: 412071

Products

VMware Avi Load Balancer

Issue/Introduction

On Avi Service Engines running version 31.x, which is based on Ubuntu 24.04:

  • The se_agent.pid and se_log_agent.pid files created in /tmp are removed while the processes are still running.

  • When these PID files are missing, se-supervisor cannot signal the se_agent or se_log_agent during a cluster leader change.
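To confirm whether a Service Engine is in this state, one option is to check for the two PID files directly. The following is a minimal sketch, not a vendor tool: the file names and the /tmp location come from this article, while the function name `missing_pid_files` and its parameters are hypothetical and chosen only for illustration.

```python
import os

# PID file names as given in this article.
PID_FILES = ("se_agent.pid", "se_log_agent.pid")


def missing_pid_files(directory="/tmp", names=PID_FILES):
    """Return the PID file names that do not exist in `directory`.

    Hypothetical helper: the directory is a parameter so the check can be
    tried against a scratch directory before running it on an SE.
    """
    return [
        name
        for name in names
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

An empty list means both PID files are present. A non-empty list on an SE whose se_agent and se_log_agent processes are still running matches the condition described above.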

Impact:

  • The SE is not updated with the new Controller leader information.

  • It continues to try connecting to the old Controller leader.

  • This results in connectivity issues between the Service Engine and the new Controller leader.

Log Path on SE: /var/lib/avi/log/se_supervisor.log

Logs observed during the issue window:

[2025-09-19 04:42:10,482] INFO [se_supervisor.handle_leader_change:2704] Notifying SE Agent of cluster leader
[2025-09-19 04:43:10,494] ERROR [se_supervisor.wait_for_se_log_agent_up:180] ^[[31mERROR: Unable to send signal to se_log_agent^[[0m
[2025-09-19 04:43:10,494] INFO [se_supervisor.signal_process:206] Cannot send signal to se_log_agent because process is not up
[2025-09-19 04:44:10,504] ERROR [se_supervisor.wait_for_se_agent_up:163] ^[[31mERROR: Unable to send signal to se agent^[[0m
[2025-09-19 04:44:10,504] INFO [se_supervisor.signal_process:203] Cannot send signal to se_agent because process is not up 
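The signature lines above can be searched for programmatically. The sketch below assumes the log keeps the exact line layout shown in this article (timestamp in square brackets, then level, then `se_supervisor.signal_process`); the function name `find_signal_failures` is hypothetical.

```python
import re

# Matches the INFO line quoted in this article and captures the timestamp
# and the name of the process that could not be signalled.
SIGNAL_FAILURE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] INFO \[se_supervisor\.signal_process:\d+\] "
    r"Cannot send signal to (?P<proc>\w+) because process is not up"
)


def find_signal_failures(lines):
    """Return (timestamp, process) pairs for each signal-failure line."""
    hits = []
    for line in lines:
        match = SIGNAL_FAILURE.match(line.strip())
        if match:
            hits.append((match.group("ts"), match.group("proc")))
    return hits
```

On an SE, the function could be fed the open file object for /var/lib/avi/log/se_supervisor.log. Any hits that appear shortly after a "Notifying SE Agent of cluster leader" entry are consistent with this issue.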

Environment

  • VMware Avi Load Balancer

    • 31.1.x

Cause

  • In Ubuntu 24.04, the default systemd-tmpfiles configuration automatically removes files under /tmp that have been unused for 30 days.

  • In Ubuntu 20.04, /tmp cleanup only occurred at reboot.

  • Because Avi Service Engines store se_agent.pid and se_log_agent.pid under /tmp:

    • These PID files are deleted after 30 days of inactivity.

    • se-supervisor assumes the processes are not running and fails to notify them during leader changes.
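To estimate how close an existing PID file is to being cleaned up, its age can be compared with the 30-day threshold. This is a rough sketch under stated assumptions: it approximates the age as the time since the most recent of the file's access, modification, and status-change timestamps, which may not match the cleanup logic exactly. The function names and the `margin_days` parameter are hypothetical.

```python
import os
import time

AGE_LIMIT_DAYS = 30  # threshold stated in this article
SECONDS_PER_DAY = 86400.0


def pid_file_age_days(path, now=None):
    """Days since the file was last accessed, modified, or changed.

    Assumption: the newest of atime, mtime, and ctime is used as an
    approximation of the age the cleanup evaluates.
    """
    st = os.stat(path)
    newest = max(st.st_atime, st.st_mtime, st.st_ctime)
    current = time.time() if now is None else now
    return (current - newest) / SECONDS_PER_DAY


def at_risk(path, margin_days=5, now=None):
    """True if the file is within `margin_days` of the 30-day threshold."""
    return pid_file_age_days(path, now) >= AGE_LIMIT_DAYS - margin_days
```

For example, `at_risk("/tmp/se_agent.pid")` returning True would suggest the file may be removed within the next few days unless the SE is upgraded or rebooted first.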

Resolution

  • Upgrade Avi to one of the following fixed versions:

    • 31.1.2-2p1 (see Release Notes)

    • 31.2.1 (see Release Notes)

  • Workaround:

    • Reboot the affected Service Engines from the Avi UI: Infrastructure > Cloud Resources > Service Engine.