AVI Load Balancer service engines going into a partitioned state

Article ID: 405030


Updated On:

Products

VMware Avi Load Balancer

Issue/Introduction

AVI load balancer service engines are in a partitioned state.

Symptoms:

  1. Error on the SE from the controller UI:
    • State: Partitioned
    • Reason: Lost connectivity to Service Engine
  2. SE status from the controller shell:
    • Run the command "show serviceengine" and check whether the output shows the following:
      • Oper State: OPER_PARTITIONED
  3. In this scenario, the service engine is reachable and able to attach to the controller, but it remains in the OPER_PARTITIONED state:
      • Log in to the controller leader shell.
      • Run the command below and confirm that the SE attaches to the controller successfully (a consolidated evidence-collection sketch follows this list):
        • attach serviceengine <se_name>
  • Additionally, you may notice the errors below on the SE; these error logs appear only when the controller leader node becomes inactive at the time of the issue.

      • The /var/lib/avi/log/se_supervisor.log may contain the following errors.

        [2025-06-23 03:13:45,693] ERROR [se_supervisor.main:2814] Error in run: Could not get redis IP from cluster services watcher
        [2025-06-23 05:44:59,210] ERROR [se_supervisor.main:2814] Error in run: Could not get redis IP from cluster services watcher

      • The /var/log/syslog may show systemd errors related to the se_supervisor.service.

        Jun 23 05:45:06 Avi-se-#### systemd[1]: se_supervisor.service: Start request repeated too quickly.
        Jun 23 05:45:06 Avi-se-#### systemd[1]: se_supervisor.service: Failed with result 'signal'.
        Jun 23 05:45:06 Avi-se-#### systemd[1]: Failed to start Avi Service Engine Startup script.
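
The sketch below is a minimal, illustrative way to collect the symptoms above in one pass on an affected SE. It assumes shell access to the SE (for example, after running "attach serviceengine <se_name>" from the controller leader shell); the log path and the se_supervisor.service unit name are taken from the messages above, and everything else is an assumption about a standard Linux environment on the SE.

        #!/usr/bin/env bash
        # Evidence-collection sketch for a partitioned SE (run on the SE itself).
        set -euo pipefail

        echo "== se_supervisor service status =="
        # Shows whether the unit failed or systemd is refusing to restart it.
        systemctl status se_supervisor.service --no-pager || true

        echo "== Recent errors in the SE supervisor log =="
        # Look for "Could not get redis IP from cluster services watcher".
        grep -i "error" /var/lib/avi/log/se_supervisor.log | tail -n 20 || true

        echo "== systemd messages for se_supervisor in syslog =="
        grep "se_supervisor.service" /var/log/syslog | tail -n 20 || true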

Environment

  • VMware AVI Load Balancer
    • 30.1.2

Cause

  • A CPU soft lockup is the probable cause of the issue.
    • On the controller leader node, check whether there are any CPU soft lockup messages in syslog (a broader check that also covers rotated logs is sketched after the example below).
    • Example:
      • root@##-##-##-##:/var/log# grep -i "soft lockup" syslog
        Jun 23 03:13:39 ##-##-##-## kernel: [9545082.074865] watchdog: BUG: soft lockup - CPU#4 stuck for 27s! [swapper/4:0]
        Jun 23 03:13:39 ##-##-##-## kernel: [9545082.074894] watchdog: BUG: soft lockup - CPU#1 stuck for 25s! [swapper/1:0]
        Jun 23 03:13:39 ##-##-##-## kernel: [9545082.074899] watchdog: BUG: soft lockup - CPU#2 stuck for 25s! [se_controller_i:897348]
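
The following bash sketch extends the grep above to rotated logs and the kernel ring buffer on the controller leader node. It assumes Ubuntu-style syslog rotation (/var/log/syslog, /var/log/syslog.1, /var/log/syslog.*.gz), which may differ on your deployment.

        #!/usr/bin/env bash
        # Sketch: search for CPU soft lockups on the controller leader node.
        set -euo pipefail

        echo "== Soft lockups in the current and previous syslog =="
        grep -i "soft lockup" /var/log/syslog /var/log/syslog.1 2>/dev/null || true

        echo "== Soft lockups in compressed rotated syslogs =="
        zgrep -i "soft lockup" /var/log/syslog.*.gz 2>/dev/null || true

        echo "== Soft lockups still in the kernel ring buffer =="
        dmesg -T 2>/dev/null | grep -i "soft lockup" || true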

Resolution

  • The issue has been fixed in the following versions:
    • 31.2.1
    • 30.2.5

  • Workaround:
    • Restart the se_supervisor service on the partitioned SEs.
    • Connect to the controller leader node shell.
    • Run the commands below (the first from the controller shell, the second on the SE after attaching; a sketch of the SE-side steps follows this list):
      • attach serviceengine <se_name>
      • sudo systemctl restart se_supervisor.service 
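
The sketch below covers the SE-side part of the workaround. It is illustrative and assumes you have already attached to the partitioned SE with "attach serviceengine <se_name>" from the controller leader shell; the final verification is a controller-shell command and is therefore shown only as a comment.

        #!/usr/bin/env bash
        # Workaround sketch: restart the supervisor on a partitioned SE.
        # Run on the SE after attaching to it from the controller leader shell.
        set -euo pipefail

        # Restart the supervisor service that lost connectivity to the controller.
        sudo systemctl restart se_supervisor.service

        # Confirm the unit is active again.
        systemctl status se_supervisor.service --no-pager

        # Then, from the controller shell, re-run "show serviceengine" and confirm
        # that the SE's Oper State is no longer OPER_PARTITIONED.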

 

Note: If the issue persists, please open an SR with Broadcom Support.