Down Status of Load Balancer Pool Alarm

Article ID: 372279

Updated On:

Products

VMware NSX

Issue/Introduction

Title: Alarm for Down Status of Load Balancer Pool
Event ID: load_balancer.pool_status_down
 
Alarm Description:
  • Purpose: Informs users that a load balancer pool is down.
  • Impact: All traffic sent to this pool fails.

Environment

VMware NSX-T Data Center
VMware NSX

Cause

  • All backend servers in this pool may be unreachable over the network.
  • The service on all backend servers may be unhealthy or not responding.

Resolution

Steps to resolve
For 3.0.0 and higher

Recommended Action:

Determine which load balancer pools and members are DOWN.

In the UI, navigate to Networking -> Load Balancing and select the Server Pools tab.
Look for any pools that show a status of DOWN.
Click the DOWN status to open a window that shows the status of all pool members.
Note which members have a status of DOWN.


You can also review the pool member status from the edge command line for further detail.
Log in as admin to the edge that hosts the load balancer.
Issue the following commands to view the load balancer and pool status:

> get load-balancer
Thu Dec 12 2024 UTC 19:30:37.690
Load Balancer
Applied To                         :
    Logical Router Id              : ########-####-####-####-############
    Service Router Id              : ########-####-####-####-############
Display Name                       : Test-lb
Enabled                            : True
UUID                               : ########-####-####-####-############
Log Level                          : LB_LOG_LEVEL_INFO
Relax Scale Validation             : False
Size                               : SMALL
Virtual Server Id                  : ########-####-####-####-############


> get load-balancer <LB UUID from previous command> pool status
Thu Dec 12 2024 UTC 19:53:20.041
Pool
UUID            : ########-####-####-####-############
Display-Name    : dummy-lb-pool
Members         : 1
Status          : down
Primary-UP-No   : 0
Backup-UP-No    : 0


> get load-balancer <LB UUID from previous command> pool <Pool UUID from previous command> status
Thu Dec 12 2024 UTC 19:54:08.338
Pool
UUID                        : ########-####-####-####-############
Display-Name                : dummy-lb-pool
Status                      : down
Total-Members               : 1
Primary Up                  : 0
Primary Down                : 1
Primary Disabled            : 0
Primary Graceful Disabled   : 0
Primary Unknown             : 0
Backup Up                   : 0
Backup Down                 : 0
Backup Graceful Disabled    : 0
Backup Disabled             : 0
Backup Unknown              : 0

Member
Display-Name                : dummy-backend
Type                        : primary
IP                          : ##.##.##.##
Port                        : 443
Status                      : down
Last-State-Change-Time      : 2024-12-12 18:59:36

Monitor
Display-Name                : default-http-lb-monitor
Type                        : HTTP
Status                      : unknown
Url                         : /
Last-Check-Time             : 2024-12-12 18:59:36
Last-State-Change-Time      : 2024-12-12 19:53:38
Failure-Reason              : Health Check Intializing
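
When a pool has many members, the per-member output above can be tedious to scan by eye. The following is a minimal Python sketch, not an NSX tool, that parses saved text from the pool status command and lists the members whose status is down or unknown. It assumes the output keeps the "Key : value" layout and the Pool, Member and Monitor section headers shown above. The function names and the abbreviated sample text are illustrative only.

```python
# Section headers as they appear in the CLI output shown above.
SECTION_HEADERS = {"Pool", "Member", "Monitor"}

# Abbreviated copy of the sample output; the masked IP is kept as-is.
SAMPLE = """\
Thu Dec 12 2024 UTC 19:54:08.338
Pool
Display-Name                : dummy-lb-pool
Status                      : down
Total-Members               : 1

Member
Display-Name                : dummy-backend
Type                        : primary
IP                          : ##.##.##.##
Port                        : 443
Status                      : down
Last-State-Change-Time      : 2024-12-12 18:59:36

Monitor
Display-Name                : default-http-lb-monitor
Type                        : HTTP
Status                      : unknown
Last-Check-Time             : 2024-12-12 18:59:36
"""


def parse_pool_status(text):
    """Return a list of (section_name, fields) tuples in output order."""
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line in SECTION_HEADERS:
            # A bare header line starts a new section.
            current = (line, {})
            sections.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so timestamps stay intact.
            key, _, value = line.partition(":")
            current[1][key.strip()] = value.strip()
        # Lines before the first header (the timestamp) are ignored.
    return sections


def unhealthy_members(sections):
    """Return the fields of every Member section that is down or unknown."""
    return [
        fields
        for name, fields in sections
        if name == "Member"
        and fields.get("Status", "").lower() in {"down", "unknown"}
    ]


if __name__ == "__main__":
    for m in unhealthy_members(parse_pool_status(SAMPLE)):
        print(f"{m['Display-Name']} {m['IP']}:{m['Port']} is {m['Status']}")
```

The sketch only reads text that has already been captured from the edge; it does not connect to the edge or run any commands itself.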

 

For each member that is showing DOWN or UNKNOWN:

  • Check network connectivity from the load balancer to the impacted pool members.
  • Validate the application health of each pool member.
  • Once a member passes its health checks, its status is updated to healthy according to the 'Rise Count' configured in the monitor.
  • If the issue persists, remediate it by rebooting the pool member, or by placing the Edge node in maintenance mode and then exiting maintenance mode.
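
The connectivity check and the Rise Count behaviour in the list above can be illustrated with a short Python sketch. It makes two assumptions that are not stated in this article: that a plain TCP connection attempt is an acceptable first connectivity test, and that Rise Count means the number of consecutive successful health checks required before a down member is marked up. The function names and the address 192.0.2.10 are hypothetical. The probe tests reachability from the machine that runs the script, which may differ from the path the edge uses.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False


def checks_until_up(results, rise_count):
    """Simplified model of Rise Count (an assumption, not NSX code).

    `results` is a sequence of health check outcomes (True = success)
    for a member that starts as down. Returns the 1-based number of the
    check at which `rise_count` consecutive successes is first reached,
    or None if that never happens.
    """
    streak = 0
    for n, ok in enumerate(results, start=1):
        # Any failed check resets the run of consecutive successes.
        streak = streak + 1 if ok else 0
        if streak >= rise_count:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical member address; port 443 matches the sample output.
    print("reachable:", tcp_reachable("192.0.2.10", 443, timeout=2.0))
    # One failure in the middle delays the transition to up.
    print("marked up at check:",
          checks_until_up([True, False, True, True, True], rise_count=3))
```

Under this model, a member that has just recovered stays DOWN until enough consecutive checks succeed, so a short delay between fixing the backend and the pool status changing is expected.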

Maintenance window required for remediation? Optional.