NSX T Native Load Balancer: Virtual Server Status Detected as 'Down'

Article ID: 381375

Products

VMware NSX

Issue/Introduction

The Virtual Server status is 'Down'. For more details on the issue, check the alarm associated with the specific virtual server, as shown below.

Alarm:
Consult the load balancer pool to determine its status and verify its configuration. If it is incorrectly configured, reconfigure it, then remove the load balancer pool from the virtual server and re-add it to the virtual server.

Cause

The cause of the NSX-T virtual server being down can be identified as follows:

  • Select the Virtual Servers.
  • Click on 'Down' to view more details.
    • Errors: LbVirtualServerStatus is DOWN 

In this case, the NSX-T L7 virtual server is down because all associated server pool members have been detected as down. The reason for this can be identified as follows:

  • Select the Virtual Server Pool Member.
  • Click on 'Down' to view more details.
    • LbPoolStatus is DOWN.
      Members: 'PoolMember_IP_Address:PortNumber' with status: 'Failed to connect, the reason is Connection refused.'
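The member status message above has a fixed shape (a quoted 'IP:Port' followed by a quoted status), so it can be picked apart programmatically when several members are down. The following Python sketch assumes the exact wording shown in this article; other NSX versions may phrase the message differently, and parse_member_status and is_connection_refused are hypothetical helpers, not part of any NSX tool.

```python
import re
from typing import List, NamedTuple

# Matches "'<ip>:<port>' with status: '<text>'" as shown in the LbPoolStatus details.
# The greedy IP group backtracks to the last ':' that is followed by the port digits.
_MEMBER_RE = re.compile(
    r"'(?P<ip>[^']+):(?P<port>\d+)'\s+with status:\s+'(?P<status>[^']*)'"
)


class MemberStatus(NamedTuple):
    ip: str
    port: int
    status: str


def parse_member_status(message: str) -> List[MemberStatus]:
    """Return every member/status pair found in an LbPoolStatus details message."""
    return [
        MemberStatus(m["ip"], int(m["port"]), m["status"])
        for m in _MEMBER_RE.finditer(message)
    ]


def is_connection_refused(member: MemberStatus) -> bool:
    """True when the status text says the connection was actively refused."""
    return "connection refused" in member.status.lower()


if __name__ == "__main__":
    msg = ("Members: '192.0.2.10:443' with status: "
           "'Failed to connect, the reason is Connection refused.'")
    for member in parse_member_status(msg):
        print(member.ip, member.port, is_connection_refused(member))
```

A 'Connection refused' status means the connection attempt was actively rejected, which points to the causes listed under Resolution below.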

Resolution

The error above indicates that the communication from the load balancer on the Active edge to the server was rejected for one of the following reasons:

  • The NSX Distributed Firewall (DFW) or an external firewall blocked the connection from reaching the server or the specified port.
    • For more information on how to check whether a packet is blocked by the DFW, refer to the document Troubleshooting Distributed Firewall on ESX Hosts.
      • In brief: SSH to the ESXi host where the server VM is running.
      • Run the following command: net-stats -l | grep -i Server_VM_Name
        • If the VM is mapped with more than one NIC, identify the correct interface using the MAC address.
        • Once you've found the correct interface, make a note of the port ID from the output.
      • Run the following command, then search its output for the port ID identified in the previous step and make a note of the 'name' associated with that port ID: summarize-dvfilter | less
      • Run the following command to display all the firewall rules applied to this VM on this specific interface: vsipioctl getrules -f <name>
  • The server(s) in the load balancer pool refused the connection on the specified port.
    • To verify this further, you can perform a packet capture (PCAP) on the ESXi host for the server VM's switchport. In the capture, you should observe that the server responds to the incoming TCP SYN with a TCP reset (RST).
    • Continue troubleshooting within the server's guest OS or application to identify the cause of the rejection. In addition to their own troubleshooting steps, the server administrator may want to verify the following:
      • Is the service/application listening on the specified port? For example, if the pool member port is configured as 443, verify that the service on the server is actually listening on port 443 and not on a different one.
      • Is there a guest OS firewall or other security application blocking the connection?
      • Check the guest OS or application logs, if necessary, to determine why the connection is being rejected.
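The DFW lookup steps above can also be scripted against saved command output. The following Python sketch assumes you have copied the text of net-stats -l and summarize-dvfilter from the ESXi host. The net-stats -l column layout (PortNum, Type, SubType, SwitchName, MACAddress, ClientName), the 'port <ID>' / 'name:' layout of summarize-dvfilter, and the 'vmware-sfw' string used to pick the DFW filter are assumptions based on typical output; the helper names are hypothetical. Check them against the output on your own host before relying on the result.

```python
from typing import Optional


def find_port_id(net_stats: str, vm_name: str, mac: Optional[str] = None) -> Optional[str]:
    """Return the PortNum for vm_name from `net-stats -l` text.

    Assumed columns: PortNum Type SubType SwitchName MACAddress ClientName.
    If the VM has several vNICs, pass `mac` to pick the right one; otherwise
    the first matching row is returned.
    """
    for line in net_stats.splitlines():
        cols = line.split()
        if len(cols) < 6 or not cols[0].isdigit():
            continue  # header or unrelated line
        port, mac_col, client = cols[0], cols[4], " ".join(cols[5:])
        if vm_name.lower() not in client.lower():
            continue  # case-insensitive match, like `grep -i`
        if mac is None or mac_col.lower() == mac.lower():
            return port
    return None


def find_filter_name(dvfilter: str, port_id: str, agent: str = "vmware-sfw") -> Optional[str]:
    """Return the 'name:' value listed under `port <port_id>` in summarize-dvfilter text.

    Only names containing `agent` are returned (assumed DFW filter agent string).
    """
    in_port = False
    for raw in dvfilter.splitlines():
        line = raw.strip()
        if line.startswith("world "):
            in_port = False  # a new VM section begins
        elif line.startswith("port "):
            fields = line.split()
            in_port = len(fields) > 1 and fields[1] == port_id
        elif in_port and line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
            if agent in name:
                return name
    return None


def getrules_command(filter_name: str) -> str:
    """Build the vsipioctl command from this article for the given filter name."""
    return f"vsipioctl getrules -f {filter_name}"
```

To use it, pass the saved net-stats -l text, the VM name and the MAC address of the relevant vNIC to find_port_id, pass the returned port ID to find_filter_name, and run the resulting vsipioctl command on the host.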

Additional Information

Please note that if the issue is caused by the DFW blocking communication between the load balancer and the server, you may observe one of two different outputs, depending on the DFW rule action, as shown below:

  1. Failure-Reason: "Connect to Peer Failure"
    1. This error is seen when the DFW action is configured as 'Reject'
    2. edge_name> get load-balancer <LB_UUID> pool <LB_POOL_UUID> status
      Thu Nov 07 2024 UTC 04:33:55.747
      Pool
      UUID                        : <LB_POOL_UUID>
      Display-Name                : <LB_POOL_name>
      Status                      : down
      Total-Members               : 1
      Primary Up                  : 0
      Primary Down                : 1
      Primary Disabled            : 0
      Primary Graceful Disabled   : 0
      Primary Unknown             : 0
      Backup Up                   : 0
      Backup Down                 : 0
      Backup Graceful Disabled    : 0
      Backup Disabled             : 0
      Backup Unknown              : 0
      
      Member
      Display-Name                : ubuntu-02
      Type                        : primary
      IP                          : ###.###.###.###
      Port                        : 80
      Status                      : down
      Last-State-Change-Time      : 2024-11-07 04:32:47
      
      Monitor
      Display-Name                : default-http-lb-monitor
      Type                        : HTTP
      Status                      : down
      Url                         : /
      Last-Check-Time             : 2024-11-07 04:33:52
      Last-State-Change-Time      : 2024-11-07 04:32:47
      Failure-Reason              : Connect to Peer Failure
  2. Failure-Reason: "TCP Handshake Timeout"
    1. This error is seen when the DFW action is configured as 'Drop'
    2. edge_name> get load-balancer <LB_UUID> pool <LB_POOL_UUID> status
      Thu Nov 07 2024 UTC 04:34:54.201
      Pool
      UUID                        : <LB_POOL_UUID>
      Display-Name                : <LB_POOL_name>
      Status                      : down
      Total-Members               : 1
      Primary Up                  : 0
      Primary Down                : 1
      Primary Disabled            : 0
      Primary Graceful Disabled   : 0
      Primary Unknown             : 0
      Backup Up                   : 0
      Backup Down                 : 0
      Backup Graceful Disabled    : 0
      Backup Disabled             : 0
      Backup Unknown              : 0
      
      Member
      Display-Name                : ubuntu-02
      Type                        : primary
      IP                          : ###.###.###.###
      Port                        : 80
      Status                      : down
      Last-State-Change-Time      : 2024-11-07 04:32:47
      
      Monitor
      Display-Name                : default-http-lb-monitor
      Type                        : HTTP
      Status                      : down
      Url                         : /
      Last-Check-Time             : 2024-11-07 04:34:52
      Last-State-Change-Time      : 2024-11-07 04:32:47
      Failure-Reason              : TCP Handshake Timeout
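The two outputs above differ only in the Failure-Reason field. The following Python sketch splits saved output of get load-balancer <LB_UUID> pool <LB_POOL_UUID> status into Pool, Member and Monitor records and maps the two Failure-Reason values described in this article to a hint. The function names and hint wording are hypothetical; any other Failure-Reason value is reported as not covered.

```python
from typing import Dict, List, Optional

# Failure-Reason values and likely causes as described in this article.
FAILURE_HINTS = {
    "Connect to Peer Failure": "connection actively rejected; look for a DFW or external firewall rule with action 'Reject'",
    "TCP Handshake Timeout": "no reply to the handshake; look for a DFW or external firewall rule with action 'Drop'",
}

SECTION_HEADERS = ("Pool", "Member", "Monitor")


def parse_sections(output: str) -> Dict[str, List[Dict[str, str]]]:
    """Group 'Key : Value' lines under the Pool/Member/Monitor header that precedes them."""
    sections: Dict[str, List[Dict[str, str]]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in output.splitlines():
        line = raw.strip()
        if line in SECTION_HEADERS:
            current = {}
            sections.setdefault(line, []).append(current)
        elif current is not None and ":" in line:
            # Split on the first ':' only, so times such as 04:32:47 stay intact.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return sections


def diagnose(output: str) -> List[str]:
    """Return one hint per Monitor record that carries a Failure-Reason."""
    hints = []
    for monitor in parse_sections(output).get("Monitor", []):
        reason = monitor.get("Failure-Reason")
        if reason:
            hint = FAILURE_HINTS.get(reason, "not covered by this sketch")
            hints.append(f"{monitor.get('Display-Name', '?')}: {reason} ({hint})")
    return hints
```

Lines before the first Pool header, such as the edge prompt and the timestamp, are ignored because no section is open yet.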

Note: The same Failure-Reason outputs may also be observed when an external firewall with similar 'Drop' or 'Reject' rules is blocking the communication.
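The difference between a rejected and a dropped connection can also be observed with a plain TCP connection attempt. The following Python sketch, a hypothetical helper rather than an NSX tool, classifies a connection attempt to a pool member as open, refused or timed out. It only approximates the edge's health monitor: run it from a host whose path to the server is subject to the same firewall rules, since firewall rules are matched on source and destination and a result from another host may differ. Depending on what a 'Reject' rule sends back, it may surface as 'refused' or as an unreachable error.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake to host:port and classify the outcome.

    'open'      -> the handshake completed
    'refused'   -> the attempt was actively rejected (e.g. a TCP RST from a
                   closed port or from a firewall rejecting on the server's behalf)
    'timeout'   -> no reply before `timeout` seconds, consistent with a silent drop
    'error: ..' -> any other socket error (e.g. host or network unreachable)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
```

For example, probe('<pool member IP>', 80) returning 'timeout' is consistent with the 'TCP Handshake Timeout' case, while 'refused' is consistent with the 'Connect to Peer Failure' case or with a server that is not listening on that port.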

For more information, refer to the KB Troubleshooting NSX Native Load Balancer.