Pool members on a different subnet to the virtual server IP never come UP on a newly deployed NSX native load balancer


Article ID: 420321



Products

VMware NSX

Issue/Introduction

  • The virtual server is in a DOWN state because the pool members are not in an UP state.
  • The load balancer always shows as degraded.
  • The virtual server IP address is on a different subnet from the pool members.
  • This is a newly built environment and the pool members have never been UP.
  • There is an HTTP, HTTPS or TCP active health monitor on the virtual server.
  • In /var/log/syslog on the active LB Edge, log entries similar to the following are reported:

    NSX 3748888 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="ERROR"] "ngx stats pool uuid is not found in config: {"uuid": "<UUID>", "status": "down", "members": [{"id": "<POOL MEMBER IP ADDRESS:PORT>", "status": "down", "last_change_time": "1763939127218", "monitors": [{"uuid": "<MONITOR UUID>", "status": "down", "last_check_time": "1764091498751", "last_change_time": "1763939127218", "failure_code": "24361", "failure_reason": "TCP Handshake Timeout"}]}, {"id": "POOL MEMBER IP ADDRESS:PORT", "status": "down", "last_change_time": "1763939127218", "monitors": [{"uuid": "<MONITOR UUID>", "status": "down", "last_check_time": "1764091498751", "last_change_time": "1763936776174", "failure_code": "24361", "failure_reason": "TCP Handshake Timeout"}]}]}"

  • The virtual server is configured with the default SNAT translation mode of 'Automap Mode'.
  • Packet captures gathered on the Edge uplink vnic confirm that TCP packets to the pool members egress with a source IP address in the 100.64.0.X/31 subnet. This is the expected behaviour with 'Automap Mode'. An example packet capture command is the following:

    pktcap-uw --switchport <PORT ID> --capture VnicRx,VnicTx --rcf "geneve and host <IP ADDRESS OF POOL MEMBER> and port <TCP PORT configured on the health monitor>" -o - | tcpdump-uw -ner -

  • The same packet capture on the pool member vnic confirms that the packets are not being delivered to it.
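
When many of these syslog entries are present, it can help to summarise which pool members are down and why. The following is a minimal Python sketch, assuming the log line keeps the shape shown above (a JSON payload after the message text). The sample line is hypothetical: the UUIDs and member address are made-up stand-ins for the placeholders in the example log.

```python
import json

# Hypothetical sample line: same shape as the log entry in this article,
# with made-up values substituted for the <UUID> and <IP:PORT> placeholders.
SAMPLE_LINE = (
    'NSX 3748888 LB [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" '
    'level="ERROR"] "ngx stats pool uuid is not found in config: '
    '{"uuid": "pool-1", "status": "down", "members": ['
    '{"id": "192.168.50.11:443", "status": "down", '
    '"last_change_time": "1763939127218", "monitors": ['
    '{"uuid": "mon-1", "status": "down", '
    '"last_check_time": "1764091498751", '
    '"last_change_time": "1763939127218", "failure_code": "24361", '
    '"failure_reason": "TCP Handshake Timeout"}]}]}"'
)


def down_members(line):
    """Return (member id, failure reason) pairs for members reported down."""
    # The JSON payload runs from the first '{' to the last '}' on the line.
    start = line.find("{")
    end = line.rfind("}")
    if start == -1 or end == -1:
        return []
    payload = json.loads(line[start:end + 1])

    results = []
    for member in payload.get("members", []):
        if member.get("status") != "down":
            continue
        for monitor in member.get("monitors", []):
            results.append((member["id"], monitor.get("failure_reason")))
    return results


if __name__ == "__main__":
    for member_id, reason in down_members(SAMPLE_LINE):
        print(f"{member_id}: {reason}")
```

A failure reason of "TCP Handshake Timeout" for every member, as in the example log, is consistent with the health check packets never reaching the pool members.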

Environment

VMware NSX

Cause

  • Because the virtual server IP is on a different subnet from the destination pool members, the packets are transmitted from the Edge with a source-NATted IP address (with 'Automap Mode', an address in the 100.64.0.X/31 subnet, as seen in the packet capture). The packets are therefore forwarded to a gateway/router to be routed.
  • If the required routes to the pool member subnet, and back to the SNAT source subnet, are not configured on that router, the packets are dropped.
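
The routing dependency described above can be checked quickly. The following is a minimal Python sketch using the standard `ipaddress` module. The addresses and the /24 prefix length are hypothetical examples, not values from this environment.

```python
import ipaddress


def needs_routing(source_ip, member_ip, member_prefix_len):
    """Return True when source_ip lies outside the pool member's subnet.

    In that case traffic between the two must cross a gateway/router,
    which needs routes in both directions.
    """
    member_net = ipaddress.ip_interface(
        f"{member_ip}/{member_prefix_len}"
    ).network
    return ipaddress.ip_address(source_ip) not in member_net


if __name__ == "__main__":
    # Hypothetical values: an Automap SNAT source versus a pool member in
    # 192.168.50.0/24.
    print(needs_routing("100.64.0.1", "192.168.50.11", 24))     # True
    # A source address inside the pool member subnet needs no router.
    print(needs_routing("192.168.50.10", "192.168.50.11", 24))  # False
```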

Resolution

  • Confirm with the administrator of the router that routes are configured to the destination pool member subnet, and that return routes exist to the subnet used for the source NAT.
  • If a route has been configured that expects the packets to be source NATted from the virtual server IP, reconfigure the server pool to use the SNAT Translation Mode 'IP Pool'.
  • As an example, if your virtual server IP is configured as 10.10.10.10, then configure your pool the following way:
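
For administrators who configure the pool through the API rather than the UI, the following is a minimal Python sketch of the equivalent setting. It only builds and prints a request body and sends nothing. It assumes the NSX Policy API `LBPool` schema, where `snat_translation` takes a `type` of `LBSnatIpPool` and a list of `ip_addresses`. Verify the field names against the API reference for your NSX version before use. The function name `snat_ip_pool_body` is hypothetical.

```python
import json


def snat_ip_pool_body(snat_ip):
    """Build a partial LBPool body selecting the 'IP Pool' SNAT mode.

    Assumption: field names follow the NSX Policy API LBPool schema
    (snat_translation / LBSnatIpPool / ip_addresses).
    """
    return {
        "snat_translation": {
            "type": "LBSnatIpPool",
            "ip_addresses": [{"ip_address": snat_ip}],
        }
    }


if __name__ == "__main__":
    # 10.10.10.10 is the example virtual server IP used in this article.
    print(json.dumps(snat_ip_pool_body("10.10.10.10"), indent=2))
```

With this mode, pool members see 10.10.10.10 as the source address, so the router only needs a return route to that address rather than to the 100.64.0.X/31 subnet.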

 

Additional Information

  • Troubleshooting NSX Native Load Balancer KB376344
  • NSX Native Load Balancer Administration Guide. Please note the following regarding SNAT translation mode 'Automap' using a service router uplink or service interface as the SNAT source: