Native NSX Load-balancer configured to handle UDP packets stops functioning during NSX Edge upgrade or failover.


Article ID: 421960


Products

VMware NSX

Issue/Introduction

During an NSX Edge upgrade or failover, a native NSX load balancer that handles UDP traffic can stop forwarding packets to the backend pool members.

Environment

VMware NSX 

Cause

  • During load balancer failover from the Active Edge to the Standby Edge, the Tier-1 gateway failover can complete before the load balancer configuration takes effect.
  • If this happens, new connection requests from clients cannot match the load balancer rule because it is not yet realized. Each request still creates a firewall connection, and that connection drops the traffic.
  • Because requests from the same client keep arriving, the idle timeout of the firewall connection is continuously refreshed, so the connection does not expire and the traffic continues to be dropped.
  • When the load balancer configuration takes effect later, the existing firewall connection still drops the traffic.
  • This sequence can be confirmed in the Edge syslog.
  • The Tier-1 gateway became active:

    <Timestamp> edge1 NSX 1 - [nsx@6876 comp="nsx-edge" subcomp="nsx-edge-agent" s2comp="nsx-monitoring" entId="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" tid="1" level="ERROR" eventState="Off" eventFeatureName="high_availability" eventSev="error" eventType="tier1_gateway_failover"] Context report: {"previous_gateway_state":"Standby","current_gateway_state":"Active","entity_id":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","service_router_id":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","failover_reason":"LB service xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx status READY"}

  • The load balancer configuration took effect a few seconds later:

    <Timestamp> edge1 NSX 12496 FIREWALL [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="firewall" tname="data1" level="INFO"] update LB xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  (created) gen_id 1 trunk_id 4 cp_count 1 start_l4_worker 3 ha_enabled 0 attached to lr xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  uplink or csp xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 
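The ordering of the two log entries above can be checked with a short script. The following is a minimal sketch, not a supported tool. It assumes that each syslog line begins with an ISO 8601 timestamp (the samples above show only a `<Timestamp>` placeholder) and that all lines come from a single Edge node. The function name `find_race_windows` is hypothetical.

```python
import re
from datetime import datetime

# Matches the Tier-1 failover event where the gateway transitions to Active.
FAILOVER_RE = re.compile(
    r'eventType="tier1_gateway_failover".*?"current_gateway_state":"Active"'
)
# Matches the datapathd message logged when the LB configuration is created.
LB_UPDATE_RE = re.compile(r'update LB (\S+)\s+\(created\)')


def parse_ts(line):
    """Parse the leading timestamp. Assumption: ISO 8601, e.g. 2024-01-01T10:00:00.000Z."""
    token = line.split(maxsplit=1)[0]
    return datetime.fromisoformat(token.replace("Z", "+00:00"))


def find_race_windows(lines):
    """Return (active_time, lb_time, lb_id) for each LB 'created' message
    that is logged after the Tier-1 gateway on the same Edge became Active."""
    windows = []
    active_since = None
    for line in lines:
        if FAILOVER_RE.search(line):
            active_since = parse_ts(line)
            continue
        match = LB_UPDATE_RE.search(line)
        if match and active_since is not None:
            lb_time = parse_ts(line)
            if lb_time > active_since:
                windows.append((active_since, lb_time, match.group(1)))
            # Only the first LB update after a failover is of interest.
            active_since = None
    return windows
```

Passing the lines of an Edge syslog file to `find_race_windows` returns one tuple per occurrence. A non-empty result indicates that the gateway was Active for some time before the load balancer configuration was realized, which matches the condition described in this article.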

Resolution

This is a known issue affecting VMware NSX. It is a race condition that can affect a UDP load balancer setup and stop the load balancer from forwarding packets to the backend pool members.

Workarounds: 

Method 1: Create a stateless gateway firewall rule with Source: Any, Destination: the load balancer VIP, and Action: "Accept".

Method 2: If the gateway firewall is not needed, you can disable the gateway firewall.
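
For administrators who script their configuration, Method 1 might be expressed as a gateway policy body along the following lines. This is a sketch under stated assumptions, not a verified procedure: it assumes the NSX Policy API gateway policy schema (a `GatewayPolicy` with `stateful` set to false and a rule whose `action` is `ALLOW`, taken here to correspond to "Accept"), and the function name, policy ID, rule ID, Tier-1 path, and VIP address are all hypothetical. The code only builds the body and does not send it. Verify the field names against the API reference for your NSX version before use.

```python
import json


def build_stateless_vip_policy(tier1_path, vip, policy_id="udp-lb-vip-stateless"):
    """Build a stateless gateway policy body that allows any source to the LB VIP.

    Assumptions: field names follow the NSX Policy API GatewayPolicy/Rule schema;
    tier1_path, vip and policy_id are placeholders supplied by the caller.
    """
    return {
        "resource_type": "GatewayPolicy",
        "id": policy_id,
        "display_name": policy_id,
        "category": "LocalGatewayRules",
        "stateful": False,  # stateless: no firewall connection is tracked
        "rules": [
            {
                "resource_type": "Rule",
                "id": "allow-any-to-vip",
                "display_name": "allow-any-to-vip",
                "source_groups": ["ANY"],
                "destination_groups": [vip],  # load balancer VIP address
                "services": ["ANY"],
                "action": "ALLOW",
                "scope": [tier1_path],  # apply on the Tier-1 hosting the LB
            }
        ],
    }


# Hypothetical values for illustration only.
policy = build_stateless_vip_policy("/infra/tier-1s/example-t1", "192.0.2.10")
print(json.dumps(policy, indent=2))
```

Because the policy is stateless, matching packets are evaluated against the rule on every pass instead of against a previously created connection, which is why a stale connection cannot keep dropping the traffic.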