After a planned Edge failover, load balancer server pools and virtual servers all show as down on NSX version 4.1.2.4
VMware NSX 4.x
The server pools and virtual servers go down mainly because the load balancer (LB) and a route-based VPN are configured on the same Tier-1 logical router (LR).
During an Edge failover, the Edge firewall (FW) is fully synchronized with the rules from NSX Manager, as expected.
At the same time, the load balancer is writing its LB-specific firewall rules to the Edge's internal database.
At this point, the FW needs to apply the LB rules to the LB interface. However, when it pulls the existing ruleset, it finds no rules in place.
As a result, the standard FW rules are applied, but the LB rules are not.
When this issue occurs, the Edge syslog contains JSON parse errors reporting missing required keys for virtual servers and pools:
nsx_edge> grep -i "Missing required key" /var/log/syslog
- [nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3571" level="ERROR" errorCode="MPA13822"] [GetVServerStats] Failed to parse json: Missing required key uuidMissing required key virtual_servers
- [nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3571" level="ERROR" errorCode="MPA13822"] [GetPoolStats] Failed to parse json: Missing required key uuidMissing required key pools
- [nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3571" level="ERROR" errorCode="MPA13822"] [GetPoolStats] Failed to parse json: Missing required key uuidMissing required key pools
For L4 virtual servers, the following errors are also logged when this issue occurs, indicating that the LB FW rule is not ready in the datapath:
-[nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="ERROR"] "query datapathd stats encountered an error: 2 b'virtual server ########################################## is not valid\nedge-appctl: /var/run/vmware/edge/dpd.ctl: server returned an error\n'"
-[nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3569" level="INFO"] [GetVServerStats] stats result: {#012 "errors": [#012 "24304: Internal Error: Query LB Datapath Failed. virtual server ########################################## is not valid\nedge-appctl: /var/run/vmware/edge/dpd.ctl: server returned an error\n"]}
-[nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3569" level="ERROR" errorCode="MPA13822"] [GetVServerStats] Failed to parse json: Missing required key uuidMissing required key virtual_servers
----------
-[nsx@6876 comp="nsx-edge" subcomp="nsx-edge-lb.lb" level="ERROR"] "query datapathd stats encountered an error: 2 b'pool ############################## is not valid\nedge-appctl: /var/run/vmware/edge/dpd.ctl: server returned an error\n'"
-[nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3569" level="INFO"] [GetPoolStats] stats result: {#012 "errors": [#012 "24304: Internal Error: Query LB Datapath Failed. pool ################################### is not valid\nedge-appctl: /var/run/vmware/edge/dpd.ctl: server returned an error\n"]}
-[nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3569" level="ERROR" errorCode="MPA13822"] [GetPoolStats] Failed to parse json: Missing required key uuidMissing required key pools
-[nsx@6876 comp="nsx-edge" subcomp="agg-service" tid="3569" level="ERROR" errorCode="MPA13820"] [PoolStatsHandler] Cannot get stats for pool with LBS: ##########################Pool: ##########################################
This issue can also affect L7 virtual servers, but no error logs are produced in that case, so there is no log signature that can be used to identify it.
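The L4 log signatures above can also be checked offline against an exported copy of the Edge syslog. The following is a minimal sketch, assuming the file has already been copied off the Edge; the names `classify_lb_errors` and `looks_affected` and the category labels are hypothetical and are not part of any NSX tooling. It matches only the message fragments quoted in this article.

```python
import re
from collections import Counter
from typing import Iterable

# Message fragments copied from the Edge syslog excerpts in this article.
# The dictionary keys are hypothetical labels used only by this sketch.
SIGNATURES = {
    "missing_key_virtual_servers": re.compile(
        r"Missing required key uuidMissing required key virtual_servers"
    ),
    "missing_key_pools": re.compile(
        r"Missing required key uuidMissing required key pools"
    ),
    "invalid_virtual_server": re.compile(r"virtual server \S+ is not valid"),
    "invalid_pool": re.compile(r"\bpool \S+ is not valid"),
}


def classify_lb_errors(lines: Iterable[str]) -> Counter:
    """Count how many lines match each known LB error signature."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_affected(counts: Counter) -> bool:
    """Heuristic check for the L4 symptom: any of the JSON parse errors.

    A False result does not rule the issue out, because affected
    L7 virtual servers produce no error logs at all.
    """
    return counts["missing_key_virtual_servers"] > 0 or counts["missing_key_pools"] > 0
```

For example, passing an open file handle for the exported syslog to `classify_lb_errors` returns a per-signature count. Matching on message fragments rather than whole lines keeps the check independent of the timestamp and hostname prefix that syslog adds to each entry.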
There are two (2) workarounds available for this issue:
Workaround 1:
Place the impacted Edge into maintenance mode, then take it out of maintenance mode.
Workaround 2:
Configure the LB and the route-based VPN on different Tier-1 LRs.
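Before a planned failover, the exposure condition described at the top of this article (LB and route-based VPN on the same Tier-1 LR) can be expressed as a simple inventory check. This sketch assumes the Tier-1 inventory has already been collected into Python objects; the `Tier1Services` type, its field names, and the sample gateway names are hypothetical, and retrieving the data from NSX Manager is out of scope.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Tier1Services:
    """Hypothetical summary of the services configured on one Tier-1 LR."""

    name: str
    has_load_balancer: bool
    has_route_based_vpn: bool


def exposed_tier1s(gateways: Iterable[Tier1Services]) -> List[str]:
    """Return the sorted names of Tier-1 LRs hosting both an LB and a route-based VPN.

    These are the candidates for moving either the LB or the VPN
    to a separate Tier-1 LR.
    """
    return sorted(
        gw.name
        for gw in gateways
        if gw.has_load_balancer and gw.has_route_based_vpn
    )


# Hypothetical inventory used only to illustrate the check.
inventory = [
    Tier1Services("t1-web", has_load_balancer=True, has_route_based_vpn=True),
    Tier1Services("t1-app", has_load_balancer=True, has_route_based_vpn=False),
    Tier1Services("t1-vpn", has_load_balancer=False, has_route_based_vpn=True),
]
print(exposed_tier1s(inventory))
```

Only `t1-web` in the sample inventory combines both services, so it is the only gateway this check flags.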
An alternative solution is to implement the NSX Advanced Load Balancer. Per the VMware NSX 4.2.0 Release Notes:
Entitlement Change for the NSX Load Balancer
In a future major release of NSX, VMware intends to change the entitlement of the built-in NSX load balancer (a.k.a. NSX-T Load Balancer). This load balancer will only support load balancing for Aria Automation, IaaS Control Plane (Supervisor Cluster), and load balancing of VCF infrastructure components.
VMware recommends that customers who need general purpose and advanced load balancing features purchase Avi Load Balancer. Avi provides a superset of the NSX load balancing functionality including GSLB, advanced analytics, container ingress, application security, and WAF.
Existing entitlement to the built-in NSX load balancer for customers using NSX 4.x will remain for the duration of the NSX 4.x release series.