A manual HA failover of HCX Network Extension appliances with Mobility Optimized Networking (MON) enabled can disrupt connectivity between cloud VMs and on-premises networks. The disruption affects only L3 traffic from cloud VMs to on-premises networks; L2 connectivity is unaffected.
The following symptoms indicate this issue:
Products: VMware HCX, VMware NSX
The root cause involves the interaction between HCX MON configuration and NSX-T policy routes. When MON is enabled, HCX configures policy routes on the NSX Tier-1 cloud gateway to handle traffic routing between cloud and on-premises resources. These routes are tied to logical ports associated with the active Network Extension appliance.
During a manual HA failover
The issue can be identified through packet capture analysis showing
Upgrade to NSX-T 4.0.0.0 or later, which includes a fix for automatic policy route updates during HA failover events.
Use tcpdump or pktcap-uw on the NSX edge to verify traffic flow:
    tcpdump -e -i vNic_2 -n -S arp
Check the NSX-T Manager logs for policy route configuration, looking for policy route updates:
    grep "PolicyConnectivity" /var/log/vmware/nsx-manager/manager.log
Check the logical router forwarding table:
    get logical-router <router-id> forwarding
Monitor ARP resolution on affected networks, using pktcap-uw for detailed packet analysis:
    pktcap-uw --switchport <port-id> --dir 1 --stage 0
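The ARP capture from the tcpdump command above can be long, so a small script can help spot requests that never receive a reply. The following is a minimal Python sketch, not a VMware-provided tool. It assumes the capture was saved as plain text and that tcpdump prints ARP frames in its usual "Request who-has <ip> tell <ip>" and "Reply <ip> is-at <mac>" form. The function name and the sample addresses are hypothetical.

```python
import re

# Assumed tcpdump text format for ARP frames (addresses below are made up):
#   "... Request who-has 192.168.10.5 tell 192.168.10.1, length 46"
#   "... Reply 192.168.10.5 is-at 00:50:56:aa:bb:cc, length 46"
IPV4 = r"(\d{1,3}(?:\.\d{1,3}){3})"
REQUEST_RE = re.compile(r"Request who-has " + IPV4 + r".*? tell " + IPV4)
REPLY_RE = re.compile(r"Reply " + IPV4 + r" is-at")


def unanswered_arp_targets(lines):
    """Return {target_ip: request_count} for targets that never replied."""
    requested = {}
    answered = set()
    for line in lines:
        request = REQUEST_RE.search(line)
        if request:
            target = request.group(1)
            requested[target] = requested.get(target, 0) + 1
            continue
        reply = REPLY_RE.search(line)
        if reply:
            answered.add(reply.group(1))
    return {ip: count for ip, count in requested.items() if ip not in answered}


# Hypothetical capture: 192.168.10.5 never answers, 192.168.10.7 does.
SAMPLE_CAPTURE = [
    "12:00:01.000000 00:50:56:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), "
    "length 60: Request who-has 192.168.10.5 tell 192.168.10.1, length 46",
    "12:00:02.000000 00:50:56:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), "
    "length 60: Request who-has 192.168.10.5 tell 192.168.10.1, length 46",
    "12:00:03.000000 00:50:56:00:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), "
    "length 60: Request who-has 192.168.10.7 tell 192.168.10.1, length 46",
    "12:00:03.000500 00:50:56:00:00:07 > 00:50:56:00:00:01, ethertype ARP (0x0806), "
    "length 60: Reply 192.168.10.7 is-at 00:50:56:00:00:07, length 46",
]

if __name__ == "__main__":
    for ip, count in unanswered_arp_targets(SAMPLE_CAPTURE).items():
        print(f"{ip}: {count} request(s), no reply seen")
```

To use it on a real capture, redirect the tcpdump output to a file and pass the file's lines to `unanswered_arp_targets`. A target that appears in the result only indicates that no reply was seen in that capture window; it does not by itself confirm the policy route issue.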
Relevant log locations:
    /var/log/vmware/nsx-manager/manager.log
    /var/log/vmware/hcx/
When troubleshooting, look for these specific patterns in the logs:
StaticRoutingServiceImpl: Persisting config for new static route
DLRStaticRouteCCPFacadeImpl: Marking delete NextHop
Missing ARP responses from the cloud VM:
Request who-has 192.168.x.x tell 192.168.x.x
Look for connectivity state changes
type="DISCONNECTED","connectivity":"OFF"
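Searching for the log patterns listed above can be scripted so that all of them are checked in one pass. The sketch below is a minimal Python example using plain substring matching. The pattern strings are taken from this article; the labels, the function name, and the sample log lines (including their timestamp format) are hypothetical and only illustrate the idea.

```python
import sys

# Pattern strings come from this article; the dictionary keys are made-up labels.
PATTERNS = {
    "static_route_persisted": "StaticRoutingServiceImpl: Persisting config for new static route",
    "next_hop_marked_delete": "DLRStaticRouteCCPFacadeImpl: Marking delete NextHop",
    "connectivity_off": 'type="DISCONNECTED","connectivity":"OFF"',
}


def scan_lines(lines):
    """Return {label: [1-based line numbers]} for every known pattern."""
    hits = {label: [] for label in PATTERNS}
    for number, line in enumerate(lines, start=1):
        for label, needle in PATTERNS.items():
            if needle in line:
                hits[label].append(number)
    return hits


# Hypothetical log excerpt; real log lines will carry different prefixes.
SAMPLE_LOG = [
    "2024-01-01T00:00:00Z INFO StaticRoutingServiceImpl: Persisting config for new static route",
    "2024-01-01T00:00:01Z INFO DLRStaticRouteCCPFacadeImpl: Marking delete NextHop",
    "2024-01-01T00:00:02Z INFO unrelated message",
    '2024-01-01T00:00:03Z INFO status type="DISCONNECTED","connectivity":"OFF"',
]

if __name__ == "__main__":
    # Usage: python scan_log.py <path-to-log>; without an argument, scan the sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            result = scan_lines(handle)
    else:
        result = scan_lines(SAMPLE_LOG)
    for label, numbers in result.items():
        print(f"{label}: {len(numbers)} match(es) at lines {numbers}")
```

Substring matching is used instead of regular expressions because the patterns contain quotes and punctuation that would otherwise need escaping. Line numbers in the result make it easier to correlate the matches with the time of the failover.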