When the VMware HCX Mobility Optimized Networking (MON) feature is enabled, virtual machines (VMs) at the cloud site may be unable to communicate with the on-premises default gateway router over the network extension. This occurs when the third-party on-premises default gateway routers use a First Hop Redundancy Protocol (FHRP).
A First Hop Redundancy Protocol (FHRP) allows multiple redundant third-party default gateway routers at the same site to share a virtual IP address. FHRPs include HSRP and VRRP.
This issue is fixed in the HCX 4.8.0 release.
Workaround:
To remediate the packet loss and recover the extended datapath, disable FHRP on the on-premises default gateway router while the HCX MON feature is in use.
In addition, disable the problematic ARP probe by running the following commands on the cloud/target NE appliance:
touch /opt/vmware/cgw/config/DisableVrrpSupport
systemctl restart cgw
Note: If the HCX NE HA (High Availability) feature is enabled, restarting the HCX "cgw" service triggers an HA event.
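After applying the workaround, it can be useful to confirm its state without restarting anything. The following is a minimal read-only sketch, not an official HCX tool: the function names are hypothetical, the flag path and service name are taken from the commands above, and it only checks that the flag file exists and that systemd reports the service as active. It cannot confirm that the appliance has actually stopped sending the probe.

```shell
#!/bin/sh
# Flag path taken from the workaround commands; override FLAG_FILE for testing.
FLAG_FILE=${FLAG_FILE:-/opt/vmware/cgw/config/DisableVrrpSupport}

# Succeed if the flag file (argument, or FLAG_FILE by default) exists.
flag_present() {
    [ -e "${1:-$FLAG_FILE}" ]
}

# Succeed if systemd reports the unit as active (default unit: cgw).
service_active() {
    systemctl is-active --quiet "${1:-cgw}"
}

# Print a one-line status for each check; nothing is modified or restarted.
report_workaround_state() {
    if flag_present; then
        echo "flag file: present"
    else
        echo "flag file: missing"
    fi
    if service_active; then
        echo "cgw service: active"
    else
        echo "cgw service: not active"
    fi
}
```

Calling `report_workaround_state` on the cloud/target NE appliance prints both statuses. Because it never restarts the service, it does not itself trigger an HA event.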
Follow the steps and recommendations below to disable ARP probes when HA is enabled:
IMPORTANT: Once HSRP/VRRP has been disabled and the router configuration has been changed so that it responds to D.A.D. probes, no further steps are necessary. The NE appliance can then learn the on-premises gateway MAC address from the responses to its D.A.D. probes.
However, if the router does not respond to D.A.D. probes even after FHRP (HSRP/VRRP) has been disabled, additional steps are needed to ensure that the on-premises router MAC address is always learned.
Note: When a Windows server is attached to the extended segment, run "arp -d" from a Command Prompt (CMD) to force it to re-learn the MAC address for the on-premises gateway IP.
Restrictions & other Considerations
Configuration Validation & Datapath Monitoring
ebtables -t nat -L | grep "dnat" | grep -v "ARP"
Example:
-d 2:50:56:##:##:58 -i vNic_2 -j dnat --to-dst 0:50:56:##:##:da --dnat-target ACCEPT
0:50:56:##:##:da is the on-premises router's MAC address.
2:50:56:##:##:58 is the HCX stitching MAC address.
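The manual check above can be scripted. The sketch below is illustrative only and uses hypothetical function names: it reads ebtables rule lines on standard input, extracts the `--to-dst` MAC from non-ARP dnat rules, and compares it with an expected gateway MAC that the operator supplies. It assumes the rule layout shown in the example, and it normalizes MAC addresses because ebtables prints octets without leading zeros.

```shell
#!/bin/sh
# Normalize a MAC address: lowercase, each octet zero-padded to two digits,
# so the short form 0:50:56:... compares equal to 00:50:56:...
normalize_mac() {
    printf '%s\n' "$1" | tr 'A-F' 'a-f' | awk -F: '{
        out = ""
        for (i = 1; i <= NF; i++) {
            octet = (length($i) == 1) ? ("0" $i) : $i
            out = out (i > 1 ? ":" : "") octet
        }
        print out
    }'
}

# Read ebtables rule lines on stdin and print the --to-dst MAC of each
# non-ARP dnat rule, one per line.
dnat_target_macs() {
    grep 'dnat' | grep -v 'ARP' | awk '{
        for (i = 1; i < NF; i++)
            if ($i == "--to-dst") print $(i + 1)
    }'
}

# Succeed if any dnat rule on stdin points at the expected gateway MAC ($1).
dnat_points_to() {
    expected=$(normalize_mac "$1")
    for mac in $(dnat_target_macs); do
        [ "$(normalize_mac "$mac")" = "$expected" ] && return 0
    done
    return 1
}
```

On the appliance, this could be used as `ebtables -t nat -L | dnat_points_to "<on-premises router MAC>"`, where the MAC is the value known for your own gateway. A non-zero exit status means no dnat rule currently points at that address.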
Additionally, implement a central monitoring VM that continuously pings various cloud VMs and generates alerts if any intermittent packet loss occurs over the extended datapath.
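One possible shape for such a monitor is sketched below. This is an assumption-laden illustration rather than a VMware-provided script: the function names, the hosts file, the default probe count, the loss threshold, and the alert format are all hypothetical, and the parser assumes the ping summary contains the phrase "N% packet loss".

```shell
#!/bin/sh
# Extract the packet-loss percentage from ping output read on stdin.
# Assumes a summary line containing "N% packet loss".
loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p' | head -n 1
}

# Probe one host with COUNT echo requests and print an alert line if the
# loss reaches THRESHOLD percent. Arguments: host [count] [threshold].
probe_host() {
    host=$1
    count=${2:-5}
    threshold=${3:-1}
    loss=$(ping -c "$count" "$host" 2>/dev/null | loss_percent)
    loss=${loss:-100}   # no summary line found: treat as total loss
    # awk performs the comparison so fractional loss values work
    if awk -v l="$loss" -v t="$threshold" 'BEGIN { exit !(l >= t) }'; then
        echo "$(date '+%Y-%m-%dT%H:%M:%S') ALERT $host loss=${loss}%"
    fi
}

# Probe every host listed in a file (one address per line), repeating
# every INTERVAL seconds until interrupted. Arguments: hosts_file [interval].
monitor_loop() {
    hosts_file=$1
    interval=${2:-60}
    while :; do
        while read -r host; do
            [ -n "$host" ] && probe_host "$host"
        done < "$hosts_file"
        sleep "$interval"
    done
}
```

Running `monitor_loop cloud_vms.txt 30` from the monitoring VM would probe each listed address every 30 seconds. The alert lines go to standard output, so they can be redirected to a log file or piped into whatever alerting mechanism is already in place.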