HCX MON causes traffic drops between stretched VMs and gateway

Article ID: 401073

Products

VMware HCX

Issue/Introduction

After Mobility Optimized Networking (MON) is enabled in an HCX environment and moves the gateway to the cloud, traffic between stretched virtual machines and their gateway is dropped completely. The L2 Extension functions correctly while MON is disabled; connectivity is lost immediately when MON is enabled, and it affects all VMs on the stretched network that need to communicate through the gateway.

The problem occurs when MON is used to migrate a gateway from on-premises to the cloud. Network Extension on its own operates normally, but once MON moves the gateway, an interruption in the network path prevents stretched VMs from reaching it.

Environment

  • VMware HCX (all versions)
  • Cloud environments (Azure VMware Solution, VMware Cloud on AWS, etc.)
  • On-premises vSphere environment
  • L2 Extension configured
  • MON (Mobility Optimized Networking) feature

Cause

This issue is caused by an interruption in the traffic path between stretched VMs and their gateway after MON moves the gateway to the cloud. The interruption can be identified by performing a traceroute from affected stretched VMs to their gateway IP. Common causes include firewall policies blocking traffic to the relocated gateway, NAT (Network Address Translation) configurations that modify packet headers, or other network security devices that interfere with the new traffic path. These interruptions prevent stretched VMs from reaching their gateway through the MON-optimized path, resulting in complete traffic drops.
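
As a rough illustration of reading a traceroute for the point of interruption, the following Python sketch finds the last hop that answered. It assumes Linux-style traceroute output (a hop number followed by responder addresses, with `*` for unanswered probes). The function name `last_responding_hop` and the sample addresses are invented for this example and are not part of HCX or NSX.

```python
import re

# Matches a traceroute hop line: a leading hop number, then the rest of the line.
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def last_responding_hop(traceroute_output):
    """Return (hop_number, responder) for the last hop that replied, or None.

    A hop whose probes are all '*' is treated as unanswered.
    """
    last = None
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header line or blank line
        hop = int(match.group(1))
        tokens = match.group(2).split()
        responders = [tok for tok in tokens if tok != "*"]
        if responders:
            last = (hop, responders[0])
    return last


# Hypothetical output; every address here is made up for illustration.
sample = """traceroute to 192.168.10.1 (192.168.10.1), 30 hops max, 60 byte packets
 1  10.10.0.1 (10.10.0.1)  0.412 ms  0.398 ms  0.377 ms
 2  10.20.0.1 (10.20.0.1)  1.204 ms  1.180 ms  1.162 ms
 3  * * *
 4  * * *
"""

print(last_responding_hop(sample))  # prints (2, '10.20.0.1')
```

In this sample the interruption lies at or immediately after hop 2, so the device at that hop and the one behind it are the first candidates to inspect. Windows `tracert` output is laid out differently and would need a different parser.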

Resolution

Follow these steps to identify and resolve the MON communication issue:

  1. Verify MON configuration is correct in both cloud and on-premises HCX environments.
  2. Confirm policy routes are properly configured according to the guidelines in the KB article "Traffic Not Following HCX Mobility Optimized Networking (MON) Policy Routes As Expected". Ensure deny rules are correctly implemented.
    • On a MON-enabled network segment, the Distributed Logical Router (DLR) is configured as an IP gateway with the same IP address as the on-premises gateway
    • When cloud VMs send ARP requests to resolve the gateway MAC, the requests are answered with the DLR's MAC address
    • HCX Manager configures policy routes to redirect packets back to the Cloud Site NE appliance
    • To verify segment policy route configuration:
      • Go to NSX Manager Policy-UI >> Segments >> Locate the MON-enabled L2E Segment >> Click the three vertical dots >> Copy Path to Clipboard
      • The copied path will look like: /infra/tier-1s/cgw/segments/hcx-ne-f4406fc7-8d44-4b63-8165-c4141f82c19e
      • Execute the following API command from the HCX Manager admin shell, appending the copied path to the Policy API base URL, to retrieve the segment configuration and its routing policies:
       
      curl -k -u 'admin:<password>' --request GET --url "https://<nsxManager-URL>/policy/api/v1/infra/tier-1s/cgw/segments/hcx-ne-f4406fc7-8d44-4b63-8165-c4141f82c19e"
  3. Access a virtual machine on the stretched network that is experiencing gateway connectivity issues with MON enabled.
  4. Verify the gateway MAC address to ensure MON has successfully moved the gateway:
    • Check the ARP table on the affected stretched VM to identify the MAC address of the default gateway
    • Confirm the MAC address indicates the gateway has been moved to the cloud side by MON. The expected MAC when MON is enabled is the NSX VDR specific MAC: 02:50:56:56:44:52
  5. Perform traceroute tests from the affected stretched VM:
    • Run traceroute to the default gateway IP address
    • Document the hop sequence and identify where packets are being dropped on the path to the gateway
    • Note: Each numbered line in the traceroute output (1, 2, 3, etc.) represents a "hop" - a network device that forwards the packet
  6. Compare traceroute results between:
    • A stretched VM experiencing the gateway connectivity issue
    • A stretched VM that maintains gateway connectivity (if available)
    • Note differences in routing paths and the last successful hop
  7. Identify the last hop in the traceroute that responds before traffic is dropped. This hop number indicates the last device that successfully processed the packet; the drop occurs at or immediately after this device.
  8. Access the network device at the identified hop location where packets are being dropped. This could be a firewall, router, security appliance, or other network device.
  9. Check the device configuration at the identified hop:
    • Look for NAT rules that may be translating addresses
    • Verify firewall policies allow traffic from stretched VMs to the MON-relocated gateway
    • Check for any security features that might interfere with the new traffic path
    • Review routing configurations for the stretched network
  10. Resolve the identified issue:
    • If NAT is enabled when it shouldn't be, disable it for the stretched network segments
    • If firewall policies are blocking traffic, create appropriate allow rules for stretched VM to gateway communication
    • Adjust any security settings that interfere with the MON-optimized traffic path
  11. Test connectivity after resolving the issue:
    • Verify stretched VMs can reach their gateway
    • Confirm traffic flows properly between stretched VMs and resources accessed through the gateway
    • Validate gateway migration with MON functions as expected
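
Steps 2 and 4 above can be sketched as two small Python helpers. This is a minimal illustration under stated assumptions: the affected VM runs Linux and its neighbor table is read with `ip neigh show`, and the function names (`segment_policy_url`, `gateway_mac`, `gateway_moved_by_mon`), the host name, and the sample IP addresses are hypothetical. Only the URL layout, the example segment path, and the VDR MAC come from this article.

```python
# NSX VDR MAC that this article expects for the gateway once MON is enabled.
VDR_MAC = "02:50:56:56:44:52"


def segment_policy_url(nsx_manager, segment_path):
    """Build the Policy API URL for a segment path copied from NSX Manager."""
    return "https://" + nsx_manager + "/policy/api/v1" + segment_path


def gateway_mac(neigh_output, gateway_ip):
    """Return the lowercased MAC recorded for gateway_ip, or None if absent.

    Assumes `ip neigh show` lines of the form:
    192.168.10.1 dev eth0 lladdr 02:50:56:56:44:52 REACHABLE
    """
    for line in neigh_output.splitlines():
        fields = line.split()
        if fields and fields[0] == gateway_ip and "lladdr" in fields:
            idx = fields.index("lladdr")
            if idx + 1 < len(fields):
                return fields[idx + 1].lower()
    return None


def gateway_moved_by_mon(neigh_output, gateway_ip):
    """True when the gateway's ARP entry carries the NSX VDR MAC."""
    return gateway_mac(neigh_output, gateway_ip) == VDR_MAC


# Hypothetical neighbor table; the IP addresses are made up for illustration.
neigh = """192.168.10.1 dev eth0 lladdr 02:50:56:56:44:52 REACHABLE
192.168.10.20 dev eth0 lladdr 00:50:56:aa:bb:cc STALE
"""

print(gateway_moved_by_mon(neigh, "192.168.10.1"))
print(segment_policy_url(
    "nsx.example.com",  # placeholder host name
    "/infra/tier-1s/cgw/segments/hcx-ne-f4406fc7-8d44-4b63-8165-c4141f82c19e",
))
```

A result of `False` from `gateway_moved_by_mon` means the VM still resolves the gateway to a different MAC (or has no entry for it), which points back to the MON configuration in step 1 rather than to a device on the path.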

If the issue persists after following these steps, contact Broadcom Support for further assistance.

When opening a support request with Broadcom for this issue, please provide:

  • HCX version details from both the cloud and on-premises sites
  • MON configuration screenshots
  • Policy route configuration and API output from the segment query
  • Stretched network details and gateway IP
  • Complete traceroute output from stretched VMs to gateway showing all hops and where traffic drops
  • Network topology diagram highlighting firewall placement
  • HCX logs from both the cloud and on-premises sites