Troubleshooting Latency and Routing Loops in HCX Network Extension Deployments to Azure

Article ID: 388996

Updated On: 03-05-2025

Products

VMware HCX

Issue/Introduction

  • When a network is extended from on-premises to Azure VMware Solution (AVS) with VMware HCX Network Extension, high latency and connectivity issues may occur when reaching the destination cluster in AVS.

  • Network monitoring tools report IP TTL zero errors (packets discarded because their Time To Live reached zero), indicating potential routing loops between the on-premises and AVS environments.

  • Applications running on the extended network experience timeouts, packet loss, or severe performance degradation.

  • Removing the network extension restores normal connectivity and performance, confirming that the issue is directly related to the network extension configuration.

Steps to validate:

  • Network performance monitoring shows significant latency increase after network extension deployment

  • Network devices (firewalls, routers) report IP TTL zero errors in their logs

  • Packet captures reveal packets with decreasing TTL values that never reach their destination

  • Tracing packets through the network shows circular routing patterns

  • Applications experience timeouts or extremely slow response times

  • Removing the network extension immediately resolves the issue
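The packet capture check above can be partly automated. A looping packet passes the same capture point more than once, and each sighting carries a lower TTL than the last. The sketch below assumes the capture has already been exported, in capture order, to records of source, destination, IPv4 Identification field, and TTL; the PacketRecord type, the find_looping_packets function, and the sample addresses are hypothetical and are not part of any HCX or vendor tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketRecord:
    """One packet from a capture export (hypothetical format, in capture order)."""
    src: str
    dst: str
    ip_id: int  # IPv4 Identification field; unchanged as routers forward the packet
    ttl: int


def find_looping_packets(records, min_sightings=3):
    """Return {(src, dst, ip_id): [ttl, ...]} for packets seen repeatedly with strictly decreasing TTL."""
    sightings = defaultdict(list)
    for rec in records:
        sightings[(rec.src, rec.dst, rec.ip_id)].append(rec.ttl)

    suspects = {}
    for key, ttls in sightings.items():
        # Each pass around a loop lowers the TTL, so repeated sightings should keep decreasing.
        decreasing = all(a > b for a, b in zip(ttls, ttls[1:]))
        if len(ttls) >= min_sightings and decreasing:
            suspects[key] = ttls
    return suspects


# Hypothetical capture: packet 4711 is seen three times, 4 hops apart each time.
capture = [
    PacketRecord("10.1.1.10", "10.2.2.20", 4711, 62),
    PacketRecord("10.1.1.10", "10.2.2.20", 4711, 58),
    PacketRecord("10.1.1.10", "10.2.2.20", 4711, 54),
    PacketRecord("10.1.1.10", "10.2.2.20", 4712, 62),
]
print(find_looping_packets(capture))
# {('10.1.1.10', '10.2.2.20', 4711): [62, 58, 54]}
```

The TTL step between sightings hints at the number of routers in the loop. IPv4 Identification values can be reused over long captures, so keep the capture window short when using this heuristic.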

Environment

  • VMware HCX deployed on-premises
  • Azure VMware Solution (AVS)
  • Network security appliances (e.g., Palo Alto Networks, Cisco, FortiGate) positioned between the on-premises environment and AVS

Cause

HCX Network Extension deployments to Azure VMware Solution may experience latency and routing loop issues due to several factors:

  1. Asymmetric routing, where outbound and return traffic take different network paths, sometimes caused by route weight issues on AVS routes
  2. Double encapsulation of network packets due to overlays created by HCX and the underlying network infrastructure
  3. MTU mismatches causing packet fragmentation when encapsulated packets exceed the path MTU
  4. Stateful firewall inspection breaking the encapsulated traffic flows
  5. Routing table conflicts between the on-premises and cloud environments
  6. Missing route advertisements for extended network segments
  7. Incorrect handling of encapsulated traffic by intermediate network devices
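Causes 2 and 3 come down to simple arithmetic: every overlay layer adds header bytes, and an inner packet only avoids fragmentation if its size plus all overheads fits in the path MTU. The sketch below shows that calculation and the matching TCP MSS clamp. The per-layer overhead values are hypothetical placeholders, not figures published for HCX; substitute the overheads that apply to your own overlays.

```python
from typing import Sequence

IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def inner_mtu(path_mtu: int, overheads: Sequence[int]) -> int:
    """Largest inner IP packet that fits in one outer packet after all encapsulation layers."""
    total = sum(overheads)
    if total >= path_mtu:
        raise ValueError("encapsulation overhead must be smaller than the path MTU")
    return path_mtu - total


def clamped_mss(path_mtu: int, overheads: Sequence[int]) -> int:
    """TCP MSS that keeps inner IPv4/TCP segments within the inner MTU."""
    return inner_mtu(path_mtu, overheads) - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def needs_fragmentation(inner_packet: int, path_mtu: int, overheads: Sequence[int]) -> bool:
    """True if an inner packet of this size exceeds the path MTU once encapsulated."""
    return inner_packet + sum(overheads) > path_mtu


# Hypothetical double encapsulation: one value per overlay layer (placeholders only).
layers = [100, 50]
print(inner_mtu(1500, layers))                   # 1350
print(clamped_mss(1500, layers))                 # 1310
print(needs_fragmentation(1500, 1500, layers))   # True
print(needs_fragmentation(1350, 1500, layers))   # False
```

A full-size 1500-byte inner packet no longer fits once two layers are stacked, which is why double encapsulation and MTU mismatches tend to appear together.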

These issues are particularly prevalent in new deployments because:

  • The routing infrastructure may not be fully optimized for HCX traffic patterns
  • Security appliances may not be properly configured to handle HCX encapsulation
  • AVS is not configured out of the box for HCX
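A quick way to review a security appliance policy is to check, in rule order, whether the traffic types listed in the Resolution section (UDP 500, UDP 4500, and IP protocol 50 ESP) reach an allow rule before a deny rule. The sketch below models a first-match policy with an implicit deny at the end. The Rule type and blocked_requirements function are hypothetical, and the model ignores source/destination addresses, zones, and NAT, so treat it as a first-pass checklist rather than a policy analyzer.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Traffic types named in this article for HCX encapsulated traffic.
REQUIRED: List[Tuple[str, Optional[int]]] = [("udp", 500), ("udp", 4500), ("esp", None)]


@dataclass(frozen=True)
class Rule:
    """Simplified firewall rule (hypothetical model)."""
    action: str                 # "allow" or "deny"
    protocol: str               # "udp", "tcp", "esp", or "any"
    port: Optional[int] = None  # None matches any port; ESP has no ports


def first_match(rules, protocol, port):
    """Return the first rule matching this protocol/port, or None (implicit deny)."""
    for rule in rules:
        if rule.protocol not in (protocol, "any"):
            continue
        if rule.port is not None and rule.port != port:
            continue
        return rule
    return None


def blocked_requirements(rules):
    """List the required traffic types that do not reach an allow rule."""
    blocked = []
    for protocol, port in REQUIRED:
        rule = first_match(rules, protocol, port)
        if rule is None or rule.action != "allow":
            blocked.append((protocol, port))
    return blocked


# Hypothetical policy that permits IKE and NAT-T but not raw ESP.
policy = [Rule("allow", "udp", 500), Rule("allow", "udp", 4500), Rule("deny", "any")]
print(blocked_requirements(policy))  # [('esp', None)]
```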

Resolution

Before Reaching Out to Broadcom Support

  1. Check your routing tables, BGP routes, MTU configuration, and scan for duplicate IP addresses in your network environment.
  2. For detected routing loops:
    • Isolate the specific devices causing the loop
    • Update routing tables to eliminate redundant or conflicting routes
    • Check for duplicate IP address ranges or overlapping subnets
    • Verify BGP route advertisements do not create circular dependencies
    • Verify firewall rules properly allow HCX encapsulated traffic (ports 4500/UDP, 500/UDP, IP Protocol 50 ESP)
    • If asymmetric routing is confirmed, consider enabling Mobility Optimized Networking (MON) in HCX with correctly configured policy routing as a potential workaround
    • Run Service Mesh diagnostics and check the traceroute path for routing loops
  3. For MTU-related issues:
    • Implement consistent end-to-end MTU sizing across all network segments
    • Consider permanently reducing MTU on the extended network to accommodate encapsulation overhead
    • Adjust Maximum Segment Size (MSS) clamping on edge devices
  4. Open a case with the Microsoft support team responsible for Azure VMware Solution (AVS) to coordinate troubleshooting, and during the troubleshooting session:
    • Check for asymmetric routes and implement corrective measures to ensure symmetric routing
    • Have Microsoft confirm the ExpressRoute and route configuration on their side
    • Examine and update routing tables to ensure proper route propagation
    • Have Microsoft check the weights on their routes to confirm the route configuration is as expected; asymmetric routing caused by weight issues on AVS routes is a common cause of these problems
  5. Validate the solution by:
    • Verifying no further TTL zero errors appear in logs
    • Confirming normal latency to resources in the extended network
    • Testing application performance on migrated VMs
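The overlapping subnet check from step 2 can be scripted with Python's standard ipaddress module. The sketch below compares every pair of labeled CIDRs, for example routes advertised from on-premises and routes learned from AVS, and reports pairs that overlap. The labels and prefixes are hypothetical. Any overlap it reports is a candidate to review, not automatically an error: an extended segment, for instance, legitimately exists on both sides.

```python
import ipaddress
from itertools import combinations


def find_overlaps(named_cidrs):
    """Return (label_a, label_b) pairs whose networks overlap.

    named_cidrs maps a descriptive label to a CIDR string. ip_network() is strict by
    default, so a prefix with host bits set raises ValueError, which catches typos.
    """
    nets = {label: ipaddress.ip_network(cidr) for label, cidr in named_cidrs.items()}
    overlaps = []
    for (label_a, net_a), (label_b, net_b) in combinations(nets.items(), 2):
        # Only compare IPv4 with IPv4 and IPv6 with IPv6.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            overlaps.append((label_a, label_b))
    return overlaps


# Hypothetical route inventory gathered from both environments.
routes = {
    "on-prem: app-vlan": "10.10.0.0/24",
    "avs: web-segment": "10.10.0.128/25",
    "avs: mgmt": "10.20.0.0/22",
}
print(find_overlaps(routes))  # [('on-prem: app-vlan', 'avs: web-segment')]
```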
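When reviewing a traceroute path, for example one taken from Service Mesh diagnostics, a routing loop usually shows up as the same router address appearing at two different hop numbers. The sketch below assumes the hop addresses have already been extracted into a list (the parsing step is not shown, since output formats vary) and that timeouts are recorded as "*". The find_repeated_hop function and the sample addresses are hypothetical.

```python
from typing import List, Optional, Tuple


def find_repeated_hop(hops: List[str]) -> Optional[Tuple[str, int, int]]:
    """Return (address, first_hop, repeat_hop) for the first address seen twice, else None.

    Hop numbers start at 1, matching traceroute output. Timeouts ("*") are skipped.
    """
    first_seen = {}
    for hop_number, address in enumerate(hops, start=1):
        if address == "*":
            continue
        if address in first_seen:
            return address, first_seen[address], hop_number
        first_seen[address] = hop_number
    return None


# Hypothetical path that bounces between two routers.
path = ["192.168.1.1", "10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.2"]
print(find_repeated_hop(path))  # ('10.0.0.1', 2, 4)
```

Routers that answer from different interfaces, or load-balanced paths, can hide or mimic a loop, so confirm any result with packet captures.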

If Issues Persist

If the issue persists after following these steps, contact Broadcom Support for further assistance.

Please provide the following information when opening a support request with Broadcom for this issue:

  • Network extension details (Network name, CIDR, VLAN, extension status from HCX UI)
  • HCX appliance versions and deployment topology
  • Screenshots of error messages in HCX Manager, network extensions UI, or network monitoring tools
  • Complete source and target HCX log bundles with HCX database dumps and IX appliance logs selected
  • Network packet captures from critical points in the network path
  • Network topology diagram showing all devices in the path between on-premises and AVS

Additional Information