Site Pairing Connectivity Issues When Deploying AVS Express Route

Article ID: 393464

Products

VMware HCX

Issue/Introduction

When moving to AVS ExpressRoute, whether for a new deployment or when transitioning an existing deployment from another setup such as VPN, users sometimes find that traffic for the HCX site pairing is not routed across ExpressRoute as expected. The site pairing can go down completely or flap, and in some circumstances it can appear up on only one side of the connection. Symptoms include:

  • Complete failure of HCX site pairing connectivity
  • Intermittent connectivity with site pairing flapping between up/down states
  • Asymmetric connectivity where site pairing appears active on one side but not the other
  • Inconsistent route selection when multiple HCX sites are configured
  • Timeout errors when attempting to establish or maintain site pairings

Environment

HCX

AVS ExpressRoute

Cause

Azure VMware Solution (AVS) automatically routes traffic between the on-premises site and AVS using BGP with ECMP (Equal-Cost Multi-Path). On-premises users sometimes use a single HCX Manager for multiple site pairs, likely because of vCenter licensing restrictions. This is achieved by using separate paths, and sometimes separate virtual routing tables, on-premises. When this traffic reaches the ExpressRoute BGP connection, the route is advertised to AVS as expected. However, when multiple site pairings share the same route, ExpressRoute has more than one equal-cost path toward the HCX Manager and no way of knowing which path belongs to which site pairing. ECMP spreads the traffic across those paths, so the site pairing reaches the HCX Manager only inconsistently.
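
To make the ambiguity concrete, the following Python sketch models a common ECMP behavior: hashing each flow's 5-tuple to pick one of several equal-cost next hops. The next-hop names, the documentation addresses, and the use of SHA-256 are illustrative assumptions; this is not a description of how Azure ExpressRoute actually hashes traffic. Because the choice depends only on the flow hash, different connections toward the same HCX Manager can be spread across paths that belong to different site pairings.

```python
import hashlib

# Hypothetical equal-cost next hops learned for one shared on-premises prefix.
# This is a generic ECMP sketch; Azure's real hashing is not modeled here.
NEXT_HOPS = ["path-to-site-A", "path-to-site-B"]


def ecmp_next_hop(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                  proto: str = "tcp") -> str:
    """Pick a next hop by hashing the flow 5-tuple (a common ECMP approach)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Map the first 4 bytes of the hash onto the list of equal-cost paths.
    return NEXT_HOPS[int.from_bytes(digest[:4], "big") % len(NEXT_HOPS)]


if __name__ == "__main__":
    # Documentation addresses stand in for the AVS-side and on-premises
    # HCX Managers; port 443 is used for illustration only.
    for src_port in range(50000, 50005):
        hop = ecmp_next_hop("198.51.100.20", "192.0.2.10", src_port, 443)
        print(f"source port {src_port} -> {hop}")
```

The same flow always maps to the same path, but new connections (different source ports) can land on a different path, which is consistent with the flapping and one-sided symptoms listed above.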

Resolution

To resolve HCX site pairing connectivity issues over ExpressRoute when using multiple site pairs with a single HCX Manager, implement one of the following solutions:

  1. Split the routes into more specific subnets:
    • Split your on-premises HCX Manager subnet into more specific routes (for example /25 or /26 instead of /24)
    • Advertise these more specific routes through BGP to ensure proper traffic routing
    • Routers always prefer the longest (most specific) matching prefix before applying ECMP, so the more specific routes take precedence on the AVS side and the traffic is no longer spread across the equal-cost paths of the broader route
  2. Deploy a second HCX Manager:
    • If you have the licensing available, deploy a dedicated HCX Manager for each site pairing
    • Use separate subnets for each HCX Manager to ensure clear routing paths
    • This completely eliminates the routing ambiguity by providing dedicated paths for each site pairing
  3. Dedicated ExpressRoute circuits:
    • For critical environments, consider implementing dedicated ExpressRoute circuits for each HCX site pairing
    • This provides traffic isolation and eliminates ECMP ambiguity between sites
  4. Verify BGP configuration:
    • Ensure BGP is correctly advertising the HCX Manager IP addresses with proper next-hop information
    • Check that route propagation is enabled in all necessary route tables
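
For option #1, the following Python sketch uses the standard `ipaddress` module to split the advertised prefix into more specific subnets and report which one covers each site pairing. It assumes each site pairing targets a different address inside the HCX Manager subnet; the prefix, the addresses, and the `plan_specific_prefixes` helper are hypothetical, so substitute your own addressing. If two site pairings still fall into the same subnet, the split has not removed the ambiguity and a longer prefix length is needed.

```python
import ipaddress

# Hypothetical example values: the /24 currently advertised for the
# on-premises HCX Manager and the address each site pairing should reach.
ORIGINAL_PREFIX = ipaddress.ip_network("192.0.2.0/24")
SITE_PAIRING_IPS = {
    "site-pairing-A": ipaddress.ip_address("192.0.2.10"),
    "site-pairing-B": ipaddress.ip_address("192.0.2.200"),
}


def plan_specific_prefixes(prefix, pairing_ips, new_prefix_len):
    """Return the more specific subnet of `prefix` that covers each pairing IP."""
    subnets = list(prefix.subnets(new_prefix=new_prefix_len))
    plan = {}
    for name, ip in pairing_ips.items():
        covering = [net for net in subnets if ip in net]
        if not covering:
            raise ValueError(f"{ip} ({name}) is outside {prefix}")
        plan[name] = covering[0]
    # Pairings that still share a subnet would still share one route.
    if len(set(plan.values())) != len(plan):
        raise ValueError("site pairings share a subnet; use a longer prefix length")
    return plan


if __name__ == "__main__":
    for name, net in plan_specific_prefixes(ORIGINAL_PREFIX, SITE_PAIRING_IPS, 25).items():
        print(f"{name}: advertise {net}")
```

With these example values it prints 192.0.2.0/25 for site-pairing-A and 192.0.2.128/25 for site-pairing-B; asking for a /24 raises an error because both pairings would still share the original prefix.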

For optimal results, we recommend option #1, as it requires minimal configuration changes while effectively resolving the routing ambiguity: routers prefer the most specific (longest) matching prefix, so the split routes take precedence over the ECMP paths of the broader route. If licensing allows, option #2 provides the cleanest separation and eliminates routing ambiguity entirely.
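
The preference for more specific routes that option #1 relies on is longest-prefix matching: a router first selects the most specific prefix that contains the destination and only then chooses among that prefix's equal-cost next hops. The hypothetical route table below (prefixes and path names are illustrative assumptions) shows a /25 from the split steering traffic to a single path, while an address outside it still matches only the /24 and remains subject to ECMP.

```python
import ipaddress

# Hypothetical routes as seen from the AVS side: the /24 is learned over two
# equal-cost paths, the /25 created by splitting the prefix over only one.
ROUTES = [
    (ipaddress.ip_network("192.0.2.0/24"), ["path-to-site-A", "path-to-site-B"]),
    (ipaddress.ip_network("192.0.2.0/25"), ["path-to-site-A"]),
]


def lookup(dst: str) -> list[str]:
    """Return the next hops of the longest prefix that contains `dst`."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hops) for net, hops in ROUTES if addr in net]
    if not matches:
        return []
    _, hops = max(matches, key=lambda m: m[0].prefixlen)
    return hops


if __name__ == "__main__":
    print(lookup("192.0.2.10"))   # ['path-to-site-A']: the /25 wins
    print(lookup("192.0.2.200"))  # both paths: only the /24 matches, so ECMP applies
```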

Verification

After implementing the solution, verify that the issue is resolved by:

  1. Checking the site pairing status on both sides of the connection
  2. Verifying that route advertisements are being received correctly from both on-premises and Azure ExpressRoute
  3. Testing HCX functionality by initiating test migrations or replications
  4. Using network capture tools at strategic points in the network path to verify traffic flow. For more information on testing the site pairing, see HCX Site Pairing Connectivity Diagnostics.
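
For step 2, once you have the received routes in hand (for example, transcribed from the route tables you already review), a short script can flag HCX Manager addresses whose most specific route still has more than one next hop. The following Python sketch assumes a simple list of (prefix, next hop) pairs; the function name, input format, and example addresses are hypothetical. A flagged address is not necessarily broken, since redundant paths can legitimately lead to the same site, but it marks where the ambiguity described in the Cause section could occur.

```python
import ipaddress
from collections import defaultdict

# Hypothetical received routes; a prefix listed with several next hops
# represents an ECMP set.
EXAMPLE_ROUTES = [
    ("192.0.2.0/24", "198.51.100.1"),
    ("192.0.2.0/24", "198.51.100.5"),
    ("192.0.2.0/25", "198.51.100.1"),
]


def check_hcx_routes(route_entries, hcx_ips):
    """Map each HCX IP to (most specific prefix or None, next hops, single_path)."""
    table = defaultdict(set)
    for prefix, next_hop in route_entries:
        table[ipaddress.ip_network(prefix)].add(next_hop)
    report = {}
    for ip in hcx_ips:
        addr = ipaddress.ip_address(ip)
        covering = [net for net in table if addr in net]
        if not covering:
            # No received route covers this HCX Manager address at all.
            report[ip] = (None, [], False)
            continue
        best = max(covering, key=lambda net: net.prefixlen)
        hops = sorted(table[best])
        report[ip] = (str(best), hops, len(hops) == 1)
    return report


if __name__ == "__main__":
    result = check_hcx_routes(EXAMPLE_ROUTES, ["192.0.2.10", "192.0.2.200", "203.0.113.7"])
    for ip, (prefix, hops, single_path) in result.items():
        status = "single path" if single_path else "review"
        print(f"{ip}: {prefix} via {hops} [{status}]")
```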

If issues persist after implementing these solutions, consider opening a support case with Microsoft.

If you still need help after reaching out to Microsoft, please reference this article and provide the information below when opening a support request with Broadcom for this issue: