HCX site pairing fails or goes down with Azure VMware Solution

Article ID: 404937

Issue/Introduction

HCX site pairing between an on-premises environment and Azure VMware Solution (AVS) fails with connectivity errors or shows as disconnected. The site pairing connection cannot be established or maintained, which prevents HCX services from functioning properly. Attempts to reconnect fail with site pairing errors in the HCX interface.

The following symptoms may be observed:

  • Site pairing shows as disconnected or down in the HCX interface
  • Error messages during site pairing configuration attempts such as "Site Pairing Error: Please check the Remote HCX URL you are trying to connect and try again"
  • SocketTimeoutException or ConnectTimeoutException errors in HCX logs
  • HCX migration capabilities and configuration workflows are impacted
  • Network connectivity issues between on-premises and cloud environments
  • ExpressRoute connectivity appears functional but HCX communication fails

This issue commonly occurs in environments using Azure ExpressRoute where network routing between on-premises and AVS environments is not properly configured.

Environment

  • VMware HCX (all versions)
  • Microsoft Azure VMware Solution (AVS)
  • Azure ExpressRoute connectivity
  • On-premises VMware infrastructure

Cause

Network routing configuration issues between the ExpressRoute circuit and the Azure VMware Solution environment prevent communication between the on-premises and cloud HCX managers. This occurs when BGP is not correctly distributing routes to the AVS environment, which causes connectivity to fail in both directions even though ExpressRoute appears to be functional.

This issue typically manifests when the ExpressRoute circuit cannot properly advertise routes to reach the on-premises environment from AVS, or when route propagation settings are misconfigured in the Azure virtual hub.

Resolution

Follow these steps to diagnose and resolve the connectivity issue:

Step 1: Perform Initial HCX Troubleshooting

Follow the troubleshooting steps in HCX Site Pairing Connectivity Diagnostics to verify:

  • Time synchronization: Confirm that date and time are synchronized between the HCX managers
  • DNS resolution: Verify that DNS resolution is working
  • Basic connectivity: Test the HCX manager ports using curl or telnet
  • Certificate validation: Confirm that the certificate configuration is correct
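
The DNS and basic connectivity checks above can also be scripted. The following is a minimal sketch using only the Python standard library, not an HCX tool; the function names `resolve` and `tcp_check` are illustrative, and it assumes the check is run from a host that shares the network path of the local HCX manager.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def tcp_check(host, port=443, timeout=5.0):
    """Attempt a TCP connection and return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        return False, f"name resolution failed: {exc}"
    except socket.timeout:
        # Typical of a routing or firewall drop: packets leave but nothing returns.
        return False, f"timed out after {timeout}s"
    except OSError as exc:
        # Includes "connection refused", where the host is reachable but the port is closed.
        return False, f"connection failed: {exc}"
```

For example, `tcp_check("hcx-cloud.example.com", 443)` (a hypothetical FQDN) returns a timeout detail when traffic is silently dropped along the path, which is consistent with the timeout exceptions listed in the symptoms, whereas a refused connection suggests the remote host is reachable.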

Step 2: Test Network Connectivity

Perform comprehensive connectivity testing between environments to isolate where the connection failure occurs:

  • From on-premises: Test connectivity to the cloud HCX manager IP address
  • Gateway verification: Verify that basic gateway connectivity is functional
  • Port accessibility: Check whether the HCX communication port (443) is accessible
  • Bidirectional traceroute analysis: Run traceroute from both HCX managers to identify where connectivity fails

Traceroute Analysis for Issue Isolation

The network connectivity issue can occur in three different areas:

  1. Between on-premises and AVS infrastructure (ExpressRoute/BGP routing issue)
  2. Within the on-premises client infrastructure (local network connectivity)
  3. Within the AVS infrastructure (cloud network connectivity)

Perform traceroute testing from both sides:

  • From the on-premises HCX manager: Run traceroute to the cloud HCX manager IP address
  • From the cloud HCX manager: Run traceroute to the on-premises HCX manager IP address

Analyze traceroute results to determine issue location:

  • Connection drops at ExpressRoute boundary: Indicates a BGP routing issue between on-premises and AVS; proceed to Step 3 for BGP analysis
  • Connection drops within on-premises network: Focus troubleshooting on local on-premises routing, firewall configuration, and infrastructure issues
  • Connection drops within AVS network: Work with Azure support to investigate AVS internal network configuration
  • Bidirectional failure: Most commonly indicates an ExpressRoute BGP routing issue that requires the Step 3 analysis
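
The decision logic above can be sketched as a small classifier over the hops that answered in a traceroute. This is an illustration under stated assumptions: the function name `locate_drop`, the `"*"` marker for unanswered hops, and all subnets in the usage example are hypothetical, and you would need to supply your own on-premises, ExpressRoute peering, and AVS address ranges.

```python
import ipaddress


def _in_any(addr, networks):
    """True if addr falls inside any of the given CIDR prefixes."""
    return any(addr in ipaddress.ip_network(n) for n in networks)


def locate_drop(hops, destination, source_nets, edge_nets, dest_nets):
    """Classify where a traceroute stopped, based on the last hop that answered.

    hops: hop IPs in order, with "*" for hops that did not answer.
    edge_nets: ExpressRoute peering subnets (assumed known to the operator).
    """
    dest = ipaddress.ip_address(destination)
    answered = [ipaddress.ip_address(h) for h in hops if h != "*"]
    if not answered:
        # Not even the local gateway answered.
        return "source-side"
    last = answered[-1]
    if last == dest:
        return "reached"
    # Check the edge first, since peering subnets may sit inside a source supernet.
    if _in_any(last, edge_nets):
        return "expressroute-boundary"
    if _in_any(last, dest_nets):
        return "destination-side"
    if _in_any(last, source_nets):
        return "source-side"
    return "unknown"
```

With hypothetical ranges, `locate_drop(["10.1.0.1", "10.1.255.1", "*", "*"], "172.16.0.9", ["10.1.0.0/16"], ["10.1.255.0/30"], ["172.16.0.0/22"])` returns `"expressroute-boundary"`, which corresponds to the first case above. Running it for the traceroute in each direction, with source and destination ranges swapped, gives the bidirectional view.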

Step 3: Verify Azure ExpressRoute Configuration

If HCX troubleshooting reveals network connectivity issues, verify Azure configuration:

  • ExpressRoute circuit: Navigate to Azure portal and locate the ExpressRoute circuit
  • Virtual hub settings: Check the virtual hub configuration
  • Default routes: Verify that the "propagate default routes" setting is enabled (Microsoft recommends enabling this setting for proper connectivity)
  • BGP configuration: Confirm BGP route advertisement settings

BGP Table Analysis

Check BGP tables on both sides of the ExpressRoute circuit to verify the management subnet is being redistributed as expected:

  • On-premises: Gather BGP tables from your on-premises infrastructure showing learned routes
  • Azure side: Work with Azure support to obtain BGP tables from the Azure side of the connection
  • Route comparison: Compare both sets of BGP tables to verify the AVS management subnet routes are being properly advertised and received
  • Bidirectional verification: Confirm bidirectional route advertisement is functioning

Note: If routes are missing from the BGP tables, this confirms the routing redistribution issue.
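
Once both BGP tables have been exported as lists of prefixes, the route comparison can be sketched as follows. This is a minimal illustration, not an Azure or HCX utility: the function name `missing_routes` and the prefixes in the usage example are hypothetical, and it assumes the tables have already been reduced to CIDR strings.

```python
import ipaddress


def missing_routes(expected, learned, exact=False):
    """Return the expected prefixes that no learned route covers.

    exact=False: a prefix counts as covered if it sits inside any learned
    route (note that a learned default route then covers everything).
    exact=True: the identical prefix must appear in the learned table.
    """
    learned_nets = [ipaddress.ip_network(p) for p in learned]
    missing = []
    for prefix in expected:
        net = ipaddress.ip_network(prefix)
        if exact:
            covered = any(net == route for route in learned_nets)
        else:
            covered = any(
                net.version == route.version and net.subnet_of(route)
                for route in learned_nets
            )
        if not covered:
            missing.append(prefix)
    return missing
```

For example, `missing_routes(["172.16.0.0/22"], ["10.1.0.0/16"])` returns `["172.16.0.0/22"]`, meaning the hypothetical AVS management prefix was not learned on that side. Running the check once per direction (AVS prefixes against the on-premises table, on-premises prefixes against the Azure table) covers the bidirectional verification.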

Step 4: Escalate to Azure Support

When network connectivity testing confirms routing issues, escalate to the Azure infrastructure team:

  • Contact Microsoft Azure support for ExpressRoute routing configuration assistance
  • Focus area: Route distribution between ExpressRoute circuit and AVS environment
  • Request verification: Ensure routes are being properly advertised to AVS
  • Provide evidence: Share BGP table analysis results from Step 3
  • Confirm setup: Ensure bidirectional routing is configured correctly

Step 5: Monitor for Resolution

Once the Azure networking team resolves the routing configuration:

  • Automatic recovery: HCX site pairing should automatically re-establish
  • No HCX changes needed: No additional HCX configuration changes are typically required
  • Service verification: Verify HCX services resume normal operation
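
While waiting for the routing fix, network reachability can be polled with a loop like the one below. This is a sketch under assumptions: `wait_for_recovery` and its parameters are illustrative names, and `probe` is any callable you supply that returns True when the remote HCX manager answers (for example, a TCP connection attempt on port 443). It confirms reachability only; the site pairing state itself still has to be verified in the HCX interface.

```python
import time


def wait_for_recovery(probe, attempts=10, interval=30.0,
                      required_successes=3, sleep=time.sleep):
    """Poll probe() until it succeeds required_successes times in a row.

    Returns the attempt number at which the streak completed, or None if
    the attempts ran out first. Requiring consecutive successes avoids
    declaring recovery on a single lucky probe while routes are converging.
    """
    streak = 0
    for attempt in range(1, attempts + 1):
        if probe():
            streak += 1
            if streak >= required_successes:
                return attempt
        else:
            streak = 0  # any failure resets the streak
        if attempt < attempts:
            sleep(interval)
    return None
```

The `sleep` parameter exists so the loop can be exercised without real delays; in normal use, leave it at its default.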

Important: This issue requires resolution at the Azure network infrastructure level and is outside the scope of Broadcom HCX support. HCX functionality will resume automatically once proper network routing is restored.

If the error persists after following these steps, contact Broadcom Support for further assistance and include the information gathered while following this article.