HCX Site Pairing Shows "Connection from Remote Side May Be Down" Error Despite Functional Connection - HCX appliance shows "Disconnected" or "Undecided" status

Article ID: 389415

Products

VMware HCX

Issue/Introduction

  • HCX site pairing shows an alert "Connection from remote side may be down" on either side of the connection (on-premises HCX Manager or cloud HCX Manager) while the connection remains visible and functional from the other side.
  • No service disruption is observed for migration or network extension services.
  • The site pairing appears disconnected only in the user interface while the underlying connection remains operational.
  • The issue typically occurs on HCX Managers that have run for a long time without a reboot, or that run outdated versions.

Steps to validate:

  • Verify the site pairing shows as connected from one side while showing the "Connection from remote side may be down" alert on the other side
  • Confirm that services dependent on site pairing (migrations, network extensions) continue to function normally despite the alert
  • Follow the connectivity diagnostics in the KB article "HCX Site Pairing Connectivity Diagnostic Steps" to confirm that the underlying site pairing connection is actually functional
  • Check the status of critical HCX services:
    systemctl status app-engine
    systemctl status web-engine
    systemctl status appliance-management
    systemctl status postgresdb
    systemctl status zookeeper
    systemctl status kafka

    For a combined view of all of these services, run:

    systemctl --type=service | grep "zoo\|kaf\|web\|app\|postgres"

  • Log in to the HCX Manager management interface on port 9443 (https://<HCX-Manager-IP>:9443) and verify the system status
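The individual service checks above can be wrapped in a small loop. The following is a minimal sketch, not a Broadcom-provided tool: it assumes a shell on the HCX Manager where `systemctl is-active` is available, and `STATUS_CMD` and `report_services` are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Services named in this article.
HCX_SERVICES="app-engine web-engine appliance-management postgresdb zookeeper kafka"

# STATUS_CMD is a hypothetical override hook (not an HCX setting) so the
# loop can be exercised on a machine without systemd.
: "${STATUS_CMD:=systemctl is-active}"

# Print one line per service and return the number of services
# whose state is anything other than "active".
report_services() {
    failed=0
    for svc in $HCX_SERVICES; do
        state=$($STATUS_CMD "$svc" 2>/dev/null)
        [ -n "$state" ] || state="unknown"
        printf '%-22s %s\n' "$svc" "$state"
        [ "$state" = "active" ] || failed=$((failed + 1))
    done
    return "$failed"
}
```

Calling `report_services` prints the state of each service; a non-zero return value indicates how many services need a closer look with `systemctl status`.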

Environment

VMware HCX 4.9.x or earlier versions

Cause

The issue is caused by UI component synchronization problems in older HCX versions. Contributing conditions include:

  • The HCX Manager has been running continuously for an extended period (high uptime).
  • The web-engine service enters an inconsistent state while background services remain functional.
  • Disk space issues, particularly in the /common directory, may contribute to service instability.
  • Log rotation or database maintenance issues in older versions may lead to resource constraints.
  • High CPU usage or resource contention on the vCenter Server often prevents the HCX Manager from maintaining a stable management state with the appliance.

This issue commonly occurs on HCX versions earlier than 4.10 that have extended uptime, and it appears to be resolved in newer versions. In the typical pattern, the UI shows a disconnection while the actual services remain operational.
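Because extended uptime is a recurring factor, it can help to check how long the appliance has been running. The sketch below assumes a Linux-based appliance that exposes /proc/uptime; `uptime_days`, `uptime_exceeds`, and the `MAX_DAYS` threshold are hypothetical and are not values documented by Broadcom.

```shell
#!/bin/sh
# Convert the first field of an uptime file (seconds since boot) to whole days.
# $1: path to the uptime file; defaults to /proc/uptime.
uptime_days() {
    awk '{ printf "%d\n", $1 / 86400 }' "${1:-/proc/uptime}"
}

# MAX_DAYS is a hypothetical threshold chosen for illustration only.
MAX_DAYS=180

# Succeed (exit status 0) when the given number of days is over the threshold.
# $1: uptime in whole days.
uptime_exceeds() {
    [ "$1" -gt "$MAX_DAYS" ]
}
```

For example, `uptime_exceeds "$(uptime_days)" && echo "Consider scheduling a reboot"` flags an appliance that has passed the chosen threshold.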

Resolution

  1. Verify the underlying site pairing connectivity using the steps outlined in KB article HCX Site Pairing Connectivity Diagnostic Steps, especially:
    • Run connectivity tests using curl -k -v https://<HCX_Manager_FQDN>
    • Check for routing issues with traceroute <HCX_Cloud_Manager_IP>
    • Verify DNS resolution and NTP synchronization
  2. For detailed troubleshooting of service failures, refer to KB article: HCX Manager postgresdb service fails to start after upgrade
  3. Log in to the affected HCX Manager through the management interface (https://<HCX-Manager-IP>:9443) and verify the system health status.
  4. Examine service logs for each critical component:
     
    grep "app-engine" /var/log/messages
    grep "web-engine" /var/log/messages
    grep "appliance-management" /var/log/messages
     
  5. Check disk space usage on the HCX Manager, particularly for the /common directory:
     
    df -h
  6. Restart each of the HCX services on the affected HCX Manager:
     
    systemctl restart app-engine
    systemctl restart web-engine
    systemctl restart appliance-management
    systemctl restart postgresdb
    systemctl restart zookeeper
    systemctl restart kafka
     
  7. If the issue persists after restarting services, perform a full reboot of the HCX Manager appliance: reboot
  8. Upgrade both source and destination HCX components to the latest supported version (preferably 4.10 or 4.11+) to prevent recurrence.
  9. Implement scheduled maintenance with periodic reboots of HCX Managers (at least a few times a year) to prevent UI synchronization issues.
  10. Monitor HCX Manager service status through the management interface on port 9443 to identify potential issues proactively.
  11. Ensure version alignment between source and destination HCX components to prevent compatibility issues. Version mismatches should be eliminated by upgrading both sides to the same current version.
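The disk space check in step 5 can be turned into a simple threshold test. This is a minimal sketch that assumes POSIX-format `df -P` output; `usage_percent`, `check_mount`, and the `THRESHOLD` value are hypothetical and do not reflect a Broadcom-documented limit.

```shell
#!/bin/sh
# Read POSIX-format df output on stdin and print the usage
# percentage of the first filesystem line as an integer.
usage_percent() {
    awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# THRESHOLD is a hypothetical warning level chosen for illustration only.
THRESHOLD=80

# Report whether the filesystem holding the given path is over the threshold.
# $1: mount point or directory to check, for example /common.
check_mount() {
    pct=$(df -P "$1" | usage_percent)
    if [ "$pct" -ge "$THRESHOLD" ]; then
        echo "WARNING: $1 is ${pct}% full"
    else
        echo "OK: $1 is ${pct}% full"
    fi
}
```

Running `check_mount /common` on the HCX Manager prints a single status line that can be captured before and after a service restart.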

Note: If you encounter this issue, Broadcom Support will recommend updating to the latest version of HCX, so upgrade proactively. Create a support case only if the issue continues to occur on the latest version of HCX.

Restarting services or rebooting the appliance resolves the issue temporarily, but upgrading to a current supported version is the permanent solution. Most cases can be resolved by implementing a regular maintenance schedule that includes periodic reboots and monitoring of disk space usage. When troubleshooting, record the timestamps of each occurrence of the issue and of every service restart.

If the error persists after following these steps, contact Broadcom Support for further assistance.

Please provide the below information when opening a support request with Broadcom for this issue:

  • Both source and destination HCX versions
  • HCX Manager uptime statistics for both sides of the connection
  • Screenshots of the error as seen via UI
  • Timestamps of when the issue was first observed
  • Log bundles from both the source and target HCX Managers, generated with the HCX database dump and IX appliance options selected
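The uptime and timestamp items in the list above can be gathered from a shell on each HCX Manager. The sketch below is illustrative only: it relies on the standard `systemctl show -p ActiveEnterTimestamp` property, and the function names are hypothetical. It does not collect HCX versions or log bundles.

```shell
#!/bin/sh
# Strip the "Key=" prefix from a `systemctl show -p` output line on stdin.
property_value() {
    sed 's/^[^=]*=//'
}

# Print when each service named in this article last entered the active state.
# An empty value (for example, a unit that never started) is shown as "unknown".
service_start_times() {
    for svc in app-engine web-engine appliance-management postgresdb zookeeper kafka; do
        started=$(systemctl show -p ActiveEnterTimestamp "$svc" 2>/dev/null | property_value)
        printf '%s started: %s\n' "$svc" "${started:-unknown}"
    done
}

# Print a short text summary suitable for attaching to a support request.
collect_support_notes() {
    echo "Collected: $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    echo "Uptime: $(uptime)"
    service_start_times
}
```

Redirecting the output, for example `collect_support_notes > hcx-notes.txt` (a hypothetical file name), produces a file that can accompany the screenshots and log bundles.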

Additional Information

For detailed site pairing diagnostics, refer to: HCX Site Pairing Connectivity Diagnostic Steps

For issues related to disk space and /common directory utilization, refer to: HCX Connector or Cloud Manager unresponsive due to high utilization of "/common" directory

For service status troubleshooting, refer to: HCX Manager postgresdb service fails to start after upgrade