vSphere Replication Synchronization Stalled at 50% with "No connection to VR Server" Error

Article ID: 441701

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

When performing a failback or configuring replication from a Disaster Recovery (DR) site to a Production (PR) site, the synchronization process becomes stuck at 50%. The vCenter Server or Site Recovery Manager (SRM) UI displays the following error:

> "A replication error occurred at the vSphere Replication Server for replication '[VM_NAME]'. Details: No connection to VR Server for virtual machine [VM_NAME] on host [ESXi_HOSTNAME] in cluster [CLUSTER_NAME] in [DATACENTER]: Not responding."

Additional Observations:

  • The VAMI of the Add-on or Replication Server is accessible, but login attempts fail.
  • VM states on the host fluctuate between Normal and Warning when viewed directly in the ESXi host UI.
  • The ESXi host may show high uptime (e.g., 190+ days).
  • VM operations (e.g., Power Off) for VMs on the affected host fail or time out.
  • VM consoles for the replication server and other VMs on the same ESXi host are inaccessible via vCenter or ESXi UI.

Environment

vSphere Replication 9.x

Cause

This issue is caused by ESXi host instability at the target (Production) site. When the host running the vSphere Replication Add-on or Replication Server appliance becomes unresponsive or unstable, the TCP communication required for synchronization (typically ports 31031 and 32032 for replication traffic and 8123 for management) is interrupted.

Resolution

To restore replication functionality, move the Add-on or Replication Server appliance to a stable ESXi host.

  1. Stabilize the ESXi Host
    • Attempt to place the affected ESXi host into Maintenance Mode to evacuate remaining VMs.
    • If the host is completely unresponsive to management commands, a hard reboot of the physical host may be required (after ensuring VMs are protected or migrated).
  2. Migrate the Replication Appliance
    • Migrate the Add-on or Replication Server VM to a known healthy ESXi host within the cluster.
    • Verify that the appliance is responsive by checking the VM console and the VAMI.
  3. Validate Services and Connectivity
    • Log in to the vSphere Replication Appliance VAMI and ensure all services are in a Running state.
    • Ensure that ports 31031 and 32032 (replication traffic) and 8123 (management) are open and not blocked by physical or virtual firewalls between the DR and PR sites.
  4. Resume Replication
    • Once the environment is stabilized, manually trigger a Synchronize Now task for the affected VM.
    • Monitor the progress. Note that for very large VMs (e.g., ~2.83 TB), the sync may take significant time to progress past 50% depending on your network bandwidth and latency.
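
The port check in step 3 can be approximated with a plain TCP connect test, run from a machine at the source site that can route to the target Replication Server. The following is a minimal sketch, not a VMware tool: the function names and the host name in the usage note are hypothetical, and only the port numbers come from this article. A successful connect shows that the port is reachable through the firewalls; it does not prove that the replication service behind it is healthy.

```python
import socket

# Ports named in this article: 31031 and 32032 (replication traffic),
# 8123 (management).
VR_PORTS = (31031, 32032, 8123)


def check_port(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_vr_ports(host, ports=VR_PORTS, timeout=5.0):
    """Return a dict mapping each port to its reachability (True or False)."""
    return {port: check_port(host, port, timeout) for port in ports}
```

For example, `check_vr_ports("vr-server.example.com")` (a placeholder host name) returns a dictionary keyed by port. Any `False` entry suggests a firewall rule, a routing problem, or an unresponsive appliance worth investigating before you retry Synchronize Now.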

Recommendations:

  • Upgrade Infrastructure: Ensure ESXi hosts and vSphere Replication components are upgraded to supported versions (e.g., vSphere 8.x) to maintain alignment with Broadcom support policies.
  • Network Optimization: Investigate and optimize network latency for replication traffic to reduce the duration of synchronization windows.
  • Resource Allocation: Verify that ESXi hosts have sufficient CPU and memory resources to handle the high I/O demands of large-scale replication workloads.
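
To put the network recommendation in perspective, a back-of-the-envelope calculation can show how long a large synchronization might take on a given link. The sketch below is illustrative only. The function name and the `efficiency` factor (the assumed fraction of nominal bandwidth actually available to replication) are hypothetical, and the result is a rough upper bound that assumes the entire disk must be transferred.

```python
def estimate_sync_hours(size_tb, bandwidth_mbps, efficiency=0.7):
    """Estimate the hours needed to transfer size_tb terabytes.

    size_tb        -- data to transfer, in decimal terabytes (10**12 bytes)
    bandwidth_mbps -- nominal link bandwidth in megabits per second
    efficiency     -- assumed usable fraction of the link (0 < efficiency <= 1)
    """
    if size_tb < 0:
        raise ValueError("size_tb must not be negative")
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    size_bits = size_tb * 1e12 * 8                      # terabytes to bits
    effective_bps = bandwidth_mbps * 1e6 * efficiency   # usable bits per second
    return size_bits / effective_bps / 3600             # seconds to hours
```

Under these assumptions, `estimate_sync_hours(2.83, 1000)` comes to roughly 9 hours for a 2.83 TB VM on a 1 Gbps link, so a sync that appears slow is not necessarily stalled.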

Additional Information

Contributing factors may include:

  • High Host Uptime: Extended uptime can lead to abnormal host behavior and exhaustion of management service resources.
  • End-of-General Support (EOGS) Hardware/Software: Running legacy versions (e.g., ESXi 7.0 U3 or VR 8.8.x) without extended support.
  • Environmental Latency: High network latency can exacerbate synchronization delays for large disk workloads, making them more sensitive to minor host instabilities.