Virtual Machine Replication gets stuck at 99% and doesn't complete.

Article ID: 399121

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

VM replication proceeds to 99%, and then the task fails with an error similar to the following:

A replication error occurred at the vSphere Replication Server for replication '###'. Details: 'No connection to VR Server for virtual machine ### on host ##-##-##.local in cluster ###Cluster in ####: Not responding'.

/var/run/log/vmkernel.log on the Source Host

2025-04-23T15:15:50.789Z cpu82:135659421)WARNING: Hbr: 893: Failed to receive from ##.##.##.## (groupID=GID-####_###_###): Timeout

2025-04-23T15:17:05.807Z cpu48:135659421)WARNING: Hbr: 574: Connection failed to ##.##.##.##  (groupID=GID-####_###_###): Timeout

2025-04-23T15:17:05.807Z cpu48:135659421)WARNING: Hbr: 5093: Failed to establish connection to [##.##.##.## ]:31031 (groupID=GID-####_###_###): Timeout

2025-04-23T15:18:30.814Z cpu72:135659421)WARNING: Hbr: 574: Connection failed to ##.##.##.##  (groupID=GID-####_###_###): Timeout
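To check whether a source host shows the same signature, the log can be filtered for Hbr timeout warnings. The following is a minimal sketch, assuming the log path shown above and the message layout seen in these excerpts. `hbr_timeouts` is a hypothetical helper name, not a VMware tool.

```shell
#!/bin/sh
# Sketch: list Hbr timeout warnings from a vmkernel log.
# hbr_timeouts is a hypothetical helper, not part of ESXi.
hbr_timeouts() {
    # Default path is the one quoted in this article; pass another file to override.
    log="${1:-/var/run/log/vmkernel.log}"
    # Match lines such as "WARNING: Hbr: 893: Failed to receive from ... : Timeout".
    grep -E 'WARNING: Hbr: [0-9]+: .*: Timeout' "$log"
}

# Example invocation on the source host:
# hbr_timeouts
```

Repeated matches that name the same target address and port 31031 suggest that the host cannot reach the replication server on that port.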

/var/log/vmware/hbrsrv.log on the target replication appliance.

2025-04-24T14:12:14.065Z verbose hbrsrv[2745913] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: groupStats, Hbr.Replica.Group.GID-####_###_###. Applied change to temp map.

2025-04-24T14:12:14.852Z error hbrsrv[2565649] [Originator@6876 sub=Asio] Cancel  ->  (plain text) due to timeout: Timed-out sync-reading request

2025-04-24T14:12:14.852Z info hbrsrv[2565649] [Originator@6876 sub=Delta] ClientConnection (ClientCnx '[##.##.##.## ]:49152' id=1 <shut>) is stopping ...

2025-04-24T14:12:14.852Z info hbrsrv[2565649] [Originator@6876 sub=Asio] Closing LWD ASIO  ->  (plain text)

2025-04-24T14:12:14.852Z info hbrsrv[2565649] [Originator@6876 sub=Delta] HbrSrv cleaning out ClientConnection ([##.##.##.## ]:49152)

2025-04-24T14:12:14.852Z error hbrsrv[2565649] [Originator@6876 sub=Main] HbrError stack:

2025-04-24T14:12:14.852Z error hbrsrv[2565649] [Originator@6876 sub=Main]    [0] ClientConnection (client=[##.##.##.## ]:49152) request callback failed: Operation was canceled

2025-04-24T14:12:14.852Z error hbrsrv[2565649] [Originator@6876 sub=Main]    [1] Dropping error encountered from network

In this case, an intrusion prevention system (IPS) was found to be blocking the replication traffic.

Environment

VMware vSphere Replication 8.x
VMware vSphere Replication 9.x
VMware ESXi 6.x
VMware ESXi 7.x
VMware ESXi 8.x

Cause

These symptoms indicate a network communication problem between the source ESXi host and the vSphere Replication (VR) Server.

The replication ports are blocked, most likely by the network or a firewall between the two sites.

Resolution

To troubleshoot this issue:

Confirm that the source ESXi host, the target replication appliance, and the target ESXi hosts respond to ping.

Check whether the ESXi host can communicate with the VRMS by using the netcat (nc) command.

Syntax for netcat:

nc -z IP_Address PortNumber
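When several ports need to be checked, the same command can be wrapped in a small loop. The following is a minimal sketch, assuming a POSIX shell with `nc` available, as on the ESXi shell. `check_ports` is a hypothetical helper name, and the address in the example is a placeholder.

```shell
#!/bin/sh
# Sketch: probe one or more TCP ports on a target with netcat.
# check_ports is a hypothetical helper, not part of ESXi or vSphere Replication.
check_ports() {
    target="$1"
    shift
    failed=0
    for port in "$@"; do
        # -z: connect and close without sending data, as in the syntax above.
        if nc -z "$target" "$port" >/dev/null 2>&1; then
            echo "OPEN $target:$port"
        else
            echo "FAILED $target:$port"
            failed=1
        fi
    done
    # Non-zero exit status if any port could not be reached.
    return "$failed"
}

# Example invocation (placeholder address; 31031 is the port seen in vmkernel.log above):
# check_ports 192.0.2.10 31031
```

A FAILED result only shows that the connection could not be established from this host. It does not identify which device in the path dropped the traffic.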

Validate the ports required for replication.

Ensure that firewall rules are not blocking the ports required for communication with the remote VRMS. For the full list of ports, see:

vSphere Replication - VMware Ports and Protocols

https://ports.broadcom.com/home/vSphere-Replication

Additional Information

Also check the following:

Identify and resolve any resource contention that affects the replication processes.

Verify adequate network bandwidth between the source and destination servers. Slow network speeds can hinder replication progress. 

Verify network settings on both the source and destination servers, including IP addresses, subnet masks, and gateway.