Virtual machine replication is slow using vSphere Replication



Article ID: 389470


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • Virtual machine replication takes a long time to complete, with no errors displayed.

  • The slowness affects all VMs configured for replication, regardless of whether legacy replication or enhanced replication is used.
  • In /var/run/log/hostd.log on the source ESXi host, the replication scheduler reports a low estimated bandwidth, which confirms slow data transfer:

2026-04-01T11:10:23.131Z In(166) Hostd[2100761]: [Originator@6876 sub=Hbrsvc] ReplicationScheduler: starting replica for group GID-4388655c-a5fb-47a7-af2a-########. Estimated maximum duration: 18907 seconds. Estimated average bandwidth: 0.28 MB/s.

  • If a recovery plan is executed while a virtual machine is in an active synchronization state, the recovery plan may fail due to the replication status being incomplete.
  • The amount of transferred data increases very slowly:
    • 11.6 MB --> 12.4 MB --> 13.6 MB --> 14.9 MB

The repeated state checks below confirm that replication is working, but the transfer speed is low due to network issues:

[root@esxihost:/] vim-cmd hbrsvc/vmreplica.getState 512
Retrieve VM running replication state:
        The VM is configured for replication. Current replication state: Group: GID-4388655c-a5fb-47a7-af2a-########(generation=40505691886901292)
        Group State: lwd delta (instanceId=replica-521b639a-515f-4391-191d-467ac0a2dee8) (0% done: transferred 11.6 MB of 53.3 GB)
                DiskID RDID-97f394bb-4906-4691-96c0-9a0a00d2d5c6 State: lwd delta (transferred 2.0 MB of 7.6 GB)
                DiskID RDID-d071535c-0705-459a-ac23-49bb43b2494b State: lwd delta (transferred 1.5 MB of 45.7 GB)
                DiskID RDID-4a586531-09c5-4997-b1a8-11225d07fe9f State: lwd delta (transferred 8.1 MB of 18.6 MB)

[root@esxihost:/] vim-cmd hbrsvc/vmreplica.getState 512
Retrieve VM running replication state:
        The VM is configured for replication. Current replication state: Group: GID-4388655c-a5fb-47a7-af2a-########(generation=40505691886901292)
        Group State: lwd delta (instanceId=replica-521b639a-515f-4391-191d-########) (0% done: transferred 12.4 MB of 53.3 GB)
                DiskID RDID-97f394bb-4906-4691-96c0-######## State: lwd delta (transferred 2.0 MB of 7.6 GB)
                DiskID RDID-d071535c-0705-459a-ac23-######## State: lwd delta (transferred 1.8 MB of 45.7 GB)
                DiskID RDID-4a586531-09c5-4997-b1a8-######## State: lwd delta (transferred 8.7 MB of 18.6 MB)

[root@esxihost:/] vim-cmd hbrsvc/vmreplica.getState 512
Retrieve VM running replication state:
        The VM is configured for replication. Current replication state: Group: GID-4388655c-a5fb-47a7-af2a-########(generation=40505691886901292)
        Group State: lwd delta (instanceId=replica-521b639a-515f-4391-191d-########) (0% done: transferred 13.6 MB of 53.3 GB)
                DiskID RDID-97f394bb-4906-4691-96c0-######## State: lwd delta (transferred 2.0 MB of 7.6 GB)
                DiskID RDID-d071535c-0705-459a-ac23-######## State: lwd delta (transferred 2.3 MB of 45.7 GB)
                DiskID RDID-4a586531-09c5-4997-b1a8-######## State: lwd delta (transferred 9.3 MB of 18.6 MB)

[root@esxihost:/] vim-cmd hbrsvc/vmreplica.getState 512
Retrieve VM running replication state:
        The VM is configured for replication. Current replication state: Group: GID-4388655c-a5fb-47a7-af2a-########(generation=40505691886901292)
        Group State: lwd delta (instanceId=replica-521b639a-515f-4391-191d-467ac0a2dee8) (0% done: transferred 14.9 MB of 53.3 GB)
                DiskID RDID-97f394bb-4906-4691-96c0-######## State: lwd delta (transferred 2.6 MB of 7.6 GB)
                DiskID RDID-d071535c-0705-459a-ac23-######## State: lwd delta (transferred 2.5 MB of 45.7 GB)
                DiskID RDID-4a586531-09c5-4997-b1a8-######## State: lwd delta (transferred 9.8 MB of 18.6 MB)
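The slow progress can be quantified by comparing two successive "transferred" readings. The following is a minimal sketch using the sample values from the outputs above; the 60-second polling interval is an assumption, since the actual interval between the state checks is not recorded in this article:

```shell
# Estimate the effective replication throughput from two snapshots of the
# "transferred" counter reported by vim-cmd hbrsvc/vmreplica.getState.
# t1/t2 are sample values from the outputs above; the interval is assumed.
t1=11.6        # MB transferred at first snapshot
t2=12.4        # MB transferred at second snapshot
interval=60    # seconds between snapshots (assumed)

awk -v a="$t1" -v b="$t2" -v s="$interval" 'BEGIN {
    rate = (b - a) / s
    printf "Effective throughput: %.3f MB/s\n", rate
    # Project how long the 53.3 GB sync would take at this rate.
    printf "Projected full-sync time: %.1f hours\n", 53.3 * 1024 / rate / 3600
}'
```

A throughput orders of magnitude below the available link speed, sustained across several snapshots, supports the network-bottleneck diagnosis.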

Validation:

  • In /var/run/log/vmkernel.log on the source ESXi host, multiple connection failures and timeout events are reported.

2026-03-31T13:39:17.344Z Wa(180) vmkwarning: cpu25:41172557)WARNING: Hbr: 571: Connection failed to target_vr (groupID=GID-4388655c-a5fb-47a7-af2a-########): Connection refused
2026-03-31T13:39:17.344Z Wa(180) vmkwarning: cpu25:41172557)WARNING: Hbr: 5362: Failed to establish connection to [target_vr]:31031 (groupID=GID-4388655c-a5fb-47a7-af2a-########): Connection refused
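As a quick triage step, these Hbr connection failures can be counted directly in the log. A minimal sketch, using the sample lines above written to a temporary file in place of the real /var/run/log/vmkernel.log:

```shell
# Count Hbr connection-failure events. The heredoc below stands in for the
# real /var/run/log/vmkernel.log on the source ESXi host.
cat <<'EOF' > /tmp/vmkernel.sample.log
2026-03-31T13:39:17.344Z Wa(180) vmkwarning: cpu25:41172557)WARNING: Hbr: 571: Connection failed to target_vr (groupID=GID-4388655c-a5fb-47a7-af2a-########): Connection refused
2026-03-31T13:39:17.344Z Wa(180) vmkwarning: cpu25:41172557)WARNING: Hbr: 5362: Failed to establish connection to [target_vr]:31031 (groupID=GID-4388655c-a5fb-47a7-af2a-########): Connection refused
EOF

# A count that keeps growing while replication is "running" points at the network.
grep -cE 'WARNING: Hbr: [0-9]+: (Connection failed|Failed to establish connection)' /tmp/vmkernel.sample.log
```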

  • The /var/log/vmware/hbrsrv.log on the Target VR indicates a network-related issue.

2025-01-28T12:16:30.108+05:30 verbose hbrsrv[2748298] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: serverStats, HbrServer. Applied change to temp map.
2025-01-28T12:16:32.365+05:30 info hbrsrv[2748277] [Originator@6876 sub=Delta] ClientConnection (ClientCnx '[Y.Y.Y.Y]:55936' id=3 <shut>) is stopping ...
2025-01-28T12:16:32.365+05:30 info hbrsrv[2748277] [Originator@6876 sub=Asio] Closing LWD ASIO  ->  (plain text)
2025-01-28T12:16:32.365+05:30 info hbrsrv[2748277] [Originator@6876 sub=Delta] HbrSrv cleaning out ClientConnection ([Y.Y.Y.Y]:55936)
2025-01-28T12:16:32.365+05:30 error hbrsrv[2748277] [Originator@6876 sub=Main] HbrError stack:
2025-01-28T12:16:32.365+05:30 error hbrsrv[2748277] [Originator@6876 sub=Main]    [0] ClientConnection (client=[Y.Y.Y.Y]:55936) request callback failed: Connection reset by peer: The connection is terminated by the remote end with a reset packet. Usually, this is a sign of a network problem,  timeout, or service overload.
2025-01-28T12:16:32.365+05:30 error hbrsrv[2748277] [Originator@6876 sub=Main]    [1] Dropping error encountered from network

(Note: In the logs above, Y.Y.Y.Y is the source ESXi VMkernel interface on which replication traffic is enabled.)

 

Environment

  • vSphere Replication 8.x
  • vSphere Replication 9.x

Cause

This issue can be caused by the following:

  1. Poor replication bandwidth allocation between sites.
  2. Misconfiguration of external network devices such as switches, routers, firewalls, and WAN appliances.
  3. Poor or inconsistent network performance.

Resolution

Engage the physical switch vendor to identify network bottlenecks affecting replication performance. Ensure stable connectivity with sufficient bandwidth between the source and target sites, and check for congestion or packet loss along the replication path.
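Basic reachability of the replication port can be verified before engaging the vendor. The vmkernel.log above shows connections to port 31031 (the vSphere Replication LWD traffic port) being refused. Below is a minimal probe sketch intended to be run from a Linux host such as the VR appliance (bash's /dev/tcp is not available in the ESXi shell); target_vr is a placeholder hostname:

```shell
# Probe TCP reachability of the vSphere Replication LWD port (31031).
# "target_vr" is a placeholder; replace it with the target VR server address.
probe() {
    host="$1"; port="$2"
    # bash's /dev/tcp pseudo-device attempts a TCP connect; timeout bounds it.
    if timeout 2 bash -c ">/dev/tcp/$host/$port" 2>/dev/null; then
        echo "$host:$port reachable"
    else
        echo "$host:$port unreachable"
    fi
}

probe target_vr 31031
```

On the ESXi host itself, a similar check can be attempted with `nc -z target_vr 31031`, if nc is available in that build.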

To perform packet captures, follow this article: Using the pktcap-uw tool in ESXi 5.5 and later (341568)

The following commands can be used to capture packets:

  1. To capture packets on the uplink (vmnicX) of the source ESXi host where the VM is running:

    # pktcap-uw --uplink vmnicX --dir 2 -o /vmfs/volumes/Datastore_name/vmnicX.pcap

  2. To capture packets on the VMkernel interface (vmkX) used for replication traffic:

    # pktcap-uw --vmk vmkX --dir 2 -o /vmfs/volumes/Datastore_name/vmkX.pcap

  3. To capture packets on the network adapter of the DR site replication appliance:

    # tcpdump -i eth0 -w /tmp/eth0.pcap

These packet captures can be analyzed by the physical switch vendor for potential network issues such as out-of-order packets or TCP reset events.