VM replication initial sync stuck for a long amount of time

Article ID: 400452

Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  • VM replication is stuck in the initial sync state for an extended period. The stuck replication state is visible on the vSphere Replication appliance under 'Replication'.

  • The VM was recently added to replication, and the initial sync is still running.

Environment

vSphere Replication 9.x

Cause

This issue is caused by network communication problems between the ESXi host running the VM and the target vSphere Replication server (legacy replication) or the target ESXi host (enhanced replication).

Cause Validation:

In the case of enhanced replication, the following events will be observed in the /var/log/vmkernel.log file on the source ESXi host:

2026-02-19T06:33:27.380Z Wa(180) vmkwarning: cpu11:6507409)WARNING: Hbr: 788: Failed to receive from 127.0.0.1 (groupID=GID-672ed7f0-7066-####-####-############): Broken pipe

The /var/log/hbr-agent.log file on the source ESXi host shows that the host is unable to write data to the target ESXi host and that connections are timing out:

2026-02-19T06:33:27.397Z In(166) hbr-agent-bin[6235489]: [0x000000aac94b1700] info: [Proxy [Group: GID-672ed7f0-7066-####-####-############] -> [10.#.#.#:32032]] Bound to vmk: vmk2 for connection to 10.#.#.#:32032
2026-02-19T06:33:27.397Z In(166) hbr-agent-bin[6235489]: [0x000000aac9532700] info: [Proxy [Group: GID-672ed7f0-7066-####-####-############] -> [10.#.#.#:32032]] TCP Connect latency was 505us
2026-02-19T06:35:43.433Z In(166) hbr-agent-bin[6235489]: [0x000000aac9532700] error: [Proxy [Group: GID-672ed7f0-7066-####-####-############] -> [10.#.#.#:32032]] Failed to write to server: Broken pipe
2026-02-19T06:35:43.434Z In(166) hbr-agent-bin[6235489]: [0x000000aac9430700] error: [Proxy [Group: GID-672ed7f0-7066-####-####-############] -> [10.#.#.#:32032]] Failed to read from server: Connection timed out
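As a quick way to surface these events, the relevant error lines can be filtered out of the logs. The sketch below is illustrative (the filter_hbr_errors helper name is ours, and a here-doc stands in for the real log file); on an actual host the input would be /var/log/hbr-agent.log or /var/log/vmkernel.log:

```shell
# Hypothetical sketch: filter HBR transport errors from log text.
# The error strings match the entries shown above.
filter_hbr_errors() {
  grep -E 'Broken pipe|Connection timed out'
}

# Demo input; replace the here-doc with: filter_hbr_errors < /var/log/hbr-agent.log
filter_hbr_errors <<'EOF'
2026-02-19T06:35:43.433Z In(166) hbr-agent-bin[6235489]: error: Failed to write to server: Broken pipe
2026-02-19T06:35:43.434Z In(166) hbr-agent-bin[6235489]: error: Failed to read from server: Connection timed out
2026-02-19T06:35:43.500Z In(166) hbr-agent-bin[6235489]: info: TCP Connect latency was 505us
EOF
```

Only the two error lines are printed; informational entries such as the connect-latency line are dropped.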

In addition to this, validating the connectivity between the ESXi host and VR appliance (legacy replication) or between the source and target ESXi host (enhanced replication) reveals packet loss.

Syntax: vmkping -I vmk# -d -s <payload size> <Destination Appliance IP Address>

(The -d option disables IP fragmentation so the packet must traverse the network at full size.)

Example 1: If the replicating vmkernel interface uses an MTU of 9000, run vmkping in the format below and check for packet loss.

vmkping -I vmk2 -d -s 8972 <Destination Appliance IP Address>
PING 10.#.#.# (10.#.#.#): 8972 data bytes

--- 10.#.#.# ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

Example 2: If the replicating vmkernel interface uses an MTU of 1500, run vmkping in the format below and check for packet loss.

vmkping -I vmk2 -d -s 1472 <Destination Appliance IP Address>
PING 10.#.#.# (10.#.#.#): 1472 data bytes

--- 10.#.#.# ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
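The -s value in both examples is the ICMP payload size: the interface MTU minus 28 bytes (a 20-byte IP header plus an 8-byte ICMP header). A minimal sketch of that arithmetic (the mtu_to_payload helper name is illustrative):

```shell
# ICMP payload size for a given MTU: subtract 20 bytes (IP header)
# plus 8 bytes (ICMP header) so the packet exactly fills the MTU.
mtu_to_payload() {
  echo $(( $1 - 28 ))
}

mtu_to_payload 9000   # jumbo frames    → 8972
mtu_to_payload 1500   # standard frames → 1472
```

Using a payload larger than this with -d (don't fragment) causes the ping to fail, which is what makes it a useful MTU consistency check.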

Resolution

  • Engage your networking team to address the underlying network issues causing the packet drops, and ensure that the MTU is configured uniformly across all components in the replication path.

  • Once the networking issues are fixed, the initial sync should proceed and replication should work as expected.