Validation Steps:
Check the status of the VM replication by running the commands below on the ESXi host where the VM resides:
vim-cmd vmsvc/getallvms (make a note of the VMID from this command)
vim-cmd hbrsvc/vmreplica.getState VMID

[root@auh--esxi-:/vmfs/volumes/64abe83a-####-c40d-#####/log] vim-cmd hbrsvc/vmreplica.queryReplicationState 31
Querying VM running replication state:
Current replication state:
    State: active
    Instance ID: replica-####-ec05-46c6-###-#####
    Progress: 0% (transfer: 5195923456/585975693312)
[root@auh-vxr-esxi-05:/vmfs/volumes/64abe83a-####-###-5c6f69dcf2f0/log]
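The Progress field shows 0% here even though the transfer counters are non-zero. As a minimal sketch, the counters can be converted into a finer-grained percentage. The helper name replication_pct is hypothetical (it is not part of vim-cmd), and the sketch assumes the output contains a "(transfer: <done>/<total>)" field as in the capture above.

```shell
#!/bin/sh
# Hypothetical helper: reads replication-state output on stdin and prints
# the transferred percentage to two decimals.
# Assumes a line containing "(transfer: <done>/<total>)".
replication_pct() {
    sed -n 's/.*transfer: \([0-9][0-9]*\)\/\([0-9][0-9]*\).*/\1 \2/p' |
        awk 'NR == 1 && $2 > 0 { printf "%.2f\n", ($1 / $2) * 100 }'
}

# Example with the values captured above; prints 0.89
echo "Progress: 0% (transfer: 5195923456/585975693312)" | replication_pct
```

On the host, the same helper could be fed directly, for example: vim-cmd hbrsvc/vmreplica.queryReplicationState 31 | replication_pct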
Review hbr-agent.log on the source ESXi host and check for entries like the ones below.
Log path (source ESXi host): /var/run/log/hbr-agent.log (for example, less /var/run/log/hbr-agent.log)
2025-04-25T08:17:30.534Z In(166) hbr-agent-bin[2103093]: [0x000000e7c7d5c700] info: [Proxy [Group: GID-###-f27b-###-a033-####5457c6] -> [172.##.251.##:32032]] TCP Connect latency was 4482µs
2025-04-25T08:22:41.571Z In(166) hbr-agent-bin[2103093]: [0x000000e7c7bd9700] error: [Proxy [Group: GID-####aa-f27b-###-a033-####457c6] -> [##.18.##.#15:32032]] Failed to read from server: End of file
2025-04-25T08:24:25.039Z In(166) hbr-agent-bin[2103093]: [0x000000e7c7cdb700] info: [Proxy [Group: GID-####aa-f27b-####-a033-######c6] -> [1##.18.2##.54:32032]] Setting up secure tunnel to brokered server 1##.##.##1.1##:32032 (1 of 1)
2025-04-25T08:24:25.039Z In(166) hbr-agent-bin[2103093]: [0x000000e7c7cdb700] info: [Proxy [Group: GID-####a-f27b-4e8e-a###-####7c6] -> [172.##.##.##:32032]] Bound to vmk: vmk2 for connection to 1##.18.2##.1##:32032
2025-04-25T08:24:25.042Z In(166) hbr-agent-bin[2103093]: [0x000000e7c7bd9700] info: [Proxy [Group: GID-ab####a-f27b-###-a033-######57c6] -> [172.##.####.###:32032]] TCP Connect latency was 3160µs
2025-04-25T08:24:41.407Z In(166) hbr-agent-bin[2103093]: [0x000000e7c7bd9700] error: [Proxy [Group: GID-###a-f27b-4e8e-###-####57c6] -> [172.##.###.##:32032]] Failed to read from client: Connection reset by peer
2025-04-25T08:24:41.407Z In(166) hbr-agent-bin[2103093]: [0x000000e7c7c5a700] error: [Proxy [Group: GID-a####aa-f27b###-a033###57c6] -> [172.##.2##.1##:32032]] Failed to read from server: Operation canceled
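To see at a glance which proxy errors dominate the log, the error lines can be grouped by message. This is a minimal sketch: the function name summarize_hbr_errors is hypothetical, and it assumes error entries contain " error: " followed by a "Failed to read from ..." message as in the entries above.

```shell
#!/bin/sh
# Hypothetical helper: reads hbr-agent.log lines on stdin and prints a count
# per distinct "Failed to read from ..." message, most frequent first.
summarize_hbr_errors() {
    grep ' error: ' |
        sed -n 's/.*\(Failed to read from [a-z]*: .*\)$/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage on the source ESXi host:
#   summarize_hbr_errors < /var/run/log/hbr-agent.log
```

A high count of "End of file" or "Connection reset by peer" messages shortly after a successful TCP connect is consistent with the connection being established but larger data packets failing to get through.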
VMware Live Recovery 9.0.2
From the source ESXi host, a vmkping over vmk2 with the don't-fragment flag set and a payload size of 1472 bytes (a full 1500-byte packet at MTU 1500) shows 100% packet loss.
However, the same test with a smaller payload size of 1072 bytes (an 1100-byte packet) completes successfully with no packet loss. This indicates that full-size packets are being dropped somewhere on the network path, which is causing the replication slowness when data is sent at MTU 1500.
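The relationship between the MTU and the vmkping payload size is simple arithmetic: the payload is the MTU minus the 20-byte IPv4 header (without options) and the 8-byte ICMP header. A small sketch, with hypothetical function names:

```shell
#!/bin/sh
# ICMP payload that exactly fills one IPv4 packet of the given MTU:
# MTU minus 20-byte IPv4 header (no options) minus 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# IPv4 packet size produced by a given vmkping -s payload.
packet_size_for_payload() {
    echo $(( $1 + 20 + 8 ))
}

icmp_payload_for_mtu 1500      # prints 1472
icmp_payload_for_mtu 9000      # prints 8972
packet_size_for_payload 1072   # prints 1100
```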
Cause Validation
Network communication between the source and destination ESXi hosts over port 32032 (used by vSphere Replication) is broken when using MTU 1500.
[root@auh-vxr-esxi:/vmfs/volumes/64abe83a-####-###-5c6f69d####/log] vmkping -I vmk2 1##.##.##.### -d -s 1472
PING 1#2.##.##.##5 (1##.##.2##.##5): 1472 data bytes

--- 172.18.251.115 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
Successful connectivity is only observed when MTU is reduced, which suggests fragmentation or intermediate network device issues (e.g., firewall, switch, or load balancer).
By contrast, the same ping works with a payload size of 1072 bytes:
[root@auh-esxi:/vmfs/volumes/64abe83a-####-c40d-######/log] vmkping -I vmk2 ###.##.##.## -d -s 1072
PING 1##.18.##1.##5 (1##.##.##1.##5): 1072 data bytes
1080 bytes from ##2.18.##1.##5: icmp_seq=0 ttl=60 time=3.626 ms
1080 bytes from ##2.18.##1.##5: icmp_seq=1 ttl=60 time=5.265 ms
1080 bytes from ##2.18.##1.##5: icmp_seq=2 ttl=60 time=4.380 ms

--- 172.18.251.115 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
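The two tests above only bracket the problem: 1072 bytes passes and 1472 bytes fails. To give the network team the exact usable size, the largest passing payload can be found with a binary search. The sketch below is an illustration only: find_max_payload and probe_vmk2 are hypothetical names, TARGET_IP is a placeholder for the destination replication address, and the probe assumes vmkping prints a ", 0% packet loss" summary line on success, as in the capture above.

```shell
#!/bin/sh
# Hypothetical helper: binary-search the largest payload for which the probe
# command succeeds. Arguments: low high probe_command
# Assumes the probe succeeds at "low" and that success is monotonic in size.
find_max_payload() {
    lo=$1; hi=$2; probe=$3
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if "$probe" "$mid"; then
            lo=$mid
        else
            hi=$(( mid - 1 ))
        fi
    done
    echo "$lo"
}

# Probe for an ESXi host: 3 don't-fragment pings of the given size over vmk2.
# Succeeds only if the summary line reports exactly 0% packet loss
# (the leading ", " keeps "100% packet loss" from matching).
probe_vmk2() {
    vmkping -I vmk2 -d -s "$1" -c 3 "$TARGET_IP" 2>/dev/null |
        grep -q ', 0% packet loss'
}

# On the source host, after setting TARGET_IP:
#   find_max_payload 1072 1472 probe_vmk2
```

Adding 28 bytes to the result gives the effective path MTU between the two hosts for this vmkernel interface.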
Coordinate with the network team to investigate and resolve the MTU mismatch or path MTU discovery issue between the source and destination ESXi hosts.
Specifically, determine why packets using MTU 1500 (payload size 1472) are being dropped or not routed correctly.
Check for potential misconfigurations, MTU limitations, or faulty intermediate devices (e.g., switches, routers, or firewalls) affecting the replication traffic path.
Ensure consistent MTU settings end-to-end and enable jumbo frame support if required for optimal replication performance.
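When collecting MTU values from each device on the replication path, the smallest one bounds the usable path MTU. A minimal sketch for spotting it, assuming the values have been gathered by hand into "<hop-name> <mtu>" lines; the function name and the hop names and values in the example are purely illustrative, not taken from this environment:

```shell
#!/bin/sh
# Hypothetical helper: reads "<hop-name> <mtu>" lines on stdin and prints the
# hop with the smallest MTU, which bounds the usable path MTU.
smallest_mtu_hop() {
    awk 'NF >= 2 && (min == "" || $2 + 0 < min) { min = $2 + 0; hop = $1 }
         END { if (min != "") print hop, min }'
}

# Illustrative input only (made-up hops and values); prints: wan-tunnel 1400
printf '%s\n' 'source-vmk2 1500' 'core-switch 9000' 'wan-tunnel 1400' 'dest-vmk 1500' |
    smallest_mtu_hop
```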