Enhanced Replication Mappings fails with the error "Fault occurred while performing health check. Details 'IO: Connection timed out'"

Article ID: 395659


Products

VMware Live Recovery

Issue/Introduction

Symptoms

  • Enhanced Replication Mapping fails specifically for all connections to a single target ESXi host.

  • Error observed:

    "Fault occurred while performing health check. Details: 'IO: Connection timed out.'"

Validation Steps:

  • Confirm that port 32032 is open bidirectionally on the affected ESXi host by running nc -zv <target host IP> 32032

  • Verify that the vSphere Replication services are enabled on only one VMkernel adapter, as required.

    This can be checked by navigating to: Hosts > Configure > Networking > VMkernel adapters. A command-line alternative is shown after this list.

  • Confirm that the hbr-agent service is updated and running on the ESXi host:

    To check the service status, navigate to:
    Hosts > Configure > System > Services > hbr-agent

    To verify the version of the hbr VIB installed, run the following command on the ESXi host:

    esxcli software vib list | grep hbr

    Sample output:
    esxcli software vib list | grep hbr
    vmware-hbr-agent               8.0.3-0.0.24299506                    VMware  VMwareCertified   2025-01-02    host
    vmware-hbrsrv                  8.0.3-0.0.23305546                    VMware  VMwareCertified   2025-01-02    host
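
A command-line alternative to the VMkernel adapter check above is to list the service tags of each VMkernel adapter directly on the ESXi host. The commands below are a general sketch; the exact tag names (for example, vSphereReplication and vSphereReplicationNFC) can vary between releases:

esxcli network ip interface list
esxcli network ip interface tag get -i vmk1

Sample output (illustrative):
Tags: vSphereReplication, vSphereReplicationNFC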

 

Environment

vSphere Replication 9.x

Cause

The issue is caused by a faulty uplink associated with the vmkernel adapter used for vSphere Replication on the target ESXi host.

Cause Validation:

From the hbr-agent.log file on the target ESXi host, multiple connection timeout errors can be observed:

2025-03-18T09:01:00.999Z In(166) hbr-agent-bin[2101419] : [0x000000c6cf060700] info: [Proxy [Group: PING-GID-92xxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx] -> [10.##.##.##: 32032] ] [db46a94a-3091-4da4-bb25-68a5a6e8baf0] Bound to vmk: vmk1 for connection to 10.##.##.##:32032
2025-03-18T09:02:16.012Z In(166) hbr-agent-bin[2101419] : [0x000000c6ceedd700] error: [Proxy [Group: PING-GID-92xxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx] -> [10.##.##.##: 32032]] [db46a94a-3091-4da4-bb25-68a5a6e8baf0] Failed to connect to 10.##.##.##:32032. Using nic 'vmk1', Error: Connection timed out
2025-03-18T09:02:16.013Z In(166) hbr-agent-bin[2101419]: [0x000000c6ceedd700] error: [Proxy [Group: PING-GID-92xxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx] -> [10.##.##.##:32032]] [db46a94a-3091-4da4-bb25-68a5a6e8baf0] Failed to connect to broker on 10.##.##.##:32032: Input/output error
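
The exact location of this log can vary by ESXi build. As a general sketch, the file can be located and filtered with the standard shell tools on the host (the path used in the grep command below is an assumption; replace it with the path returned by find):

find /var/run/log /var/log -name 'hbr-agent*' 2>/dev/null
grep -i "Connection timed out" /var/run/log/hbr-agent.log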

When performing a vmkping from the target ESXi host using vmk1 (used for vSphere Replication) to the source ESXi host, packet loss is observed:

vmkping -I vmk1 -s 1472 10.##.##.##
PING 10.##.##.## (10.##.##.##) : 1472 data bytes
1480 bytes from 10.##.##.##: icmp_seq=1 ttl=52 time=32.160 ms
1480 bytes from 10.##.##.##: icmp_seq=2 ttl=52 time=31.278 ms

--- 10.##.##.## ping statistics ---
3 packets transmitted, 2 packets received, 33.3333% packet loss

In contrast, vmkping using vmk0 (used for management) shows no packet loss:

vmkping -I vmk0 -s 1472 10.##.##.##
PING 10.##.##.## (10.##.##.##): 1472 data bytes
1480 bytes from 10.##.##.##: icmp_seq=0 ttl=53 time=29.826 ms
1480 bytes from 10.##.##.##: icmp_seq=1 ttl=53 time=28.464 ms
1480 bytes from 10.##.##.##: icmp_seq=2 ttl=53 time=27.541 ms

--- 10.##.##.## ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 27.541/28.610/29.826 ms

Using the esxtop command on the target ESXi host and pressing n to view network statistics, the following mapping is observed:
 • vmk0 is mapped to vmnic0
 • vmk1 is mapped to vmnic1

After bringing down vmnic1, traffic from vmk1 fails over to vmnic0. Post-failover, vmkping from vmk1 to the source ESXi host shows 0% packet loss, confirming that vmnic1 is the faulty uplink.
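
The physical uplink state and the teaming/failover order can also be confirmed from the ESXi command line. The sketch below is illustrative and assumes the VMkernel adapters are backed by a standard switch named vSwitch0; adjust the switch name as needed, or use the vSphere Client for a distributed switch:

esxcli network nic list
esxcli network vswitch standard policy failover get -v vSwitch0

In the output, confirm that the affected vmnic reports Link Status "Up" and note which adapters are listed as Active and Standby for the switch carrying vmk1.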

 

 

Resolution

This issue can be resolved by bringing the faulty uplink down and then bringing it back up.

Commands to be used:

esxcli network nic down -n vmnic#

esxcli network nic up -n vmnic#
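
After the uplink has been brought back up, verify the link state and re-test connectivity over the replication VMkernel adapter (vmnic# and <source host IP> are placeholders, as above):

esxcli network nic list | grep vmnic#
vmkping -I vmk1 -s 1472 <source host IP>

The vmkping should now show 0% packet loss, matching the behavior seen after the failover test in the Cause section.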

If the issue persists after following the above steps, engage your hardware vendor to check the health of the physical network card.