VMware vSphere Replication 8.x
VMware vSphere Replication 9.x
VMware Site Recovery Manager 8.x
VMware Site Recovery Manager 9.x
At startup, the vSphere Replication server service (hbrsrv) attempts to establish connections with all ESXi hosts in the vCenter Server inventory. In large-scale environments, if even a single host is unreachable, hbrsrv may perform prolonged connection retries, significantly delaying the service startup process. For detailed information on this behavior, refer to VMware’s deployment documentation: Deploy the vSphere Replication Virtual Appliance.
If an ESXi host is removed while hbrsrv is offline, its stale entry in the hbrsrv database may cause connection failures during the next startup.
When new ESXi hosts are added, hbrsrv probes them. If network communication is not properly configured, these probes fail and can crash or hang the service.
A review of the /opt/vmware/support/logs/hbrsrv.log entries reveals connection failures to the newly added ESXi hosts. The logs indicate multiple failed attempts to establish connections to these hosts over port 443.
2025-02-21T11:18:30.654Z verbose hbrsrv[633210] [Originator@6876 sub=IO.Connection opID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] Attempting connection; <resolver p:0x00007fefb404a000, '10.xxx.xx.xx:443', next:(null)>, last e: 111(Connection refused)
2025-02-21T11:18:30.654Z warning hbrsrv[633210] [Originator@6876 sub=HttpConnectionPool-000000 opID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] Failed to get pooled connection; <cs p:00007fefe80f3a10, TCP:10.xxx.xx.xx:443>, (null), duration: 0msec, N7Vmacore15SystemExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
These entries confirm that the hbrsrv service is unable to maintain communication with specific ESXi hosts, impacting both heartbeat and replication traffic.
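In a large inventory it can help to summarize which targets hbrsrv is failing to reach rather than reading the log line by line. The following is a minimal sketch in Python, not a VMware-provided tool. It assumes Python 3 is available where the log is analyzed and that the warning lines keep the "Failed to get pooled connection; <cs p:..., TCP:host:port>" layout seen in the excerpt above. The function name count_failed_targets is hypothetical, and IPv6 targets are not handled.

```python
import re
import sys
from collections import Counter

# Matches the "Failed to get pooled connection" warning from hbrsrv.log and
# captures the address and port from the "TCP:<host>:<port>" field.
# Assumption: the line layout matches the excerpt in this article.
POOL_FAILURE = re.compile(
    r"Failed to get pooled connection; <cs p:[0-9a-fA-F]+, TCP:([^:>\s]+):(\d+)>"
)


def count_failed_targets(lines):
    """Return a Counter mapping (host, port) to the number of failures seen."""
    counts = Counter()
    for line in lines:
        match = POOL_FAILURE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Each command-line argument is treated as a path to an hbrsrv log file.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for (host, port), hits in count_failed_targets(log).most_common():
                print(f"{host}:{port}  {hits} failed connection attempts")
```

Saved under a name of your choice (for example hbrsrv_failures.py, a hypothetical file name), it could be run as: python3 hbrsrv_failures.py /opt/vmware/support/logs/hbrsrv.log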
The /opt/vmware/hms/logs/hms.log file shows repeated failures when attempting to enable replication support for newly added ESXi hosts:
2025-02-21 06:20:08.626 ERROR com.vmware.hms.monitor.hostEnableHostAtHbrTaskRunner [hms-main-thread-10672] (..monitor.host.EnableHostOnHbrHelper) [operationID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] | Failed to enable esxi-host(host-1xxx) for addresses [10.xxx.xx.xx], using NICs [management.key-vim.host.VirtualNic-vmk0].
2025-02-21 06:20:08.626 ERROR com.vmware.hms.monitor.hostEnableHostAtHbrTaskRunner [hms-main-thread-10672] (..jvsl.util.Slf4jUtil) [operationID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] | Failed to enable host esxi-host(host-1xxx) on any NIC in VR server vreplication(52a87983-e639-2683-7062-acc1eb6b5e1a).
2025-02-21 06:20:08.626 ERROR com.vmware.hms.monitor.hostEnableHostAtHbrTaskRunner [hms-main-thread-10672] (..host.task.EnableHostAtHbrTaskRunner) [operationID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] | Error while enabling host host-1xxx in VR Server 10.xxx.xx.xx
These errors point to a network connectivity problem that prevents the VR server from establishing communication with the target host.
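To build a list of the hosts that failed to enable, the hms.log errors can be extracted programmatically. The sketch below is illustrative only. It assumes the "Failed to enable name(host-id) for addresses [...], using NICs [...]" layout shown in the excerpt above, and it assumes that multiple addresses or NICs inside the brackets are comma-separated, which the excerpt does not confirm. The function name failed_host_enables is hypothetical.

```python
import re
import sys

# Matches the EnableHostOnHbrHelper error from hms.log.
# Assumption: the line layout matches the excerpt in this article.
ENABLE_FAILURE = re.compile(
    r"Failed to enable (\S+?)\(([^)]+)\) for addresses \[([^\]]*)\], "
    r"using NICs \[([^\]]*)\]"
)


def _split(value):
    """Split a bracketed list on commas (assumed separator) and trim spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


def failed_host_enables(lines):
    """Return one record per distinct host/address/NIC failure, in log order."""
    seen = set()
    results = []
    for line in lines:
        match = ENABLE_FAILURE.search(line)
        if not match:
            continue
        name, moref, addresses, nics = match.groups()
        key = (moref, addresses, nics)
        if key in seen:
            continue  # the same failure is usually logged repeatedly
        seen.add(key)
        results.append({
            "name": name,
            "moref": moref,
            "addresses": _split(addresses),
            "nics": _split(nics),
        })
    return results


if __name__ == "__main__":
    # Each command-line argument is treated as a path to an hms.log file.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            for record in failed_host_enables(log):
                print(record["name"], record["moref"],
                      ",".join(record["addresses"]), ",".join(record["nics"]))
```

The addresses it reports are the ones to test from the VR server when verifying connectivity on port 443.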
To address and resolve the issue, perform the following actions:
Verify Network Connectivity
Ensure that the vSphere Replication appliance and all ESXi hosts, especially recently added ones, can communicate over port 443.
Use tools like ping, nc, or curl from the VR server to test connectivity to target ESXi hosts.
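When many hosts need to be checked, the same TCP test that nc performs can be scripted. The following is a minimal sketch, assuming Python 3 is available on the machine running the check. The function name can_connect and the 5-second timeout are illustrative choices, not values from VMware documentation. It confirms only that a TCP connection to port 443 can be opened and does not validate TLS or the ESXi service behind it.

```python
import socket
import sys


def can_connect(host, port=443, timeout=5.0):
    """Try to open a TCP connection; return (True, "connected") or (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False, str(exc)


if __name__ == "__main__":
    # Each command-line argument is an ESXi host name or IP address.
    for target in sys.argv[1:]:
        ok, detail = can_connect(target)
        status = "OK" if ok else "FAILED"
        print(f"{target}:443  {status}  {detail}")
```

A "Connection refused" result matches error 111 in the hbrsrv.log excerpt and usually means the target answered but nothing accepted the connection on that port, while a timeout more often points to a firewall silently dropping traffic or a routing problem.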
Check Firewall and Security Policies
Ensure that firewalls between the vSphere Replication appliance and ESXi hosts are not blocking connections.
Inspect ARP Tables
Refresh or flush ARP tables on both the vSphere Replication appliance and the connected switches/routers if stale entries are suspected.
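Before flushing anything, it can be useful to see what the appliance currently has cached. The sketch below parses the Linux /proc/net/arp table and flags entries that were never resolved (the "complete" flag 0x2 is not set). It assumes the appliance exposes the standard Linux /proc/net/arp file and has Python 3, and the function name parse_arp_table is hypothetical. A stale entry can also be a complete entry with an outdated MAC address, which this check cannot detect by itself, so compare the listed MAC addresses against the hosts' actual vmkernel adapters.

```python
import sys


def parse_arp_table(text):
    """Parse the contents of /proc/net/arp into a list of entry dicts."""
    entries = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        entries.append({
            "ip": ip,
            "mac": mac,
            "device": device,
            # Flag bit 0x2 marks a completed (resolved) entry.
            "incomplete": (int(flags, 16) & 0x2) == 0,
        })
    return entries


if __name__ == "__main__":
    # Pass the table path explicitly, for example /proc/net/arp.
    for path in sys.argv[1:]:
        with open(path) as table:
            for entry in parse_arp_table(table.read()):
                state = "INCOMPLETE" if entry["incomplete"] else "ok"
                print(f'{entry["ip"]:<16} {entry["mac"]:<18} {entry["device"]:<8} {state}')
```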
Remove Stale Entries
If a host was removed while the VR service was down, consider manually cleaning up stale host database entries (consult Broadcom Support before making any direct DB changes).