VR Synchronization Failures in SRM Recovery Plan: Connectivity Issues Between ESXi Hosts and vSphere Replication Server

Article ID: 389169



Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  • Replication status shows "RPO violation" for affected VMs.
  • CPU and memory usage on the vSphere Replication appliance increases noticeably.

Environment

VMware vSphere Replication 8.x
VMware vSphere Replication 9.x
VMware Site Recovery Manager 8.x
VMware Site Recovery Manager 9.x

Cause

  • During startup, the vSphere Replication Server service (hbrsrv) attempts to establish connections with all ESXi hosts in the vCenter Server inventory. In large-scale environments, if even a single host is unreachable, hbrsrv may perform prolonged connection retries, significantly delaying service startup. For detailed information on this behavior, refer to "Deploy the vSphere Replication Virtual Appliance" in VMware’s deployment documentation.
  • Firewalls or security policies blocking port 443 between the VR appliance and ESXi hosts can lead to failed connections.
  • If a host is removed from inventory while hbrsrv is offline, its stale entry in the hbrsrv database may cause connection failures during the next startup.
  • When new hosts are introduced, hbrsrv probes them. If network communication is not properly configured, these probes fail and can crash or hang the service.
  • Stale or incorrectly updated ARP tables can further complicate connectivity, causing unresolved IP-to-MAC mapping and dropped packets.
  • The /opt/vmware/support/logs/hbrsrv.log file shows multiple failed attempts to connect to the newly added ESXi hosts over port 443:

2025-02-21T11:18:30.654Z verbose hbrsrv[633210] [Originator@6876 sub=IO.Connection opID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] Attempting connection; <resolver p:0x00007fefb404a000, '10.xxx.xx.xx:443', next:(null)>, last e: 111(Connection refused)
2025-02-21T11:18:30.654Z warning hbrsrv[633210] [Originator@6876 sub=HttpConnectionPool-000000 opID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] Failed to get pooled connection; <cs p:00007fefe80f3a10, TCP:10.xxx.xx.xx:443>, (null), duration: 0msec, N7Vmacore15SystemExceptionE
(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)

These entries confirm that the hbrsrv service is unable to maintain communication with specific ESXi hosts, impacting both heartbeat and replication traffic.
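
To see which endpoints are affected without reading the log line by line, the "Failed to get pooled connection" warnings can be tallied per endpoint. The following is a minimal Python sketch, not a VMware-provided tool. It assumes Python 3 is available where the log is analyzed and that the warning lines keep the format shown in the excerpt above. The function names count_failed_endpoints and summarize_log are illustrative.

```python
import re
from collections import Counter

# Matches the endpoint in hbrsrv warnings such as:
#   Failed to get pooled connection; <cs p:00007fefe80f3a10, TCP:10.0.0.5:443>, ...
# The pattern is an assumption based on the excerpt in this article.
POOLED_FAILURE = re.compile(
    r"Failed to get pooled connection; <cs p:[0-9a-fA-F]+, TCP:([^>\s,]+)>"
)


def count_failed_endpoints(lines):
    """Count pooled-connection failures per 'address:port' endpoint."""
    counts = Counter()
    for line in lines:
        match = POOLED_FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize_log(path="/opt/vmware/support/logs/hbrsrv.log"):
    """Return (endpoint, failure count) pairs, most frequent first."""
    with open(path, errors="replace") as handle:
        return count_failed_endpoints(handle).most_common()
```

Calling summarize_log() against a copy of the log returns the endpoints ordered by failure count, which helps narrow down the hosts to test first.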

  • The /opt/vmware/hms/logs/hms.log file shows repeated failures when attempting to enable replication support for newly added ESXi hosts:

2025-02-21 06:20:08.626 ERROR com.vmware.hms.monitor.hostEnableHostAtHbrTaskRunner [hms-main-thread-10672] (..monitor.host.EnableHostOnHbrHelper) [operationID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] | Failed to enable esxi-host(host-1xxx) for addresses [10.xxx.xx.xx], using NICs [management.key-vim.host.VirtualNic-vmk0].
2025-02-21 06:20:08.626 ERROR com.vmware.hms.monitor.hostEnableHostAtHbrTaskRunner [hms-main-thread-10672] (..jvsl.util.Slf4jUtil) [operationID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] | Failed to enable host esxi-host(host-1xxx) on any NIC in VR server vreplication(52a87983-e639-2683-7062-acc1eb6b5e1a).
2025-02-21 06:20:08.626 ERROR com.vmware.hms.monitor.hostEnableHostAtHbrTaskRunner [hms-main-thread-10672] (..host.task.EnableHostAtHbrTaskRunner) [operationID=ae2e129d-b8bc-48dc-82f8-fd931a4f2c8f-HMSINT-12775318] | Error while enabling host host-1xxx in VR Server 10.xxx.xx.xx

These errors point to a network connectivity problem that prevents the VR server from establishing communication with the target host.
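
When many hosts were added at once, it can help to extract the distinct hosts named in the hms.log errors above. The following Python sketch is illustrative only. It assumes the "Failed to enable host ... on any NIC in VR server ..." message keeps the format shown in the excerpt, and the function name failed_hosts is hypothetical.

```python
import re

# Matches hms.log errors such as:
#   Failed to enable host esxi-host(host-1xxx) on any NIC in VR server vreplication(52a8...)
# The pattern is an assumption based on the excerpt in this article.
ENABLE_FAILURE = re.compile(
    r"Failed to enable host (?P<name>\S+?)\((?P<moid>host-[^)]+)\) "
    r"on any NIC in VR server (?P<vr>[^(\s]+)"
)


def failed_hosts(lines):
    """Return sorted, de-duplicated (host name, host ID, VR server) tuples."""
    found = set()
    for line in lines:
        match = ENABLE_FAILURE.search(line)
        if match:
            found.add((match.group("name"), match.group("moid"), match.group("vr")))
    return sorted(found)
```

The function accepts any iterable of lines, so an open file handle for a copy of hms.log can be passed directly. The resulting host list is a starting point for the connectivity checks in the Resolution section.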

Resolution

To resolve the issue, perform the following actions:

  • Verify Network Connectivity

    • Ensure that all ESXi hosts, especially recently added ones, can communicate with the vSphere Replication appliance over port 443.

    • Use tools like ping, nc, or curl from the VR server to test connectivity to target ESXi hosts.

  • Check Firewall and Security Policies

    • Ensure that firewalls between the vSphere Replication appliance and ESXi hosts are not blocking connections.

  • Inspect ARP Tables

    • Refresh or flush ARP tables on both the vSphere Replication appliance and the connected switches/routers if stale entries are suspected.

  • Remove Stale Entries

    • If a host was removed while the VR service was down, consider manually cleaning up stale host database entries (consult Broadcom Support before making any direct DB changes).
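
The port 443 check described under Verify Network Connectivity can be scripted when many hosts need testing. Note that ping only confirms ICMP reachability, so a TCP connection attempt is the more direct test. The following is a minimal Python sketch under stated assumptions: Python 3 is available on the machine where it runs (ideally the VR appliance itself, so the test follows the same network path), and the function names probe_tcp and probe_hosts are illustrative, not part of any VMware tooling.

```python
import socket


def probe_tcp(host, port=443, timeout=5.0):
    """Attempt a TCP connection and return (reachable, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except ConnectionRefusedError:
        # Same condition as "111(Connection refused)" in hbrsrv.log.
        return False, "connection refused"
    except socket.timeout:
        # Often indicates a firewall silently dropping packets.
        return False, "timed out"
    except OSError as exc:
        # Covers name resolution failures, unreachable networks, and similar.
        return False, f"error: {exc}"


def probe_hosts(hosts, port=443, timeout=5.0):
    """Probe each host once and return {host: (reachable, detail)}."""
    return {host: probe_tcp(host, port, timeout) for host in hosts}
```

For example, probe_hosts(["esxi-01.example.com", "esxi-02.example.com"]) (hypothetical host names) returns one result per host. A "connection refused" result suggests the host is reachable but nothing is accepting connections on the port or a firewall is actively rejecting them, while "timed out" more often points to packets being dropped along the path.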