vSphere Replication HBR service takes a long time to start



Article ID: 312582


Products

VMware Live Recovery
VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • vSphere Replication HBR service takes a long time to start.
  • In /var/log/vmware/hbrsrv.log, you see entries similar to:
Heartbeat handler detected dead connection for host: host-6627
HbrError stack:
[0] Exception Vmacore::InvalidStateException: No connection (host=host-6627)
2017-04-27T20:57:45.928Z [7FDAC6F3C700 warning 'Default'] Failed to connect socket; <io_obj p:0x0000000005403810, h:573, <TCP '0.0.0.0:0'>, <TCP 'X.X.X.X:80'>>, e: system:111(Connection refused)

2024-08-27T05:00:45.486Z info hbrsrv[25112] [Originator@6876 sub=AgentConnection] Agent host-1058/hostd: restarting with address x.x.x.x
2024-08-27T05:29:42.316Z info hbrsrv[13508] [Originator@6876 sub=AgentConnection opID=hsl-801080ae] Agent host-1074/hostd: restarting with address x.x.x.x
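
To see which hosts are involved, entries like the ones above can be tallied per host ID. The following is a minimal Python sketch, not a VMware tool. The regular expressions are derived only from the sample entries shown in this article and may not match other message variants, and the function names count_problem_hosts and report are illustrative.

```python
import re
from collections import Counter

# Patterns derived from the sample log entries in this article (assumption:
# other hbrsrv versions may word these messages differently).
DEAD_CONNECTION = re.compile(r"detected dead connection for host: (host-\d+)")
AGENT_RESTART = re.compile(r"Agent (host-\d+)/hostd: restarting with address")


def count_problem_hosts(lines):
    """Count how often each host ID appears in connection-related entries."""
    counts = Counter()
    for line in lines:
        for pattern in (DEAD_CONNECTION, AGENT_RESTART):
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


def report(path):
    """Print host IDs from the given log file, most frequent first."""
    with open(path, errors="replace") as log:
        for host, hits in count_problem_hosts(log).most_common():
            print(f"{host}\t{hits}")
```

For example, report("/var/log/vmware/hbrsrv.log") prints one host ID and its count per line, assuming a Python 3 interpreter is available where the log is read. Hosts that appear repeatedly are reasonable first candidates for the checks in the Resolution section.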

Environment

VMware vSphere Replication 8.x
VMware vSphere Replication 9.x

Cause

  • During startup, the vSphere Replication Server service ("hbrsrv") tries to connect to every host in the vCenter inventory. In a large environment, if even a single host cannot connect to "hbrsrv", it takes a long time to loop through all of them.
  • A network firewall between the vSphere Replication server and the ESXi hosts may add unnecessary time to HBR service startup.
  • If an ESXi host is removed from the vCenter inventory while "hbrsrv" is offline, startup is also delayed, because "hbrsrv" still has an entry for that host in its database.
  • Other issues can affect communication with ESXi hosts, for example misconfigured ports on the dvSwitch of the target vCenter Server, duplicate IP addresses, or certificate-related issues.

Resolution

To resolve this issue:

  1. Check the network firewall. Port 80 must be open between the vSphere Replication server and the ESXi hosts (intra-site). For more information, see Port numbers that must be open for vSphere Replication.
  2. If the vSphere Replication server is attempting to register and connect to a host that no longer exists in the vCenter inventory, you may need to edit hbrsrv.db. For the steps to remove the ESXi host from the database, see vSphere Replication cannot establish a TCP connection to server at 127.0.0.1:8123 - Connection refused.
  3. For vSphere Replication 8.7 and later, if ESXi hosts are unresponsive or there is a network or configuration issue, you can assign the com.vmware.vr.disallowed tag to the ESXi host or cluster to work around the condition until the issue is resolved. For more information on how to assign a tag, see Assign or Remove a Tag.
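
The port 80 requirement in step 1 can be spot-checked with a plain TCP connection attempt. The sketch below is an illustration under assumptions, not part of vSphere Replication. The function names and the host names in the usage note are hypothetical. The result only reflects reachability from the machine where the script runs, so it would need to run on the vSphere Replication server (assuming Python 3 is available there) or on a machine that shares the same network path.

```python
import socket


def can_connect(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_hosts(hosts, port=80, timeout=3.0):
    """Map each host name or address to its TCP reachability on the port."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, check_hosts(["esxi-01.example.com", "esxi-02.example.com"]) returns a dictionary with True or False per host. A False result does not identify the cause: it may indicate a firewall rule, a closed port, or a name that does not resolve, so it is a starting point for investigation, not a diagnosis.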