vSphere Replication HBR service takes a long time to start



Article ID: 312582


Products

VMware Live Recovery

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • vSphere Replication HBR service takes a long time to start.

  • In /var/log/vmware/hbrsrv.log, you see entries similar to:

Heartbeat handler detected dead connection for host: host-6627
HbrError stack:
[0] Exception Vmacore::InvalidStateException: No connection (host=host-6627)
2017-04-27T20:57:45.928Z [7FDAC6F3C700 warning 'Default'] Failed to connect socket; <io_obj p:0x0000000005403810, h:573, <TCP '0.0.0.0:0'>, <TCP 'X.X.X.X:80'>>, e: system:111(Connection refused)

2024-08-27T05:00:45.486Z info hbrsrv[25112] [Originator@6876 sub=AgentConnection] Agent host-1058/hostd: restarting with address x.x.x.x
2024-08-27T05:29:42.316Z info hbrsrv[13508] [Originator@6876 sub=AgentConnection opID=hsl-801080ae] Agent host-1074/hostd: restarting with address x.x.x.x

  • Configuring a replication fails with the error:

"This operation is not allowed in the current state"

Example:

Source hms.log

2026-02-17 07:00:36.415 ERROR vmware.hms.job [hms-main-thread-1393] (..job.impl.PrimaryVc2VcConfigureReplicationWorkflow) {operationID=6d04b060-2b39-4207-b678-############-HMS-106432} [task=HTID-41dde993-2b57-49f1-9828-460525f038d0] | Failed remote replication configuration com.vmware.vim.binding.vim.fault.InvalidState: The operation is not allowed in the current state.

Destination hms.log

2026-02-17 07:00:35.831 ERROR vmware.hms.TaskRunnable [hms-main-thread-3556] (..jvsl.util.Slf4jUtil) {operationID=6d04b060-2b39-4207-b678-############-HMS-106432} [task=HTID-d8b9f05f-ff40-46ac-ac66-f47667ec18b0] | runTask-failed name: "Configure Replication Secondary"; class: com.vmware.hms.job.impl.SecondaryVc2VcConfigureReplicationWorkflow; groupMoId: GID-8b32f7df-a825-49ab-b407-962f64bbea76; hbrTag: null; err: Failed to retrieve HbrBrokerServer ########-####-####-####-############; time: 10162 ms
java.lang.IllegalStateException: Failed to retrieve HbrBrokerServer ########-####-####-####-############

  • Running the Enhanced Replication Mappings test fails with the following error:

"The vSphere replication management server cannot configure replication on target vsphere replication server (id:'host-5602',name :##### and target broker "NA"

  • Restarting the hbrsrv service on the vSphere Replication appliance shows that the service is hung: hbrsrv.log on the appliance fills with continuous "restarting with address" events.

    Example:

hbrsrv.log 

2026-02-18T07:19:21.888Z info hbrsrv[350479] [Originator@6876 sub=AgentConnection opID=hs-init-1b88f18d] Agent host-5##3/hostd: restarting with address 10.#.#.3
2026-02-18T07:21:21.494Z info hbrsrv[350490] [Originator@6876 sub=AgentConnection opID=hs-init-1b88f18d] Agent host-7##7/hostd: restarting with address 10.#.#.2

Refer to: Start, Stop, and Restart vSphere Replication Appliance Services
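
To see which hosts hbrsrv keeps retrying, the "restarting with address" entries shown above can be tallied per host. The following is a minimal Python sketch, not a supported tool: it assumes the log lines follow the format in the examples above, and the function name and regular expression are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... [Originator@6876 sub=AgentConnection] Agent host-1058/hostd: restarting with address x.x.x.x
# The pattern is an assumption based on the sample entries in this article.
RESTART_RE = re.compile(
    r"Agent (?P<host>\S+)/hostd: restarting with address (?P<addr>\S+)"
)

def count_restarts(lines):
    """Return a Counter mapping host ID -> number of 'restarting with address' events."""
    counts = Counter()
    for line in lines:
        match = RESTART_RE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts
```

For example, passing an open handle on /var/log/vmware/hbrsrv.log to count_restarts and printing the result of most_common() lists the hosts with the most retry events first.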

Environment

VMware vSphere Replication 8.x

VMware vSphere Replication 9.x

Cause

  • During startup, the vSphere Replication Server service (hbrsrv) tries to connect to every host in the vCenter inventory. In a large environment where one or more hosts cannot be reached, looping through all of them takes a long time.

  • A network firewall between the vSphere Replication server and the ESXi hosts may add time to HBR service startup.

  • If an ESXi host is removed from the vCenter inventory while hbrsrv is offline, startup is also affected, because hbrsrv still has an entry for that host in its database.

  • There may be other issues communicating with the ESXi hosts, for example misconfigured ports on the dvSwitch of the target vCenter Server, duplicate IP addresses, or certificate-related issues.
  • The hbrsrv service may also fail to connect to the ESXi hosts and be unable to log in to the source ESXi host because of missing certificate information.

    hbrsrv.log:

    2026-02-18T07:59:55.921Z info hbrsrv[433810] [Originator@6876 sub=vmomi.soapStub[23671] opID=hs-init-1b88f18d] SOAP request returned HTTP failure; <SSL(<io_obj t:N7Vmacore6System19TCPSocketObjectAsioE, h:16, <TCP '10.#.#.1 : 48220'>, <TCP '10.#.#.2 : 443'>>), /sdk>, method: loginBySSLThumbprint; code: 500(Internal Server Error); fault: (vim.fault.NoClientCertificate) {
    -->    faultCause = (vmodl.MethodFault) null,
    -->    faultMessage = <unset>
    -->    msg = "Received SOAP response fault from [<SSL(<io_obj t:N7Vmacore6System19TCPSocketObjectAsioE, h:16, <TCP '10.#.#.1 : 48220'>, <TCP '10.#.#.2 : 443'>>), /sdk>]: loginBySSLThumbprint
    --> Client connected without supplying a certificate."
    --> }
    2026-02-18T07:59:55.921Z info hbrsrv[433810] [Originator@6876 sub=AgentConnection opID=hs-init-1b88f18d] Agent host-7##7/hostd: failed to log in. Connection type: /sdk
    2026-02-18T07:59:55.921Z error hbrsrv[433810] [Originator@6876 sub=AgentConnection opID=hs-init-1b88f18d] Connection failed to agent host-7##7/hostd (10.#.#.2): Can't login to the host
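
To identify which hosts hbrsrv cannot connect to and why, the "Connection failed to agent" entries can be extracted from hbrsrv.log. The sketch below is illustrative only: it assumes the line format shown in the excerpt above, and the function name and regular expression are not part of any VMware tooling.

```python
import re

# Matches lines such as:
#   ... sub=AgentConnection ...] Connection failed to agent host-7##7/hostd (10.#.#.2): Can't login to the host
# The pattern is an assumption based on the sample entry in this article.
FAILED_RE = re.compile(
    r"Connection failed to agent (?P<host>\S+)/hostd "
    r"\((?P<addr>[^)]*)\): (?P<reason>.*)"
)

def find_failed_agents(lines):
    """Return a dict mapping host ID -> (address, most recent failure reason)."""
    failed = {}
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            # Later entries overwrite earlier ones, so the last reason wins.
            failed[match.group("host")] = (
                match.group("addr"),
                match.group("reason").strip(),
            )
    return failed
```

Hosts reported with "Can't login to the host" point toward the certificate-related cause, while hosts that only show connection errors point toward a network or firewall cause.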

Resolution

  1. Check the network firewall: port 80 must be open between the vSphere Replication server and the ESXi hosts (intra-site). For more information, see Port numbers that must be open for vSphere Replication.

  2. If the vSphere Replication server is attempting to register and connect to a host that no longer exists in the vCenter inventory, an edit of hbrsrv.db may be required. See "vSphere Replication cannot establish a TCP connection to server at 127.0.0.1:8123 - Connection refused" for the steps to remove the ESXi hosts from the database.

  3. For vSphere Replication 8.7 and later, if ESXi hosts are unresponsive or there is a network or configuration issue, you can assign the com.vmware.vr.disallowed tag to the ESXi host or cluster to work around the condition until the issue is resolved.
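
The port check in step 1 can be scripted when there are many hosts. This is a minimal sketch that assumes a Python 3 interpreter is available on a machine in the same network segment as the vSphere Replication server; the function names and the 3-second timeout are illustrative choices, not values taken from VMware documentation.

```python
import socket

def check_port(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

def unreachable_hosts(hosts, port=80, timeout=3.0):
    """Return the hosts from the given list that do not accept a connection on the port."""
    return [h for h in hosts if not check_port(h, port, timeout)]
```

Calling unreachable_hosts with the list of ESXi host addresses returns the hosts that hbrsrv is likely to stall on. A successful TCP connection only shows that the port is reachable; it does not rule out the certificate or login failures described under Cause.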