ESXi host(s) in the target site intermittently become unresponsive after enabling enhanced replication for VMs

Article ID: 407829


Products

VMware Live Recovery

Issue/Introduction

Symptoms:

  • After configuring enhanced replication, ESXi hosts in the destination site intermittently go offline or become unresponsive in the vCenter Server
  • No mapping tests were triggered while configuring enhanced replication
  • Scheduled health checks were disabled in the HMS configuration file

Environment

  • VMware Live Recovery 9.x
  • VMware vSphere Replication 9.x
  • VMware vSphere Replication 8.x

Cause

  • The issue is caused by leaked pings from earlier failed registration attempts on the ESXi hosts
  • The leaked pings exhaust all 128 connection threads available to the Envoy proxy service on the affected ESXi host
  • HMS keeps retrying to register hbrsrvuw on the ESXi host(s):
2025-07-25 20:41:57.778 INFO  com.vmware.hms.hbrsrvuw.HbrsrvuwRegistrarService [hms-main-scheduled-thread-19] (..hms.hbrsrvuw.HbrsrvuwRegistrarService$HostTask) [operationID=61c4ba58-895e-48c6-a026-a39e6edb9773-HMSINT-37786] | Retrying registration of hbrsrvuw from host-1234 (attempt 4899)
2025-07-25 20:43:00.084 INFO  com.vmware.hms.hbrsrvuw.HbrsrvuwRegistrarService [hms-main-scheduled-thread-13] (..hms.hbrsrvuw.HbrsrvuwRegistrarService$HostTask) [operationID=61c4ba58-895e-48c6-a026-a39e6edb9773-HMSINT-37786] | Retrying registration of hbrsrvuw from host-1234 (attempt 4900)
 
  • These attempts, however, eventually fail:
2025-07-25 20:42:00.083 ERROR com.vmware.hms.remote.SiteManager [hms-main-thread-18640] (..hms.remote.SiteManagerImpl) [operationID=61c4ba58-895e-48c6-a026-a39e6edb9773-HMSINT-37786, task=HTID-8d499fba-debb-4e35-99ac-81e0ae75cfd2] | Unable to register VR Server vm '10.xx.xx.xx' uri 'https://10.xx.xx.xx:443/hbr' thumbprint '##.##.##.##.##.##.##' 
com.vmware.vim.vmomi.client.exception.SslException: javax.net.ssl.SSLException: SSL handshake from 0.0.0.0/0.0.0.0:60382 to /10.xx.xx.xx:443 failed in 1 ms
        at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setError(ResponseImpl.java:265) ~[vlsi-client-9.0.2.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.setResponseError(HttpExchangeBase.java:362) ~[vlsi-client-9.0.2.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpExchange.invokeWithinScope(HttpExchange.java:59) ~[vlsi-client-9.0.2.jar:?]
        at com.vmware.vim.vmomi.core.tracing.NoopTracer$NoopSpan.runWithinSpanContext(NoopTracer.java:120) ~[vlsi-core-9.0.2.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.TracingScopedRunnable.run(TracingScopedRunnable.java:17) ~[vlsi-client-9.0.2.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpExchangeBase.run(HttpExchangeBase.java:52) ~[vlsi-client-9.0.2.jar:?]
        at com.vmware.vim.vmomi.client.http.impl.HttpProtocolBindingBase.executeRunnable(HttpProtocolBindingBase.java:229) ~[vlsi-client-9.0.2.jar:?]
 
  • HMS nevertheless continues to ping the host(s) under operation ID 61c4ba58-895e-48c6-a026-a39e6edb9773-HMSINT-37786:
2025-07-25 20:59:59.240 TRACE com.vmware.hms.net.hbr.ping.svr.39313738-3034-4753-4831-333254444643 [hms-ping-scheduled-thread-8] (..net.impl.VmomiPingConnectionHandler) [operationID=61c4ba58-895e-48c6-a026-a39e6edb9773-HMSINT-37786, operationID=aa2d0367-1569-4576-901e-d7ba68a530bf-HMS-PING] | Session: N/A on server '10.xx.xx.xx:443/hbr' pinged successfully
2025-07-25 20:59:59.311 TRACE com.vmware.hms.net.hbr.ping.svr.39313738-3034-4753-4831-333254444643 [hms-ping-scheduled-thread-7] (..net.impl.VmomiPingConnectionHandler) [operationID=61c4ba58-895e-48c6-a026-a39e6edb9773-HMSINT-37786, operationID=8901d13c-2063-4f59-9fac-ce0365459d8a-HMS-PING] | Session: N/A on server '10.xx.xx.xx:443/hbr' pinged successfully
 
  • Meanwhile, /var/log/envoy.log on the affected ESXi host will intermittently report the following:
2025-07-26T08:51:41.082Z In(166) envoy[26452313]: "2025-07-26T08:51:40.723Z warning envoy[26452328] [Originator@6876 sub=filter] [Tags: "ConnectionId":"2961328"] remote https connections exceed max allowed: 128"
2025-07-26T08:51:41.082Z In(166) envoy[26452313]: "2025-07-26T08:51:40.724Z warning envoy[26452328] [Originator@6876 sub=filter] [Tags: "ConnectionId":"2961329"] remote https connections exceed max allowed: 128"
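To confirm that a host is hitting the Envoy connection limit, the warning above can be counted in the log. A minimal sketch, assuming the /var/log/envoy.log path shown above (the helper function name is illustrative, not a VMware tool):

```shell
# Count Envoy connection-limit warnings in a log file.
# The log path and warning text are taken from the symptoms above;
# the function name itself is only illustrative.
count_envoy_limit_hits() {
    log="${1:-/var/log/envoy.log}"
    grep -c 'remote https connections exceed max allowed: 128' "$log"
}
```

A count that keeps growing for the same host is consistent with the leaked-ping thread exhaustion described above.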

Resolution

 

Broadcom is aware of this issue and is working on a fix.

 

Workaround:

  • As a workaround, restart the HMS service on the target site:

systemctl restart hms

  • Restarting HMS removes all in-memory references to the operations the service has run or is running
  • This therefore stops the continuous pings to the affected ESXi hosts
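The restart can be followed by a basic health check; a sketch of the operational steps, assuming the systemd unit is named hms as in the command above:

```shell
# Restart the HMS service on the target site and confirm it is active again.
# The unit name "hms" is taken from the workaround command in this article.
systemctl restart hms
systemctl is-active --quiet hms && echo "hms service is running"
```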

 

Please Note: If the issue occurs again, collect the vSphere Replication logs from both sites and the affected ESXi host logs immediately, before contacting Broadcom Support.