ESXi hosts become unresponsive in vCenter Server when the Envoy proxy exceeds its maximum connection limit

Article ID: 404721

Updated On:

Products

  • VMware Live Recovery
  • VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • ESXi hosts randomly become unresponsive in the vCenter Server inventory
  • The issue occurs after virtual machines are configured with enhanced replication
  • CPU utilization on the vSphere Replication Management Server (VRMS) appliances may reach 100% when the issue occurs
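
To confirm the CPU symptom, a quick check you can run on each VRMS appliance (standard Linux top in batch mode; the HMS service typically appears as a Java process):

# Capture one snapshot of the busiest processes on the VRMS appliance
top -b -n 1 | head -n 20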

Environment

  • VMware vSphere ESXi
  • vSphere Replication 8.8
  • vSphere Replication 9.x

Cause

  • The issue occurs when the Envoy proxy service on the ESXi host reaches the maximum number of connections it can support:

In /var/log/envoy.log on the ESXi host, you will notice warnings similar to the following:

2025-06-26T06:59:55.859Z In(166) envoy[2099870]: "2025-06-26T06:59:46.278Z warning envoy[2100520] [Originator@6876 sub=filter] [Tags: "ConnectionId":"2015172"] remote https connections exceed max allowed: 128"
2025-06-26T06:59:55.859Z In(166) envoy[2099870]: "2025-06-26T06:59:46.355Z warning envoy[2100520] [Originator@6876 sub=filter] [Tags: "ConnectionId":"2015179"] remote https connections exceed max allowed: 128"
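
A quick way to confirm that a host is hitting the limit is to count these warnings in the current Envoy log (the search string is copied from the messages above):

# On the affected ESXi host, count connection-limit warnings in the Envoy log
grep -c "connections exceed max allowed" /var/log/envoy.log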

  • The maximum connection limit is reached because vSphere Replication does not close the HTTP connections that are created as part of the health checks during enhanced replication configuration, as seen in the ESXi connection listing below:

Proto  Recv Q  Send Q  Local Address        Foreign Address      State        World ID  CC Algo  World Name
-----  ------  ------  -------------------  -------------------  -----------  --------  -------  ----------
tcp         0       0  10.176.xx.xx:443    10.176.xx.xx:54108   ESTABLISHED  35101291  newreno  envoy
tcp         0       0  10.176.xx.xx:443    10.176.xx.xx:54100   ESTABLISHED  35101291  newreno  envoy
tcp         0       0  10.176.xx.xx:443    10.176.xx.xx:54084   ESTABLISHED  35101291  newreno  envoy
tcp         0       0  10.176.xx.xx:443    10.176.xx.xx:54080   ESTABLISHED  35101290  newreno  envoy
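
The listing above is from the ESXi connection table. A sketch, assuming shell access to the host, to count how many established HTTPS connections the envoy world is currently holding against the limit of 128 (esxcli network ip connection list is the standard command; the grep filters are illustrative):

# Count established connections on port 443 owned by the envoy world
esxcli network ip connection list | grep envoy | grep ":443 " | grep -c ESTABLISHED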

 

  • In /var/log/envoy-access.log, you will notice connections like the one below that remain open for hours:

2025-06-25T05:16:04.600Z In(166) envoy-access[2099882]: GET /hbragent/api/v1.0/appPing?broker_ip=10.191.xx.xx&broker_port=32032&group=PING-GID-5243c529-e210-xxxx 200 via_upstream - 0 387 - 107 106 0 10.176.xx.xx:34164 HTTP/1.1 TLSv1.2 10.176.xx.xx:443 - HTTP/1.1 - /var/run/vmware/hbragent-rest-tunnel - -
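
A minimal sketch, run on the ESXi host, to list the health-check requests recorded in the access log (the appPing URI comes from the entry above):

# List recent health-check requests in the Envoy access log
grep "appPing" /var/log/envoy-access.log | tail -n 20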

Resolution

Broadcom is aware of this issue and is working on a fix.

Workaround:

  1. Open an SSH session to the VRMS appliance at both sites.
  2. Open the file /opt/vmware/hms/conf/hms-configuration.xml in a text editor.
  3. Set the schedule-health-checks parameter to false (a command-line sketch follows these steps).
  4. Restart the HMS service on both sites:

systemctl restart hms

  5. While configuring enhanced replication, skip the health check. Clicking the "Next" button will allow you to proceed with the replication configuration without performing the health check.
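
A minimal command-line sketch of steps 2-4, run from the VRMS appliance shell. The exact XML element wrapping schedule-health-checks inside hms-configuration.xml is an assumption here; confirm it with the grep before editing, and back the file up first:

# Locate the parameter in the HMS configuration file
grep -n "schedule-health-checks" /opt/vmware/hms/conf/hms-configuration.xml

# Back up the file, then change the value to false
# (the <schedule-health-checks> element name is an assumption; adjust to what grep shows)
cp /opt/vmware/hms/conf/hms-configuration.xml /opt/vmware/hms/conf/hms-configuration.xml.bak
sed -i 's|<schedule-health-checks>true</schedule-health-checks>|<schedule-health-checks>false</schedule-health-checks>|' /opt/vmware/hms/conf/hms-configuration.xml

# Restart the HMS service so the change takes effect (repeat at both sites)
systemctl restart hms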