ESXi hosts become unresponsive in vCenter Server when the Envoy proxy exceeds its maximum connection limit

Article ID: 404721

Updated On:

Products

  • VMware Live Recovery
  • VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • ESXi hosts randomly become unresponsive in the vCenter Server inventory
  • The issue occurs after virtual machines are configured with enhanced replication
  • CPU utilization on the vSphere Replication Management Server (VRMS) appliances may reach 100% when the issue occurs
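
To confirm the CPU symptom, a quick check you can run on each VRMS appliance (standard Linux top in batch mode; the HMS service typically appears as a Java process):

# Capture one snapshot of the busiest processes on the VRMS appliance
top -b -n 1 | head -n 20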

Environment

  • VMware vSphere ESXi
  • vSphere Replication 8.8
  • vSphere Replication 9.x

Cause

  • The issue occurs when the Envoy proxy service on the ESXi host reaches the maximum number of connections it can support:

In /var/log/envoy.log on the ESXi host, you will notice warnings similar to the following:

2025-06-26T06:59:55.859Z In(166) envoy[2099870]: "2025-06-26T06:59:46.278Z warning envoy[2100520] [Originator@6876 sub=filter] [Tags: "ConnectionId":"2015172"] remote https connections exceed max allowed: 128"
2025-06-26T06:59:55.859Z In(166) envoy[2099870]: "2025-06-26T06:59:46.355Z warning envoy[2100520] [Originator@6876 sub=filter] [Tags: "ConnectionId":"2015179"] remote https connections exceed max allowed: 128"
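
A quick way to confirm that a host is hitting the limit is to count these warnings in the current Envoy log (the search string is copied from the messages above):

# On the affected ESXi host, count connection-limit warnings in the Envoy log
grep -c "connections exceed max allowed" /var/log/envoy.log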

  • The maximum connection limit is reached because vSphere Replication does not close the HTTP connections that are created as part of the health checks during enhanced replication configuration, as seen in the ESXi connection listing below:

Proto  Recv Q  Send Q  Local Address        Foreign Address      State        World ID  CC Algo  World Name
-----  ------  ------  -------------------  -------------------  -----------  --------  -------  ----------
tcp         0       0  10.176.xx.xx:443    10.176.xx.xx:54108   ESTABLISHED  35101291  newreno  envoy
tcp         0       0  10.176.xx.xx:443    10.176.xx.xx:54100   ESTABLISHED  35101291  newreno  envoy
tcp         0       0  10.176.xx.xx:443    10.176.xx.xx:54084   ESTABLISHED  35101291  newreno  envoy
tcp         0       0  10.176.xx.xx:443    10.176.xx.xx:54080   ESTABLISHED  35101290  newreno  envoy
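
The listing above is from the ESXi connection table. A sketch, assuming shell access to the host, to count how many established HTTPS connections the envoy world is currently holding against the limit of 128 (esxcli network ip connection list is the standard command; the grep filters are illustrative):

# Count established connections on port 443 owned by the envoy world
esxcli network ip connection list | grep envoy | grep ":443 " | grep -c ESTABLISHED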

 

  • In /var/log/envoy-access.log, you will notice connections like the one below that remain open for hours:

2025-06-25T05:16:04.600Z In(166) envoy-access[2099882]: GET /hbragent/api/v1.0/appPing?broker_ip=10.191.xx.xx&broker_port=32032&group=PING-GID-5243c529-e210-xxxx 200 via_upstream - 0 387 - 107 106 0 10.176.xx.xx:34164 HTTP/1.1 TLSv1.2 10.176.xx.xx:443 - HTTP/1.1 - /var/run/vmware/hbragent-rest-tunnel - -
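
A minimal sketch, run on the ESXi host, to list the health-check requests recorded in the access log (the appPing URI comes from the entry above):

# List recent health-check requests in the Envoy access log
grep "appPing" /var/log/envoy-access.log | tail -n 20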

Resolution

Broadcom is aware of this issue and is working on a fix.

Workaround:

  1. Open an SSH session to the VRMS appliance at both sites.
  2. Open the file /opt/vmware/hms/conf/hms-configuration.xml in a text editor.
  3. Set the schedule-health-checks parameter to false (a command-line sketch follows these steps).
  4. Restart the HMS service on both sites:

systemctl restart hms

  5. While configuring enhanced replication, skip the health check. Clicking the "Next" button will allow you to proceed with the replication configuration without performing the health check.
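
A minimal command-line sketch of steps 2-4, run from the VRMS appliance shell. The exact XML element wrapping schedule-health-checks inside hms-configuration.xml is an assumption here; confirm it with the grep before editing, and back the file up first:

# Locate the parameter in the HMS configuration file
grep -n "schedule-health-checks" /opt/vmware/hms/conf/hms-configuration.xml

# Back up the file, then change the value to false
# (the <schedule-health-checks> element name is an assumption; adjust to what grep shows)
cp /opt/vmware/hms/conf/hms-configuration.xml /opt/vmware/hms/conf/hms-configuration.xml.bak
sed -i 's|<schedule-health-checks>true</schedule-health-checks>|<schedule-health-checks>false</schedule-health-checks>|' /opt/vmware/hms/conf/hms-configuration.xml

# Restart the HMS service so the change takes effect (repeat at both sites)
systemctl restart hms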