ESXi hosts become unresponsive in vCenter Server when the maximum connection limit of the envoy proxy is exceeded

Article ID: 404721

Products

VMware Live Recovery, VMware vSphere ESXi, VMware vCenter Server

Issue/Introduction

  • ESXi hosts randomly become unresponsive in vCenter Server.
  • This happens after VMs are configured with enhanced replication.
  • CPU utilization on the VRMS servers may reach 100% when the issue occurs.
  • /var/run/log/envoy.log on the ESXi host contains warnings like the following:

YYYY-MM-DD-HH-MM-SS In(166) envoy[2099870]: "YYYY-MM-DD-HH-MM-SS warning envoy[2100520] [Originator@6876 sub=filter] [Tags: "ConnectionId":"2015172"] remote https connections exceed max allowed: 128"

  • The maximum connection limit is reached because vSphere Replication does not close the HTTP connections that it creates for health checks during the configuration of enhanced replication. These connections stay in the ESTABLISHED state on the host:

Proto  Recv Q  Send Q  Local Address        Foreign Address      State        World ID  CC Algo  World Name
-----  ------  ------  -------------------  -------------------  -----------  --------  -------  ----------
tcp         0       0  xx.xx.xx.xx:443    xx.xx.xx.xx:54108   ESTABLISHED  35101291  newreno  envoy
tcp         0       0  xx.xx.xx.xx:443    xx.xx.xx.xx:54100   ESTABLISHED  35101291  newreno  envoy

  • /var/run/log/envoy-access.log shows the corresponding health check connections, which remain open for hours:

YYYY-MM-DD-HH-MM-SS In(166) envoy-access[2099882]: GET /hbragent/api/v1.0/appPing?broker_ip=xx.xx.xx.xx&broker_port=32032&group=PING-GID-5243c529-e210-xxxx 200 via_upstream - 0 387 - 107 106 0 xx.xx.xx.xx:34164 HTTP/1.1 TLSv1.2 xx.xx.xx.xx:443 - HTTP/1.1 - /var/run/vmware/hbragent-rest-tunnel - -
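The two log symptoms above can be checked quickly from a shell on the ESXi host. The following is a minimal sketch, not an official diagnostic: the function names are hypothetical, and the search strings are taken from the log excerpts in this article, so they may need adjusting if the message text differs in your build.

```shell
# Count envoy warnings about the remote HTTPS connection limit.
# Usage: count_limit_warnings /var/run/log/envoy.log
count_limit_warnings() {
    # grep exits non-zero when the count is 0, so "|| true" keeps the status clean
    grep -c -F 'remote https connections exceed max allowed' "$1" || true
}

# Count health check (appPing) requests recorded in the envoy access log.
# Usage: count_appping_requests /var/run/log/envoy-access.log
count_appping_requests() {
    grep -c -F '/hbragent/api/v1.0/appPing' "$1" || true
}
```

A non-zero warning count together with many appPing entries is consistent with the behavior described here, but it is not proof on its own.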

Environment

  • ESXi 8.x
  • vSphere Replication 8.8
  • vSphere Replication 9.x
  • vSphere Live Site Recovery 9.x
  • vCenter 8.x
  • vCenter 9.x

Cause

The issue occurs when the envoy proxy service on the ESXi host reaches the maximum number of remote HTTPS connections it allows (128, according to the warning in envoy.log).
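To gauge how close a host is to that limit, the established envoy connections can be counted. This is a rough sketch under stated assumptions: the function name is hypothetical, the input is assumed to have the same columns as the connection table shown earlier (the output format of `esxcli network ip connection list`), and the default limit of 128 is taken from the envoy.log warning.

```shell
# Print "count/limit" for ESTABLISHED connections owned by envoy on local port 443.
# Reads a connection table from stdin; the optional first argument overrides
# the limit (default 128, the value reported in the envoy.log warning).
envoy_connection_usage() {
    limit="${1:-128}"
    awk -v limit="$limit" '
        # $4 = local address, $6 = state, last field = world name
        $6 == "ESTABLISHED" && $NF == "envoy" && $4 ~ /:443$/ { n++ }
        END { printf "%d/%d\n", n, limit }
    '
}

# Usage on an ESXi host (assumed command):
#   esxcli network ip connection list | envoy_connection_usage
```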

Resolution

This issue is resolved in vSphere Replication 9.0.2.3. To download vSphere Replication 9.0.2.3, go to the Broadcom Support Portal.

Workaround:

  1. Open an SSH session to the VRMS server on both sites.
  2. Open the file /opt/vmware/hms/conf/hms-configuration.xml with a text editor.
  3. Set schedule-health-checks to false.
  4. Restart the HMS service on both sites:

systemctl restart hms

  5. When configuring enhanced replication, skip the health check. Clicking the "Next" button lets you proceed with the replication configuration without performing the health check.
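Before editing the configuration file in steps 2 and 3, it may help to take a backup and locate the setting. The following is a minimal sketch with a hypothetical function name. Because this article does not show the exact XML syntax of schedule-health-checks, the sketch only finds and prints the matching line and does not modify it.

```shell
# Back up the HMS configuration file and print the line(s) that hold the
# schedule-health-checks setting, prefixed with their line numbers.
# Usage: backup_and_show_health_check_setting /opt/vmware/hms/conf/hms-configuration.xml
backup_and_show_health_check_setting() {
    conf="$1"
    backup="$conf.bak.$(date +%Y%m%d%H%M%S)"
    # -p preserves mode and timestamps; stop if the copy fails
    cp -p "$conf" "$backup" || return 1
    echo "backup: $backup"
    grep -n 'schedule-health-checks' "$conf"
}
```

After changing the value in the editor, running `grep -n 'schedule-health-checks'` against the file again confirms the edit before HMS is restarted.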

Additional Information

vSphere Replication 9.0.2.3 Release Notes