ESXi Hosts Show as "Not Responding" Due to Envoy Session Limits Exceeded by Replication Services
Article ID: 383231
Products
VMware vCenter Server
Issue/Introduction
Scenario 1:
In vCenter environments that use replication, ESXi hosts intermittently appear as "Not Responding" in vCenter Server while VMs remain operational. The vCenter web interface may become unresponsive or display HTTP 401 errors. Users accessing VMs through vCenter lose connectivity during these periods.
Scenario 2:
ESXi hosts repeatedly alternate between the "Not Responding" and "Responding" states in vCenter Server. The hosts remain reachable via SSH. Restarting the management services on the affected hosts stops the issue for a short while, after which it resumes.
Impact/Symptoms:
ESXi hosts show as "Not Responding" in vCenter Server.
Host management interface becomes unreachable, although the host still responds to ping.
VMs continue running without interruption.
Issue may affect multiple hosts in sequence.
Condition resolves temporarily after host reboot.
Environment
VMware ESXi 7.x and newer
VMware vCenter Server 7.x and newer
VMware ESXi 8.x
VMware vCenter Server 8.x
SDDC Manager 5.x and newer
Environment using replication services (such as Veeam Replication, vSphere Replication or Nutanix CVM)
Cause
Scenario 1:
Replication services can open more HTTPS sessions than they close, eventually exceeding what the ESXi host's envoy service can handle. The envoy service has a limit of 128 concurrent HTTPS sessions. When this limit is exceeded, connections between vCenter and the host fail.
Evidence in the host envoy logs:
warning envoy[#######] [Originator@#### sub=filter] [Tags: "ConnectionId":"########"] remote https connections exceed max allowed: 128
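To confirm how often a host has hit the limit, the warning text can be counted in the envoy log. The following is a minimal sketch, not an official procedure: the function name count_envoy_limit_hits is hypothetical, and the default log path /var/run/log/envoy.log is an assumption that should be checked against the actual host.

```shell
# Hypothetical helper: count how many log lines contain the session-limit warning.
# The default path is an assumption; pass the real envoy.log location as the first argument.
count_envoy_limit_hits() {
    log_file="${1:-/var/run/log/envoy.log}"
    # -c prints the number of matching lines; -F matches the text as a fixed string.
    grep -cF 'remote https connections exceed max allowed' "$log_file"
}
```

Example usage: count_envoy_limit_hits /var/run/log/envoy.log
A non-zero count indicates that the host has reached the 128-session limit at least once within the period covered by that log file.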
Scenario 2:
SDDC Manager runs multiple operations, such as password updates/rotation and LCM tasks, that use a shared method to connect to ESXi via vCenter. This floods the ESXi host with too many connection requests.
Resolution
Scenario 1:
Immediate Work-around:
Identify the replication service creating excessive connections by checking envoy.log for the source IP
Temporarily disable the identified replication service
Restart the envoy service on the affected host:
/etc/init.d/envoy restart
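As a complement to checking envoy.log, the source of excessive connections can also be estimated from the host's current connection table before envoy is restarted. The sketch below is an illustration under stated assumptions: the function name top_https_clients is hypothetical, it assumes the management HTTPS listener is on local port 443, and it assumes the column order Proto, Recv Q, Send Q, Local Address, Foreign Address, State in the output of esxcli network ip connection list. Verify the column layout on the host before relying on the result.

```shell
# Hypothetical helper: read connection-list text on stdin and print
# "<count> <remote IP>" for established TCP connections to local port 443,
# busiest remote address first.
# Assumed column order: Proto, Recv Q, Send Q, Local Address, Foreign Address, State.
top_https_clients() {
    awk '$1 == "tcp" && $4 ~ /:443$/ && $6 == "ESTABLISHED" {
             ip = $5
             sub(/:[0-9]+$/, "", ip)   # strip the remote port, keep the address
             count[ip]++
         }
         END { for (ip in count) print count[ip], ip }' | sort -rn
}
```

Example usage: esxcli network ip connection list | top_https_clients
A remote address holding a large share of the 128 available sessions is a likely candidate for the replication service to disable temporarily.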
Long-term Solution:
Update replication software to latest version
Configure replication jobs to limit concurrent sessions
If issues persist:
Implement firewall rules to limit concurrent connections from replication servers
Contact replication software vendor for additional guidance
Scenario 2:
Resolution:
Engineering is aware of the issue and a fix is planned for the future. Please subscribe to this KB to be kept updated.
Workaround:
Restart the envoy service on the affected host:
/etc/init.d/envoy restart
Additional Information
This issue can occur with any add-on solution that creates multiple concurrent HTTPS connections to ESXi hosts.
The envoy service manages HTTPS connections between vCenter Server and ESXi hosts.
VM operations remain unaffected because they do not rely on these management connections.