ESXi Hosts Show as "Not Responding" Due to Envoy Session Limits Exceeded by Replication Services

Article ID: 383231

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptom:

  • In vCenter Server environments utilizing replication, ESXi hosts intermittently enter a "Not Responding" state, resulting in a loss of VM console connectivity. Simultaneously, the vCenter Server web interface may become unresponsive or return HTTP 401 Unauthorized errors, even though the Virtual Machines remain operational.

  • ESXi hosts exhibit a "flapping" behavior within the vCenter Server inventory, continuously cycling between "Not Responding" and "Responding" states. While the hosts remain accessible via SSH, restarting the management agents provides only temporary mitigation before the issue recurs.

  • The ESXi host management interface becomes unreachable, though the host remains responsive to ping and virtual machine workloads continue without interruption. This condition is observed to affect multiple hosts in sequence. Rebooting the affected host resolves the issue temporarily.

Environment

  • VMware ESXi 7.x 
  • VMware ESXi 8.x
  • VMware vCenter Server 7.x 
  • VMware vCenter Server 8.x
  • SDDC Manager 5.x and newer
  • Environment using replication services (such as Veeam Replication, vSphere Replication or Nutanix CVM) 

Cause

  • Scenario 1:
    • Replication services can open more HTTPS sessions than they close, eventually exceeding what the ESXi host's envoy service can handle. The envoy service has a limit of 128 concurrent HTTPS sessions. When this limit is exceeded, connections between vCenter and the host fail.

      The following entries appear in /var/run/log/envoy.log on the ESXi host:
      2025-12-02T15:04:42.521Z In(166) envoy[2106882]: "2025-12-02T15:04:38.531Z warning envoy[2107532] [Originator@6876 sub=filter] [Tags: "ConnectionId":"2547818"] remote https connections exceed max allowed: 128"

  • Scenario 2:
    • SDDC Manager runs multiple operations, such as password updates/rotation and LCM tasks, that use a shared method to connect to ESXi via vCenter. As a result, the ESXi host is overloaded with too many connection requests.
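
To gauge how often a host hits the Scenario 1 limit, the warning shown above can be counted in the log. The following is a minimal sketch, not an official procedure: the function name `count_envoy_limit_hits` is hypothetical, and it assumes the warning text matches the sample entry and that the log lives at /var/run/log/envoy.log.

```shell
#!/bin/sh
# Hypothetical helper: count how many times the envoy session-limit
# warning was logged. The default path is the one named in this article.
# Usage: count_envoy_limit_hits [logfile]
count_envoy_limit_hits() {
    log_file="${1:-/var/run/log/envoy.log}"
    # grep -c prints the number of matching lines. It exits non-zero
    # when there are no matches, so "|| true" keeps the status clean.
    grep -c "remote https connections exceed max allowed" "$log_file" || true
}
```

Running `count_envoy_limit_hits` with no argument reads the default log. A count that keeps growing between runs suggests sessions are still being opened faster than they are closed.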

Resolution

  • Scenario 1:
    • Immediate Workaround:
      1. Identify the replication service creating excessive connections by checking envoy.log for the source IP.
      2. Temporarily disable the identified replication service.
      3. Restart the envoy service on the affected host:
           /etc/init.d/envoy restart
    • Long-term Solution:
      1. Update the replication software to the latest version.
      2. Configure replication jobs to limit concurrent sessions.
      3. If issues persist:
        1. Implement firewall rules to limit concurrent connections from replication servers.
        2. Contact replication software vendor for additional guidance.
  • Scenario 2:
    • Resolution:
      1. Engineering is aware of the issue, and a fix is planned. Subscribe to this KB article to receive updates.

    • Workaround: 
      1. Restart the envoy service on the affected host:
           /etc/init.d/envoy restart
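
As a complement to checking envoy.log for the source IP, the host's live connection table can be summarized by remote address before the service is restarted. The sketch below rests on assumptions: the function name `count_https_peers` is hypothetical, the input is assumed to have the column layout of `esxcli network ip connection list` (Proto, Recv Q, Send Q, Local Address, Foreign Address, State, ...), local port 443 is assumed to be the HTTPS port of interest, and only IPv4 peers are handled.

```shell
#!/bin/sh
# Hypothetical helper: summarize established TCP connections to local
# port 443 by remote IPv4 address. Reads connection-list text on stdin
# and prints "<count> <remote-ip>" lines, busiest peer first.
count_https_peers() {
    awk '
        # Assumed columns: $4 = local addr:port, $5 = foreign addr:port, $6 = state
        $1 == "tcp" && $6 == "ESTABLISHED" && $4 ~ /:443$/ {
            peer = $5
            sub(/:[0-9]+$/, "", peer)   # strip the remote port
            count[peer]++
        }
        END {
            for (peer in count) print count[peer], peer
        }
    ' | sort -rn
}
```

On a host, this could be run as `esxcli network ip connection list | count_https_peers`. A single replication server holding a large share of the 128 available sessions is a likely candidate for step 1 of the Scenario 1 workaround.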

Additional Information