ESXi hosts enter a "Not Responding" state during Software-Defined Data Center (SDDC) Manager operations

Article ID: 441161

Updated On:

Products

VMware SDDC Manager / VCF Installer
VMware vSphere ESXi

Issue/Introduction

  • ESXi hosts enter a "Not Responding" state during Software-Defined Data Center (SDDC) Manager operations, such as password rotation or Lifecycle Management (LCM) updates. The host management agents become unreachable, although the hosts still respond to ping and virtual machines remain operational.
  • ESXi hosts "flap" in the vCenter Server inventory, continuously cycling between "Not Responding" and "Responding" states. The hosts remain accessible via SSH, but restarting the management agents provides only temporary relief before the issue recurs.
  • In the /var/run/log/envoy.log file on the ESXi host, you see the following entries:
    YYYY-MM-DDTHH:MM:SS.Z In(166) envoy[2106882]: "YYYY-MM-DDTHH:MM:SS.Z warning envoy[2107532] [Originator@6876 sub=filter] [Tags: "ConnectionId":"######"] remote https connections exceed max allowed: 128"
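
To check quickly whether a host has logged this warning, the log can be searched for the message text. The following is a minimal sketch, assuming the log path and message wording shown above; the function name count_envoy_limit_hits is hypothetical and is not part of ESXi.

```shell
# Hypothetical helper: count envoy warnings about the remote HTTPS connection limit.
# The log path defaults to the one named in this article; pass another path
# to check a copy of the log instead.
count_envoy_limit_hits() {
    log="${1:-/var/run/log/envoy.log}"
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c "remote https connections exceed max allowed" "$log"
}
```

Running count_envoy_limit_hits with no argument on the affected host prints the number of matching entries; a non-zero result is consistent with the symptom described here.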

Environment

  • VMware Cloud Foundation (VCF) 5.x
  • VMware ESXi 7.x
  • VMware ESXi 8.x

Cause

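Multiple simultaneous operations in SDDC Manager generate an excessive number of connection requests to vCenter Server and the ESXi hosts. As a result, the envoy service on the host reaches its maximum number of allowed remote HTTPS connections (128 in the log entry shown above) and rejects further connections.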

Resolution

Step 1: Identify the Source of Excessive Connections

Before remediating, verify whether the session limit has been reached and identify the source IP address that is opening the most connections.

  1. Enable SSH on the affected ESXi host.
  2. Log in to the host as root via SSH.
  3. Run the following command to count the number of sessions per source IP address hitting the management port:
    grep ":443" /var/run/log/envoy-access.log | cut -d' ' -f 15 | sort | uniq | cut -d ':' -f 1 | uniq -c
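
The per-source counts from this command can also be ranked so that the busiest client appears first. The following is a minimal sketch, assuming the same log layout as the command above (field 15 holds the source address and port) and that sort -rn and head are available in the ESXi shell; the function name top_envoy_sources is hypothetical.

```shell
# Hypothetical helper: list the five source IPs with the most distinct
# connections to port 443, highest count first.
top_envoy_sources() {
    log="${1:-/var/run/log/envoy-access.log}"
    grep ":443" "$log" \
        | cut -d' ' -f 15 \
        | sort | uniq \
        | cut -d ':' -f 1 \
        | uniq -c \
        | sort -rn \
        | head -n 5
}
```

Each output line shows a count followed by a source IP address. In this scenario the SDDC Manager or vCenter Server address would be expected near the top of the list.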

Step 2: Immediate Remediation (Clear Active Sessions)

Restarting the envoy service will clear the connection queue and allow vCenter Server to reconnect to the host.

  1. From the same ESXi SSH session, execute the following command:
    /etc/init.d/envoy restart
  2. Verify in the vSphere Client that the host returns to a Connected or Responding state.
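
Because the limit can be reached again while the SDDC Manager operations are still running, it may help to check whether new warnings appear after the restart. The following is a minimal sketch, assuming the log path and message text shown in the Issue/Introduction section; the function name new_limit_hits and its baseline argument are hypothetical.

```shell
# Hypothetical helper: report how many limit warnings were logged since a
# baseline count that was recorded before the restart.
# Note: if the log has rotated in the meantime, the result can be negative.
new_limit_hits() {
    baseline="$1"
    log="${2:-/var/run/log/envoy.log}"
    current=$(grep -c "remote https connections exceed max allowed" "$log")
    echo $((current - baseline))
}
```

For example, record the current count of matching lines before the restart, pass it as the first argument a few minutes later, and treat any positive result as a sign that the issue has recurred.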

Additional Information

For additional context on envoy session limits, see ESXi Hosts Show as "Not Responding" Due to Envoy Session Limits Exceeded by Replication Services