ESXi hosts intermittently disconnect from vCenter Server and new HTTPS connections to the ESXi host fail.

Article ID: 394541

Products

VMware vSphere ESXi

Issue/Introduction

  • On the vCenter Server, the vpxd logs (/var/log/vmware/vpxd/) show the host as NO_RESPONSE and contain errors about failed TCP/443 connections to the host.
    YYYY-MM-DDTHH:MM:SS error vpxd[1144976] [Originator@6876 sub=Vmomi opID=HB-host-###@1964483-12065fe1-WorkQueue-5bf1a3b0] Got vmacore exception when invoking VMOMI method; <</hgw/host-###>, /vpxa>, vpxapi.VpxaService.ConfigureDatastoreIORMOnHost, N7Vmacore4Http13HttpExceptionE(HTTP error response: Service Unavailable) --> [context]zKq7AVECAQAAAA8jcwE####################################28uMAAF3/oPbGliYy5zby42AA==[/context]
    YYYY-MM-DDTHH:MM:SS info vpxd[1145293] [Originator@6876 sub=vmomi.soapStub[91] opID=HB-host-###@1964483-12065fe1-01] SOAP request returned HTTP failure; <<io_obj p:0x00007f39c49d7dd8, h:276, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-###/vpxa>, method: cancelWaitForUpdates; code: 503(Service Unavailable); fault: (null)
    YYYY-MM-DDTHH:MM:SS error vpxd[1145410] [Originator@6876 sub=MoCluster opID=PodCrxMgr-domain-c##-93716] [CreateApiProvider::errorCb] Providers stack failed: Error:
    -->    system_error
    --> Messages:
    -->    vapi.send.failed<Send of frame failed: N7Vmacore15SystemExceptionE(stream truncated: The connection was closed by the remote end during handshake.)
    --> [context]zKq7AVECAQAAAA8jcwE####################################28uMAAC3/oPbGliYy5zby42AA==[/context]>
    YYYY-MM-DDTHH:MM:SS warning vpxd[1145346] [Originator@6876 sub=MoHost opID=HostSync-host-###-1790c127] host [vim.HostSystem:host-###,<esx-hostname>] connection state changed to NO_RESPONSE
  • On the ESXi hosts, /var/run/log/envoy.log contains entries indicating that remote HTTPS connections have exceeded the maximum of 128 allowed.

    YYYY-MM-DDTHH:MM:SS In(166) envoy[2099054]: "2025-04-14T04:26:54.826Z warning envoy[2099419] [Originator@6876 sub=filter] [Tags: "ConnectionId":"470807"] remote https connections exceed max allowed: 128"
    YYYY-MM-DDTHH:MM:SS In(166) envoy[2099054]: "2025-04-14T04:26:54.827Z warning envoy[2099419] [Originator@6876 sub=filter] [Tags: "ConnectionId":"470808"] remote https connections exceed max allowed: 128"
  • You may also see log entries showing connections to the VMotionServer with an invalid message type. These entries indicate that a device is connecting to the ESXi VMkernel interface tagged for vMotion but is not sending the messages expected to initiate a vMotion.
    • These log entries are not required to match this issue.
    • A high frequency of these entries in a short timespan indicates that a vulnerability scanner may be scanning the hosts' vMotion interface, which could also be contributing to the excessive HTTPS connections to the hosts' Management IP.
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu21:2098050)WARNING: VMotionServer: 367: Invalid message type for new connection: 33620758.  Expecting message of type INIT (0 or 31).
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu21:2098050)VMotionServer: 385: Error reading from pending connection: Failure
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu10:2098050)MigrateNet: vm 2098050: 3282: Accepted connection from <Target-IP>:55098
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu10:2098050)MigrateNet: vm 2098050: 3370: dataSocket 0x43258d22b680 receive buffer size is 563560
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu10:2098050)VMotionServer: 307: Remote machine is ESX 6.0 or newer. VMotion version 0x200010c.
    YYYY-MM-DDTHH:MM:SS Wa(180) vmkwarning: cpu10:2098050)WARNING: VMotionServer: 367: Invalid message type for new connection: 33620758.  Expecting message of type INIT (0 or 31).
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu10:2098050)VMotionServer: 385: Error reading from pending connection: Failure
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu10:2098050)MigrateNet: vm 2098050: 3282: Accepted connection from <Target-IP>:55116
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu10:2098050)MigrateNet: vm 2098050: 3370: dataSocket 0x43258d22b680 receive buffer size is 563560
    YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu10:2098050)VMotionServer: 307: Remote machine is ESX 6.0 or newer. VMotion version 0x1ba.
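
The log checks described above can be sketched as two small shell helpers. This is an illustrative sketch, not a vendor-provided tool: the function names are hypothetical, the vmkernel log path /var/run/log/vmkernel.log is an assumption, each log line is assumed to begin with an ISO 8601 timestamp, and source addresses are assumed to be IPv4.

```shell
# Hypothetical helper: count envoy connection-limit warnings per hour.
# Assumes each log line starts with a timestamp such as
# 2025-04-14T04:26:54.826Z, so the first 13 characters identify the hour.
envoy_limit_warnings_per_hour() {
    envoy_log="${1:-/var/run/log/envoy.log}"
    awk '/remote https connections exceed max allowed/ { print substr($1, 1, 13) }' "$envoy_log" \
        | sort | uniq -c
}

# Hypothetical helper: tally source IPs of connections accepted on the
# vMotion interface, busiest first. Legitimate vMotion peers also appear
# here, so compare the result against the known ESXi hosts.
# Assumes IPv4 addresses logged as <ip>:<port>.
vmotion_connection_sources() {
    vmk_log="${1:-/var/run/log/vmkernel.log}"
    grep 'MigrateNet:' "$vmk_log" \
        | grep -o 'Accepted connection from [^ ]*' \
        | awk '{ print $4 }' \
        | cut -d':' -f 1 \
        | sort | uniq -c | sort -rn
}

# Example usage on the ESXi host:
#   envoy_limit_warnings_per_hour
#   vmotion_connection_sources /var/run/log/vmkernel.log
```

An hour with many limit warnings that coincides with connections from an address that is not an ESXi host would be consistent with the scanner scenario described above.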

 

Environment

VMware vSphere ESXi

Cause

  • The ESXi limit of 128 external HTTPS connections can be reached by any external device, or combination of devices, that connects excessively to the ESXi host management IP over TCP/443.
  • The most common sources that reach this limit are replication services (see Additional Information below), backup solutions, and vulnerability scanners.

Resolution

To work around the issue, follow the steps below.

  • Identify the source of the HTTPS connections to the ESXi host Management IP. 
    • The following command can help to identify the source IP for sessions connecting to the hosts' Management IP.
      grep "<host-management-IP>:443" /var/run/log/envoy-access.log | cut -d' ' -f 15 | sort | uniq | cut -d ':' -f 1 | uniq -c
  • Depending on the source of the connections, stop or limit these sessions where appropriate.
  • Restart the envoy service on the affected host to clear the sessions.

     /etc/init.d/envoy restart
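
As a variation on the source-identification command above, the following sketch ranks source IPs by the number of matching log entries, busiest first. It is illustrative only: the function name is hypothetical, and it relies on the same assumption as the command above, namely that field 15 of envoy-access.log holds the source address as <ip>:<port>. Unlike that command, it counts every matching entry rather than distinct source address and port pairs.

```shell
# Hypothetical helper: rank source IPs by number of access-log entries
# addressed to the host management IP on port 443.
# Assumes field 15 (space-delimited) is the source <ip>:<port>, as in the
# command shown in the steps above, and that addresses are IPv4.
top_https_sources() {
    mgmt_ip="$1"
    access_log="${2:-/var/run/log/envoy-access.log}"
    grep -F "${mgmt_ip}:443" "$access_log" \
        | cut -d' ' -f 15 \
        | cut -d':' -f 1 \
        | sort | uniq -c | sort -rn
}

# Example usage on the ESXi host (replace the placeholder first):
#   top_https_sources "<host-management-IP>"
```

The addresses at the top of the output are the first candidates to check against known replication, backup, and scanning systems.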

Long-term Solution

  • Contact the vendor of the product generating the excessive HTTPS sessions for ways to limit the number of sessions, ensure that sessions are closed when finished, or otherwise prevent these sessions from occurring.

Additional Information

If replication is found to be causing this ESXi limit to be exceeded, see ESXi Hosts Show as "Not Responding" Due to Envoy Session Limits Exceeded by Replication Services.