An ESXi host managed by vCenter Server becomes unresponsive and appears as disconnected in the vSphere Client. The virtual machines on the host keep running, but without vCenter connectivity they cannot be migrated to other hosts for maintenance or troubleshooting.
The following symptoms may also be seen:
1. Connecting to the Host Client UI fails
2. Restarting vpxa and hostd does not help
3. vCenter Server still receives heartbeats from the host (this can be confirmed with a packet capture)
4. On vCenter Server, /var/log/vmware/vpxd/vpxd.log shows errors similar to the following:
YYYY-MM-DDTHH:MM:SS:MS info vpxd[06657] [Originator@6876 sub=vmomi.soapStub[30380] opID=HB-host-##@647-3eef73bf] SOAP request returned HTTP failure; <<io_obj p:0x00007fb04c4a4b78, h:83, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-##/vpxa>, method: getChanges; code: 503(Service Unavailable); fault: (null)
YYYY-MM-DDTHH:MM:SS:MS info vpxd[06038] [Originator@6876 sub=vmomi.soapStub[30380] opID=HB-SpecSync-host-##@0-5a9820e0] SOAP request returned HTTP failure; <<io_obj p:0x00007fb044736688, h:89, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-##/vpxa>, method: setConfig; code: 503(Service Unavailable); fault: (null)
YYYY-MM-DDTHH:MM:SS:MS error vpxd[06038] [Originator@6876 sub=Vmomi opID=HB-SpecSync-host-##@0-5a9820e0] Got vmacore exception when invoking VMOMI method; <</hgw/host-##>, /vpxa>, vpxapi.VpxaService.setConfig, N7Vmacore4Http13HttpExceptionE(HTTP error response: Service Unavailable)
-->[context]zKq7AVECAQAAALnVYwEddnB4ZAAAGdJTbGlidm1hY29yZS5zbwAAUhlDAIxBRACaWEsBnnEdbGlidm1vbWkuc28AAYMiIAEuTyABnccfgo3SNwF2cHhkAII/ETgBgv4cOAGC/R04AYKowTcBglHsNwEBfTYaAc0qGgM2FAxsaWJ2cHhhcGktdHlwZXMuc28AgizLHgGCf80eAYLOzh4BgqeWZQKC+aZlAoIjvmQCgr6QZQIA5ts3APk0OACT0FEEro4AbGlicHRocmVhZC5zby4wAAUv3g9saWJjLnNvLjYA[/context]
YYYY-MM-DD HH:MM:SS:MS warning vpxd[06038] [Originator@6876 sub=InvtHostCnx opID=HB-SpecSync-host-##@0-5a9820e0] DoHostSpecSync failed for [vim.HostSystem:host-##,ESXi-IP]
YYYY-MM-DD HH:MM:SS:MS warning vpxd[06038] [Originator@6876 sub=InvtHostCnx opID=HB-SpecSync-host-##@0-5a9820e0] Spec sync failed to [vim.HostSystem:host-##,ESXi-IP]
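To gauge how often these failures occur and which hosts they affect, the vpxd.log entries can be tallied with a short script. The following is a minimal sketch, assuming the log lines follow the format shown above; the function name `count_503_failures` and the regular expression are illustrative and are not part of any VMware tooling.

```python
import re
from collections import Counter

# Matches the "SOAP request returned HTTP failure" lines shown above and
# captures the host ID, the vpxa method, and the HTTP status code.
FAILURE_RE = re.compile(
    r"SOAP request returned HTTP failure;.*?/hgw/(?P<host>host-[^/]+)/vpxa>,"
    r"\s*method:\s*(?P<method>\w+);\s*code:\s*(?P<code>\d+)"
)


def count_503_failures(lines):
    """Return a Counter keyed by (host, method) for HTTP 503 failures."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match and match.group("code") == "503":
            counts[(match.group("host"), match.group("method"))] += 1
    return counts


# Hypothetical usage on the vCenter Server appliance:
#   with open("/var/log/vmware/vpxd/vpxd.log", errors="replace") as handle:
#       for (host, method), n in count_503_failures(handle).most_common():
#           print(host, method, n)
```

A single host with a steadily growing count for getChanges and setConfig is consistent with the symptoms described here; failures spread across many hosts would point elsewhere.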
vCenter Server 7.x and 8.x
ESXi 7.x and 8.x
1. On the ESXi host, /var/run/log/envoy.log shows that remote HTTPS connections have exceeded the maximum allowed limit:
YYYY-MM-DDTHH:MM:SS:MS In(166) envoy[41277035]: "YYYY-MM-DDTHH:MM:SS:MSZ warning envoy[41277051] [Originator@6876 sub=filter] [C292324] remote https connections exceed max allowed: 128"
YYYY-MM-DDTHH:MM:SS:MS In(166) envoy[41277035]: "YYYY-MM-DDTHH:MM:SS:MSZ warning envoy[41277051] [Originator@6876 sub=filter] [C292324] closing connection TCP<Remote entity-IP:55252, ESXi-IP:443>"
2. Running the command localcli network ip connection list on the host shows multiple connections from Remote entity-IP in the TIME_WAIT state:
localcli network ip connection list
Proto  Recv Q  Send Q  Local Address  Foreign Address         State      World ID  CC Algo  World Name
-----  ------  ------  -------------  ----------------------  ---------  --------  -------  ----------
tcp         0       0  ESXi-IP:443    Remote entity-IP:62050  TIME_WAIT         0
tcp         0       0  ESXi-IP:443    Remote entity-IP:62049  TIME_WAIT         0
3. The likely root cause is that the remote entity opened connections to port 443 on the ESXi host but did not terminate them properly, so the host reached its limit for remote HTTPS connections and subsequent connections failed.
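The two checks above can be summarized per remote address to make the offending machine easier to spot. The following is a minimal sketch, assuming the envoy.log format and the localcli column order shown above (Proto, Recv Q, Send Q, Local Address, Foreign Address, State); the function names `count_closed_by_remote` and `tally_connections` are hypothetical helpers, not VMware commands.

```python
import re
from collections import Counter

# Captures the remote address from envoy's "closing connection TCP<remote:port, local:port>" lines.
CLOSING_RE = re.compile(r"closing connection TCP<(?P<remote>[^,>]+):\d+,")


def count_closed_by_remote(lines):
    """Count connections envoy closed, keyed by remote address."""
    counts = Counter()
    for line in lines:
        match = CLOSING_RE.search(line)
        if match:
            counts[match.group("remote").strip()] += 1
    return counts


def tally_connections(output, local_port="443"):
    """Count TCP connections to local_port, keyed by (foreign address, state)."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, the separator row, and non-TCP rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.rsplit(":", 1)[-1] != local_port:
            continue
        counts[(foreign.rsplit(":", 1)[0], state)] += 1
    return counts


# Hypothetical usage: save the command output to a file on the host, then
#   for (ip, state), n in tally_connections(open("conns.txt").read()).most_common():
#       print(ip, state, n)
```

A remote address that dominates both tallies, with most of its connections in TIME_WAIT, matches the pattern described in this section.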
1. Identify the remote machine whose IP address appears in the logs. Determine its role and whether it is blocking communication between vCenter Server and the ESXi host.
2. List the open connections and ports on the remote machine with the command netstat -an
and compare them with the connection list on the ESXi host.
3. Reboot the remote machine to release these connections.
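The comparison in step 2 can be done by hand, or roughly automated. The following is a minimal sketch, assuming both outputs have been saved as text and that addresses are printed as IPv4 address:port (as localcli does, and as netstat -an does on Windows and Linux; BSD-style address.port output is not handled). The function names are hypothetical.

```python
import re

# IPv4 "address:port" tokens.
ENDPOINT_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\b")


def connection_pairs(text):
    """Return each connection as an unordered pair of (address, port) endpoints."""
    pairs = set()
    for line in text.splitlines():
        endpoints = ENDPOINT_RE.findall(line)
        # Only lines with exactly two endpoints describe a connection.
        if len(endpoints) == 2:
            pairs.add(frozenset(endpoints))
    return pairs


def only_on_esxi(esxi_output, remote_output):
    """Connections the ESXi host still lists that the remote machine does not report."""
    return connection_pairs(esxi_output) - connection_pairs(remote_output)
```

Pairs are unordered because the local and foreign columns are swapped between the two machines. Connections that appear only on the ESXi side are candidates for the improperly terminated connections described in the cause.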