One ESXi is not responding in VC

Article ID: 367926

Products

VMware vCenter Server, VMware vSphere ESXi

Issue/Introduction

The ESXi host had been working normally in vCenter Server (VC), but at some point it suddenly showed as not responding. VC could no longer connect to the host, and the host remained disconnected. Virtual machines on the host were still running, but without VC management they could not be migrated to other hosts ahead of a host reboot.

The following troubleshooting steps were performed:

1. Connecting to the host UI failed.

2. vpxa and hostd were running on the host.

3. Restarting vpxa and hostd did not help.

4. VC did receive heartbeats (HB) from the host, confirmed by packet capture.

5. The VC vpxd log showed errors such as the following:

2024-05-15T16:50:02.696+08:00 info vpxd[06657] [Originator@6876 sub=vmomi.soapStub[30380] opID=HB-host-12@647-3eef73bf] SOAP request returned HTTP failure; <<io_obj p:0x00007fb04c4a4b78, h:83, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-12/vpxa>, method: getChanges; code: 503(Service Unavailable); fault: (null)
2024-05-15T16:50:02.696+08:00 info vpxd[06038] [Originator@6876 sub=vmomi.soapStub[30380] opID=HB-SpecSync-host-12@0-5a9820e0] SOAP request returned HTTP failure; <<io_obj p:0x00007fb044736688, h:89, <UNIX ''>, <UNIX '/var/run/envoy-hgw/hgw-pipe'>>, /hgw/host-12/vpxa>, method: setConfig; code: 503(Service Unavailable); fault: (null)
2024-05-15T16:50:02.697+08:00 error vpxd[06038] [Originator@6876 sub=Vmomi opID=HB-SpecSync-host-12@0-5a9820e0] Got vmacore exception when invoking VMOMI method; <</hgw/host-12>, /vpxa>, vpxapi.VpxaService.setConfig, N7Vmacore4Http13HttpExceptionE(HTTP error response: Service Unavailable)
--> [context]zKq7AVECAQAAALnVYwEddnB4ZAAAGdJTbGlidm1hY29yZS5zbwAAUhlDAIxBRACaWEsBnnEdbGlidm1vbWkuc28AAYMiIAEuTyABnccfgo3SNwF2cHhkAII/ETgBgv4cOAGC/R04AYKowTcBglHsNwEBfTYaAc0qGgM2FAxsaWJ2cHhhcGktdHlwZXMuc28AgizLHgGCf80eAYLOzh4BgqeWZQKC+aZlAoIjvmQCgr6QZQIA5ts3APk0OACT0FEEro4AbGlicHRocmVhZC5zby4wAAUv3g9saWJjLnNvLjYA[/context]
2024-05-15T16:50:02.699+08:00 warning vpxd[06038] [Originator@6876 sub=InvtHostCnx opID=HB-SpecSync-host-12@0-5a9820e0] DoHostSpecSync failed for [vim.HostSystem:host-12,192.168.3.92]
2024-05-15T16:50:02.699+08:00 warning vpxd[06038] [Originator@6876 sub=InvtHostCnx opID=HB-SpecSync-host-12@0-5a9820e0] Spec sync failed to [vim.HostSystem:host-12,192.168.3.92]
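
To see how widespread these failures are, the vpxd log can be summarized per host. The following is a minimal sketch in Python, assuming the log lines follow the format of the excerpt above. The function name count_http_failures and the regular expression are illustrative and not part of any VMware tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... SOAP request returned HTTP failure; <<...>, /hgw/host-12/vpxa>,
#   method: getChanges; code: 503(Service Unavailable); fault: (null)
# The pattern is inferred from the excerpt above and may need adjusting.
FAILURE_RE = re.compile(
    r"SOAP request returned HTTP failure;.*?/hgw/(?P<host>host-\d+)/vpxa>, "
    r"method: (?P<method>\w+); code: (?P<code>\d+)"
)


def count_http_failures(lines):
    """Return a Counter keyed by (host id, method, HTTP status code)."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            key = (match["host"], match["method"], int(match["code"]))
            counts[key] += 1
    return counts
```

Passing an open vpxd log file to count_http_failures returns one counter entry per host, method and status code, which makes it easy to confirm that the 503 responses are limited to a single host.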

Cause

The envoy log at /var/run/log/envoy.log on the ESXi host shows that remote HTTPS connections have exceeded the maximum allowed (128):

2024-05-16T09:44:41.355Z In(166) envoy[41277035]: "2024-05-16T09:44:41.322Z warning envoy[41277051] [Originator@6876 sub=filter] [C292324] remote https connections exceed max allowed: 128"
2024-05-16T09:44:41.355Z In(166) envoy[41277035]: "2024-05-16T09:44:41.322Z warning envoy[41277051] [Originator@6876 sub=filter] [C292324] closing connection TCP<192.168.2.61:55252, 192.168.3.92:443>"
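
To identify which clients envoy is turning away, the warnings can be grouped by remote address. The sketch below is in Python and assumes two things inferred from the excerpt above: a "closing connection" line carries the same connection tag (for example C292324) as the preceding limit warning, and the first address inside TCP<...> is the remote peer. The function name rejected_clients is illustrative.

```python
import re
from collections import Counter

# Patterns inferred from the two sample envoy.log lines above.
LIMIT_RE = re.compile(
    r"\[(?P<conn>C\d+)\] remote https connections exceed max allowed: \d+"
)
CLOSE_RE = re.compile(
    r"\[(?P<conn>C\d+)\] closing connection "
    r"TCP<(?P<remote>[\d.]+):\d+, (?P<local>[\d.]+):\d+>"
)


def rejected_clients(lines):
    """Count connections closed after a limit warning, per remote IP."""
    over_limit = set()   # connection tags that triggered the limit warning
    rejected = Counter()
    for line in lines:
        match = LIMIT_RE.search(line)
        if match:
            over_limit.add(match["conn"])
            continue
        match = CLOSE_RE.search(line)
        if match and match["conn"] in over_limit:
            rejected[match["remote"]] += 1
    return rejected
```

A remote address with a high count here is a likely candidate for the machine that is exhausting the connection limit.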

The file commands/localcli_network_ip_connection_list.txt in the vm-support bundle (or the output of "localcli network ip connection list" run on the host) shows hundreds of TIME_WAIT connections from 192.168.2.61:

Proto  Recv Q  Send Q  Local Address                      Foreign Address      State        World ID  CC Algo  World Name
-----  ------  ------  ---------------------------------  -------------------  -----------  --------  -------  ----------
tcp         0       0  192.168.3.92:443                   192.168.2.61:62050   TIME_WAIT           0
tcp         0       0  192.168.3.92:443                   192.168.2.61:62049   TIME_WAIT           0

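The problem was that 192.168.2.61 opened connections to port 443 on the host but did not close them properly. Once the number of remote HTTPS connections exceeded the limit of 128 reported in the envoy log, envoy closed new connections, which caused subsequent connections to the host to fail.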
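
With hundreds of entries, counting connections per remote address and state is easier than reading the list by eye. The following Python sketch assumes the whitespace-separated column layout shown in the connection list above. The function name summarize_connections is illustrative.

```python
from collections import Counter


def summarize_connections(lines, local_port="443"):
    """Count TCP connections to local_port, keyed by (foreign IP, state)."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Skip the header, the separator row, and non-TCP entries.
        if len(parts) < 6 or not parts[0].startswith("tcp"):
            continue
        local, foreign, state = parts[3], parts[4], parts[5]
        # Keep only connections that terminate on the chosen local port.
        if local.rpartition(":")[2] != local_port:
            continue
        foreign_ip = foreign.rpartition(":")[0]
        counts[(foreign_ip, state)] += 1
    return counts
```

Calling summarize_connections on the saved file and printing counts.most_common(10) lists the busiest remote addresses first. In this case, 192.168.2.61 in TIME_WAIT would be expected at the top.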

Resolution

1. Identify what the remote machine (192.168.2.61 in this case) is used for.

2. List the open connections on the remote machine (netstat -an) and compare them with the connection list on the ESXi host.

3. Reboot the remote machine to release these connections.
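
The comparison in step 2 can be done with a short script once both outputs are saved to text files. The Python sketch below assumes IPv4 endpoints written as ip:port, as in the ESXi connection list above and in Linux or Windows netstat -an output. Other platforms may format addresses differently. The function names endpoint_pairs and compare_views are illustrative.

```python
import re

# An IPv4 endpoint written as "a.b.c.d:port".
ENDPOINT_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}:\d+$")


def endpoint_pairs(lines):
    """Yield (local, foreign) endpoint pairs from netstat-style lines."""
    for line in lines:
        endpoints = [tok for tok in line.split() if ENDPOINT_RE.match(tok)]
        if len(endpoints) == 2:
            yield endpoints[0], endpoints[1]


def compare_views(host_lines, client_lines, host_endpoint, client_ip):
    """Compare the client's connections as seen from each side."""
    prefix = client_ip + ":"
    # On the ESXi host, the client is the foreign side of the connection.
    on_host = {
        foreign
        for local, foreign in endpoint_pairs(host_lines)
        if local == host_endpoint and foreign.startswith(prefix)
    }
    # On the client, the same endpoint appears as the local side.
    on_client = {
        local
        for local, foreign in endpoint_pairs(client_lines)
        if foreign == host_endpoint and local.startswith(prefix)
    }
    return {
        "only_on_host": sorted(on_host - on_client),
        "only_on_client": sorted(on_client - on_host),
        "on_both": sorted(on_host & on_client),
    }
```

For this case the call would be compare_views(host_lines, client_lines, "192.168.3.92:443", "192.168.2.61"). Entries under only_on_host are connections the ESXi host still tracks but the remote machine no longer lists.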