ESXi Hosts in an environment can go to non-responsive state in vCenter with Admission failure in path: ssh/python

Article ID: 318439

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

This article explains how to prevent ESXi hosts from going into a Not Responding state in vCenter.

Symptoms:

ESXi hosts can enter a non-responsive state in vCenter.
In some cases the hosts remain responsive, but memory errors are seen when running esxcli commands, for example:

esxcli esxcli command list
CRITICAL:root:Exception:
CRITICAL:root:Traceback (most recent call last):
File "/bin/esxcli", line 46, in <module>
from pyVmomi import vmodl, vim, SoapAdapter, VmomiSupport, Cache,
MemoryError

vmkernel.log contains "Admission failure in path: ssh/python" messages:

[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14635: Admission failure in path: ssh/python.4918545/uw.4918545
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14642: uw.4918545 (33688803) extraMin/extraFromParent: 64/64, ssh (748) childEmin/eMinLimit: 204785/204800
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14635: Admission failure in path: ssh/python.4918545/uw.4918545
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14642: uw.4918545 (33688803) extraMin/extraFromParent: 64/64, ssh (748) childEmin/eMinLimit: 204785/204800
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14635: Admission failure in path: ssh/python.4918545/uw.4918545
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14642: uw.4918545 (33688803) extraMin/extraFromParent: 64/64, ssh (748) childEmin/eMinLimit: 204785/204800
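To check how often these messages occur, the log can be scanned for them. The following is a minimal sketch, assuming the line layout matches the excerpt above; the function name `count_admission_failures` and the regular expression are illustrative and not part of any VMware tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   ...)MemSched: 14635: Admission failure in path: ssh/python.4918545/uw.4918545
# The pattern is derived from the excerpt above and is an assumption about
# the general line layout.
ADMISSION_RE = re.compile(r"MemSched: \d+: Admission failure in path: (\S+)")


def count_admission_failures(lines):
    """Count admission failures per top-level resource group (e.g. 'ssh')."""
    counts = Counter()
    for line in lines:
        match = ADMISSION_RE.search(line)
        if match:
            # The path looks like 'ssh/python.<pid>/uw.<pid>'; keep the first part.
            group = match.group(1).split("/", 1)[0]
            counts[group] += 1
    return counts


sample = [
    "[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14635: "
    "Admission failure in path: ssh/python.4918545/uw.4918545",
    "[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14642: uw.4918545 (33688803) "
    "extraMin/extraFromParent: 64/64, ssh (748) childEmin/eMinLimit: 204785/204800",
]
print(count_admission_failures(sample))
```

On a host, the same function could be given an open file handle for the vmkernel log (typically /var/log/vmkernel.log) instead of the sample list.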

vmkwarning.log contains messages similar to the following:

[YYYY-MM-DDTHH:MM:SS] cpu2:2757371)WARNING: MemSched: 11696: Group vsanperfsvc: Requested memory limit 0 KB insufficient to support effective reservation 7500 KB
[YYYY-MM-DDTHH:MM:SS] cpu10:2760381)WARNING: UserSocketInet: 2244: python: waiters list not empty!
[YYYY-MM-DDTHH:MM:SS] cpu21:2790677)WARNING: LinuxThread: 381: python: Error cloning thread: -28 (bad0081)
[YYYY-MM-DDTHH:MM:SS] cpu39:2791235)WARNING: UserParam: 1326: busybox: could not change group to <host/vim/vimuser/terminal/ssh>: Admission check failed for memory resource
[YYYY-MM-DDTHH:MM:SS] cpu39:2791235)WARNING: LinuxFileDesc: 6270: busybox: Unrecoverable exec failure: Failure during exec while original state already lost

CPU usage on the host is very high. For example, running uptime shows a very high load average:

[root@ESXi-FQDN:~]uptime
12:11:08 up 14 days, 21:28:56, load average: 0.94, 0.95, 0.95

Note: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on your environment.

Environment

VMware vSphere ESXi 6.5.x
VMware vSphere ESXi 6.7.x

Cause

By default, the hostd service retains completed tasks for 10 minutes.
If too many tasks arrive at the same time, for instance calls to get the current system time from the ServiceInstance managed object, hostd might not be able to process them all and can fail with an out-of-memory message.
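The effect of the retention window can be illustrated with simple arithmetic. The sketch below is a simplified model, not a description of hostd internals: it assumes tasks arrive at a constant rate and that each completed task is held for the full retention period. The rate of 50 tasks per second is a hypothetical figure chosen only for illustration.

```python
def retained_tasks(tasks_per_second, retention_minutes):
    """Approximate number of completed tasks held at steady state.

    Simplified model: every task is kept for the whole retention window,
    so the count is arrival rate multiplied by window length in seconds.
    """
    return tasks_per_second * retention_minutes * 60


rate = 50  # hypothetical arrival rate, tasks per second
for minutes in (10, 5):
    print(f"{minutes} min retention: about {retained_tasks(rate, minutes)} tasks held")
```

Under this model, halving either the call rate or the retention period halves the number of retained tasks, which is why both are listed as workarounds.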

Resolution

This issue is resolved in VMware vSphere 6.5 Patch ESXi650-202007001. To download the patch, go to the Customer Connect Patch Downloads page.
This issue is resolved in VMware vSphere 6.7 Patch ESXi670-202008001. To download the patch, go to the Customer Connect Patch Downloads page.

Workaround:

To work around this issue:

Make fewer such vim.ServiceInstance.* calls in quick succession.
Lower the value of the taskRetentionInMins option in hostd's /etc/vmware/hostd/config.xml (the default is 10 minutes) by following the steps below:
- /etc/init.d/hostd stop
- edit /etc/vmware/hostd/config.xml
- Before: 
- After: <taskRetentionInMins> 5 </taskRetentionInMins>
- /etc/init.d/hostd start
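The edit step can also be scripted. The following is a minimal sketch that operates on the file's text, assuming the `<taskRetentionInMins>` element is already present; the helper name `set_task_retention` and the `<config>` wrapper in the example string are hypothetical. If the element is absent, the helper changes nothing, because the article does not show where the element belongs in the file.

```python
import re

# Matches an existing element such as <taskRetentionInMins> 10 </taskRetentionInMins>
TAG_RE = re.compile(r"(<taskRetentionInMins>)\s*\d+\s*(</taskRetentionInMins>)")


def set_task_retention(xml_text, minutes):
    """Return (new_text, replaced_count) with the retention value updated."""
    if minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    return TAG_RE.subn(
        lambda m: f"{m.group(1)} {minutes} {m.group(2)}",
        xml_text,
    )


# Hypothetical fragment for illustration only; a real config.xml is larger.
example = "<config>\n  <taskRetentionInMins> 10 </taskRetentionInMins>\n</config>\n"
new_text, replaced = set_task_retention(example, 5)
print(replaced)
print(new_text)
```

As with the manual steps, hostd should be stopped before the file is changed and started afterwards, and keeping a backup copy of config.xml before writing the new text is prudent.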

Additional Information

Impact/Risks: None
