ESXi Hosts in an environment can go to non-responsive state in vCenter with Admission failure in path: ssh/python

Article ID: 318439

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

This article explains how to prevent an ESXi host from entering a Not Responding state in vCenter.

Symptoms:

  • ESXi hosts can become unresponsive (Not Responding) in vCenter.
  • In some cases the hosts remain responsive, but esxcli commands fail with memory errors, for example:
esxcli esxcli command list
CRITICAL:root:Exception:
CRITICAL:root:Traceback (most recent call last):
   File "/bin/esxcli", line 46, in <module>
    from pyVmomi import vmodl, vim, SoapAdapter, VmomiSupport, Cache,
MemoryError
  • vmkernel.log contains "Admission failure in path: ssh/python" messages:

[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14635: Admission failure in path: ssh/python.4918545/uw.4918545
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14642: uw.4918545 (33688803) extraMin/extraFromParent: 64/64, ssh (748) childEmin/eMinLimit: 204785/204800
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14635: Admission failure in path: ssh/python.4918545/uw.4918545
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14642: uw.4918545 (33688803) extraMin/extraFromParent: 64/64, ssh (748) childEmin/eMinLimit: 204785/204800
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14635: Admission failure in path: ssh/python.4918545/uw.4918545
[YYYY-MM-DDTHH:MM:SS] cpu2:4918545)MemSched: 14642: uw.4918545 (33688803) extraMin/extraFromParent: 64/64, ssh (748) childEmin/eMinLimit: 204785/204800

  • vmkwarning.log contains messages similar to the following:

[YYYY-MM-DDTHH:MM:SS] cpu2:2757371)WARNING: MemSched: 11696: Group vsanperfsvc: Requested memory limit 0 KB insufficient to support effective reservation 7500 KB
[YYYY-MM-DDTHH:MM:SS] cpu10:2760381)WARNING: UserSocketInet: 2244: python: waiters list not empty!
[YYYY-MM-DDTHH:MM:SS] cpu21:2790677)WARNING: LinuxThread: 381: python: Error cloning thread: -28 (bad0081)
[YYYY-MM-DDTHH:MM:SS] cpu39:2791235)WARNING: UserParam: 1326: busybox: could not change group to <host/vim/vimuser/terminal/ssh>: Admission check failed for memory resource
[YYYY-MM-DDTHH:MM:SS] cpu39:2791235)WARNING: LinuxFileDesc: 6270: busybox: Unrecoverable exec failure: Failure during exec while original state already lost

  • CPU usage on the host is very high. For example, running uptime shows a very high load average:

[root@ESXi-FQDN:~]uptime
12:11:08 up 14 days, 21:28:56, load average: 0.94, 0.95, 0.95
 

Note: The preceding log excerpts are only examples. Date, time and environmental variables may vary depending on your environment.
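
As a rough way to confirm the vmkernel.log symptom, the sketch below counts "Admission failure" lines per top-level resource group. It assumes the line format shown in the excerpt above; the function name count_admission_failures is hypothetical and not part of any VMware tool. Because Python itself may fail to start on an affected host (see the MemoryError above), it is probably safer to run this against a copy of the log on another machine.

```python
import re
from collections import Counter

# Matches the MemSched admission-failure lines shown in the excerpt above.
# The numeric code after "MemSched:" varies, so it is matched generically.
ADMISSION_RE = re.compile(r"MemSched: \d+: Admission failure in path: (\S+)")


def count_admission_failures(lines):
    """Count admission failures per top-level resource group (for example 'ssh')."""
    counts = Counter()
    for line in lines:
        match = ADMISSION_RE.search(line)
        if match:
            # 'ssh/python.4918545/uw.4918545' -> 'ssh'
            group = match.group(1).split("/", 1)[0]
            counts[group] += 1
    return counts
```

To use it, open a copy of vmkernel.log, pass the file object to count_admission_failures, and inspect the result. A large count for the ssh group would be consistent with the symptom described here, although the count alone does not prove the cause.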

Environment

VMware vSphere ESXi 6.5.x
VMware vSphere ESXi 6.7.x

Cause

  • By default, the hostd service retains completed tasks for 10 minutes.
  • If too many tasks arrive at the same time, for instance calls to get the current system time from the ServiceInstance managed object, hostd might not be able to process them all and can fail with an out-of-memory error.
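
To illustrate why the retention window matters, the sketch below estimates how many completed tasks are held at steady state as arrival rate multiplied by retention time. This is a simplified model for intuition only, not hostd's actual accounting; the function name and the rate of 50 tasks per second are hypothetical.

```python
def retained_tasks(tasks_per_second, retention_minutes):
    """Approximate completed tasks held at steady state: rate x retention window."""
    return tasks_per_second * retention_minutes * 60


# Hypothetical load of 50 tasks per second.
default_window = retained_tasks(50, 10)  # 10-minute default retention -> 30000
reduced_window = retained_tasks(50, 5)   # 5-minute retention -> 15000
```

Under this model, halving the retention window halves the number of retained tasks at the same call rate, and halving the call rate has the same effect.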

Resolution

This issue is resolved in VMware vSphere 6.5 Patch ESXi650-202007001. To download the patch, go to the Customer Connect Patch Downloads page.
This issue is resolved in VMware vSphere 6.7 Patch ESXi670-202008001. To download the patch, go to the Customer Connect Patch Downloads page.

Workaround:
To work around this issue:
  • Make fewer vim.ServiceInstance.* calls in quick succession.
  • Lower the value of the taskRetentionInMins option in /etc/vmware/hostd/config.xml (the default is 10 minutes) with the following steps:
    • /etc/init.d/hostd stop
    • Edit /etc/vmware/hostd/config.xml, uncommenting the taskRetentionInMins element and lowering its value:
    • Before: <!-- <taskRetentionInMins> 10 </taskRetentionInMins> -->
    • After: <taskRetentionInMins> 5 </taskRetentionInMins>
    • /etc/init.d/hostd start
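
The Before/After edit above can also be expressed as a text transformation. The following is a minimal sketch, assuming the element appears in config.xml exactly as in the Before or After line; the function name set_task_retention is hypothetical. It works on the file contents as a string and leaves reading, backing up, and writing the file to the operator.

```python
import re

# Matches the element whether or not it is wrapped in an XML comment,
# as in the "Before" line above.
_RETENTION_RE = re.compile(
    r"(?:<!--\s*)?<taskRetentionInMins>\s*\d+\s*</taskRetentionInMins>(?:\s*-->)?"
)


def set_task_retention(config_text, minutes):
    """Return config_text with taskRetentionInMins uncommented and set to `minutes`."""
    replacement = f"<taskRetentionInMins> {minutes} </taskRetentionInMins>"
    new_text, count = _RETENTION_RE.subn(replacement, config_text, count=1)
    if count == 0:
        # Do not guess where the element belongs; fall back to a manual edit.
        raise ValueError("taskRetentionInMins element not found")
    return new_text
```

As with the manual steps, hostd should be stopped before the file is changed and started afterwards, and keeping a backup copy of config.xml is advisable.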


Additional Information

Impact/Risks: None