Symptoms:
The ESXi host becomes unresponsive.
The host becomes accessible again after a reboot.
The /var/run/log/vmkernel.log file on the affected ESXi host is filled with admission failure events from the 'nvidia-smi' user world (and, in some cases, 'nv-hostengine'), similar to the following:
YYYY-MM-DDTHH:MM:SS.Z cpu72:274936160)Admission failure in path: host/vim/vimuser/terminal/ssh:nvidia-smi.274936232:uw.274936232
YYYY-MM-DDTHH:MM:SS.Z cpu72:274936160)UserWorld 'nvidia-smi' with cmdline 'unknown'
YYYY-MM-DDTHH:MM:SS.Z cpu72:274936160)uw.274936232 (2403560232) extraMin/extraFromParent: 3059/3059, ssh (672) childEmin/eMinLimit: 203341/204800
YYYY-MM-DDTHH:MM:SS.Z cpu72:274936160)WARNING: LinuxThread: 424: nvidia-smi: Error cloning thread: -28 (bad0081)
YYYY-MM-DDTHH:MM:SS.Z cpu72:274936160)Admission failure in path: host/vim/vimuser/terminal/ssh:nvidia-smi.274936233:uw.274936233
YYYY-MM-DDTHH:MM:SS.Z cpu72:274936160)UserWorld 'nvidia-smi' with cmdline 'unknown'
YYYY-MM-DDTHH:MM:SS.Z cpu72:274936160)uw.274936233 (2403560241) extraMin/extraFromParent: 3059/3059, ssh (672) childEmin/eMinLimit: 203352/204800
YYYY-MM-DDTHH:MM:SS.Z cpu72:274936160)WARNING: LinuxThread: 424: nvidia-smi: Error cloning thread: -28 (bad0081)
..
..
YYYY-MM-DDTHH:MM:SS.Z cpu16:2105810)Admission failure in path: host/vim/vmvisor/NVIDIAHost:nv-hostengine.274936450:uw.274936450
YYYY-MM-DDTHH:MM:SS.Z cpu16:2105810)UserWorld 'nv-hostengine' with cmdline 'unknown'
YYYY-MM-DDTHH:MM:SS.Z cpu16:2105810)uw.274936450 (2403562149) extraMin/extraFromParent: 13643/13643, NVIDIAHost (39528) childEmin/eMinLimit: 20349/32768
YYYY-MM-DDTHH:MM:SS.Z cpu16:2105810)WARNING: LinuxThread: 424: nv-hostengine: Error cloning thread: -28 (bad0081)
YYYY-MM-DDTHH:MM:SS.Z cpu97:274934361)WARNING: Heap: 3898: Could not allocate 102400 bytes for dynamic heap vsansparse. Request returned Out of memory (ok to retry)
YYYY-MM-DDTHH:MM:SS.Z cpu97:274934361)WARNING: Heap: 4109: Heap_Align(vsansparse, 98776/98776 bytes, 8 align) failed. caller: 0x420018fb3580
YYYY-MM-DDTHH:MM:SS.Z cpu97:274934361)WARNING: Heap: 3898: Could not allocate 102400 bytes for dynamic heap vsansparse. Request returned Out of memory (ok to retry)
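The following Python sketch tallies the signatures shown above in a copy of vmkernel.log. It counts admission failures per resource pool and user world, failed thread clones, and dynamic heap allocation failures, and records the highest childEmin/eMinLimit ratio seen for each parent pool. The regular expressions are assumptions built only from these sample lines, and other ESXi builds may format the messages differently. The script name scan_vmkernel.py is hypothetical. In the sample, a ratio close to 1.0 (ssh at 203352/204800) appears next to the admission failures, but the sketch only reports the numbers and does not interpret them.

```python
import re
import sys
from collections import Counter

# Patterns are assumptions derived only from the sample vmkernel.log lines above.
ADMISSION_RE = re.compile(
    r"Admission failure in path: (?P<pool>[^:]+):(?P<world>[^.:]+)\.\d+:uw\.\d+"
)
EMIN_RE = re.compile(
    r"(?P<parent>\S+) \(\d+\) childEmin/eMinLimit: (?P<emin>\d+)/(?P<limit>\d+)"
)
CLONE_RE = re.compile(
    r"LinuxThread: \d+: (?P<world>[^:\s]+): Error cloning thread: (?P<err>-?\d+)"
)
HEAP_RE = re.compile(r"Could not allocate \d+ bytes for dynamic heap (?P<heap>[^\s.]+)")


def summarize(lines):
    """Tally the symptom signatures found in an iterable of vmkernel.log lines."""
    admissions = Counter()     # (resource pool path, user world name) -> count
    clone_errors = Counter()   # user world name -> failed thread clones
    heap_failures = Counter()  # heap name -> failed allocations
    peak_emin = {}             # parent pool -> highest childEmin/eMinLimit ratio

    for line in lines:
        if m := ADMISSION_RE.search(line):
            admissions[(m["pool"], m["world"])] += 1
        if m := EMIN_RE.search(line):
            ratio = int(m["emin"]) / int(m["limit"])
            peak_emin[m["parent"]] = max(ratio, peak_emin.get(m["parent"], 0.0))
        if m := CLONE_RE.search(line):
            clone_errors[m["world"]] += 1
        if m := HEAP_RE.search(line):
            heap_failures[m["heap"]] += 1

    return {
        "admissions": admissions,
        "clone_errors": clone_errors,
        "heap_failures": heap_failures,
        "peak_emin": peak_emin,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 scan_vmkernel.py /path/to/copied/vmkernel.log
    with open(sys.argv[1], errors="replace") as fh:
        report = summarize(fh)
    for key, value in report.items():
        print(key, dict(value))
```

Because the sketch reads line by line, it can handle a large log without loading the whole file into memory. It requires Python 3.8 or later for the assignment expressions.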