Symptoms:
This document covers only troubleshooting of ESXi hostd unresponsiveness and the data that needs to be gathered for further analysis.
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x
hostd:
Look for the "hostd detected to be non-responsive" alert message in the vmkernel* logs.
Check the hostd-probe* logs and locate timeout messages, or check whether the hostd log is not getting updated.

yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Logging uses fast path: true
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] The bora/lib logs WILL be handled by VmaCore
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Initialized channel manager
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Current working directory: /var/log/vmware
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=FairScheduler] Priority level 4 is now active.
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=FairScheduler] Priority level 8 is now active.
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=FairScheduler] Priority level 16 is now active.
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Syscommand enabled: true
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] ReaperManager Initialized
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Current process ID: 247108
yyyy-mm-ddThh:mm:ss.369Z warning hostd-probe[9179840] [Originator@6876 sub=Default] Timeout: N7Vmacore16TimeoutExceptionE(Operation timed out)
--> [context]zKq7AUoCAgAAAItAagAMaG9zdGQtcHJvYmUAAL/FLWxpYnZtYWNvcmUuc28AADJaEgAP0A0AYPkVAWpFDWxpYnZtb21pLnNvAAFsSw0BwIUPAjmsxmxpYnZpbS10eXBlcy5zbwADXkEAaG9zdGQtcHJvYmUAA68yAARniwFsaWJjLnNvLjYAA2k2AA==[/context]
hostd detected to be non-responsive
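The log checks described above could be scripted roughly as follows. This is a minimal sketch, not a VMware-provided tool: the function name scan_hostd_logs is hypothetical, and the default log directory /var/log is an assumption about where the vmkernel* and hostd-probe* logs live on the host. It only reads plain-text logs; compressed rotated logs are not searched.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): print the log lines that point to
# an unresponsive hostd.
# $1 = directory holding the logs; /var/log is an assumed default.
scan_hostd_logs() {
    log_dir="${1:-/var/log}"

    # Alert message expected in the vmkernel* logs.
    grep -h "hostd detected to be non-responsive" "$log_dir"/vmkernel* 2>/dev/null || true

    # Timeout warnings written by hostd-probe.
    grep -h "TimeoutException" "$log_dir"/hostd-probe* 2>/dev/null || true

    return 0
}
```

Calling `scan_hostd_logs` with no argument searches the assumed default directory; an empty result means neither message was found in the uncompressed logs.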
vim-cmd /vmsvc/getallvms status may not give any output.
hostd service or services.sh
hostd using service or hostd command. For more information, see:
# localcli vm process list
# localcli vm process kill -t soft -w <worldID>

'soft', as above, is the most graceful shutdown. If that doesn't work, use 'hard' instead to perform an immediate shutdown. The option 'force' should be used as a last resort.
hostd service to respond properly.

Collect a hostd dump from memory by running this on the host:
vmkbacktrace -n hostd -c -w

ls -alrth /var/core/hostd*
The output looks like:
rwx------ 1 root root 32.8M Aug 15 05:10 /var/core/hostd-worker-zdump.001

Connect to the host with WinSCP, FileZilla, etc., and download the file.

To collect a hostd live core (hostd-worker-zdump.*), run this command:
vmkbacktrace -n hostd -c -w

vmkernel-zdump without affecting the running VMs:
The core dump will then be gathered. This process can take some time, as it does during a PSOD. When the process is completed, you will be returned to the command prompt.
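The soft, hard, force escalation described above can be sketched as a small shell function. This is an illustration under stated assumptions rather than a supported procedure: the function name kill_vm_escalating and the wait interval are hypothetical, and the check for whether the VM is still running assumes that `localcli vm process list` prints a "World ID: <worldID>" line for each running VM.

```shell
#!/bin/sh
# Hypothetical helper: try 'soft', then 'hard', then 'force' for one VM.
# $1 = world ID of the VM, $2 = seconds to wait after each attempt
#      (30 is an arbitrary, assumed default).
kill_vm_escalating() {
    world_id="$1"
    wait_secs="${2:-30}"

    for kill_type in soft hard force; do
        localcli vm process kill -t "$kill_type" -w "$world_id" || true
        sleep "$wait_secs"

        # Assumption: running VMs appear with a "World ID: <id>" line.
        if ! localcli vm process list | grep -q "World ID: $world_id\$"; then
            echo "VM world $world_id stopped after '$kill_type' kill"
            return 0
        fi
    done

    echo "VM world $world_id still running after 'force' kill" >&2
    return 1
}
```

For example, `kill_vm_escalating <worldID> 60` would wait 60 seconds between attempts. Running the commands by hand remains the safer choice when it is unclear why a VM is not stopping.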
Once the core dump has been collected and the process is finished, gather a vm-support bundle to collect the logging, system state and livecore for root cause analysis.
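Before gathering the bundle, it can help to confirm that a hostd live core is actually present. The following is a minimal sketch, assuming the core files are written to /var/core with the hostd-worker-zdump.* naming used in this document; the function name newest_hostd_core is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified hostd live core.
# $1 = directory to search; /var/core is taken from the ls example in this document.
newest_hostd_core() {
    core_dir="${1:-/var/core}"

    # ls -t sorts newest first; head keeps only the first entry.
    newest=$(ls -t "$core_dir"/hostd-worker-zdump.* 2>/dev/null | head -n 1)

    if [ -z "$newest" ]; then
        echo "no hostd-worker-zdump.* file found in $core_dir" >&2
        return 1
    fi
    echo "$newest"
}
```

A non-zero return value indicates that no matching file was found, in which case the live core collection may need to be repeated before the bundle is gathered.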