Symptoms:
This document covers only how to troubleshoot ESXi hostd unresponsiveness and which data to gather for further analysis.
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x
Signs that hostd is unresponsive:
Look for the "hostd detected to be non-responsive" alert message in the vmkernel* logs.
Check the hostd-probe* logs and locate timeout messages, or confirm that the hostd log is not getting updated.
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Logging uses fast path: true
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] The bora/lib logs WILL be handled by VmaCore
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Initialized channel manager
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Current working directory: /var/log/vmware
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=FairScheduler] Priority level 4 is now active.
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=FairScheduler] Priority level 8 is now active.
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=FairScheduler] Priority level 16 is now active.
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Syscommand enabled: true
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] ReaperManager Initialized
yyyy-mm-ddThh:mm:ss.369Z info hostd-probe[9179840] [Originator@6876 sub=Default] Current process ID: 247108
yyyy-mm-ddThh:mm:ss.369Z warning hostd-probe[9179840] [Originator@6876 sub=Default] Timeout: N7Vmacore16TimeoutExceptionE(Operation timed out)
--> [context]zKq7AUoCAgAAAItAagAMaG9zdGQtcHJvYmUAAL/FLWxpYnZtYWNvcmUuc28AADJaEgAP0A0AYPkVAWpFDWxpYnZtb21pLnNvAAFsSw0BwIUPAjmsxmxpYnZpbS10eXBlcy5zbwADXkEAaG9zdGQtcHJvYmUAA68yAARniwFsaWJjLnNvLjYAA2k2AA==[/context]
hostd detected to be non-responsive
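The log checks described above can be scripted. The following is a minimal sketch, not a supported tool: the function names count_matches and log_age_seconds are hypothetical, the log paths in the usage comments are assumed default locations, and it assumes the shell's stat supports the -c %Y format.

```shell
#!/bin/sh
# Sketch only: helper names and example paths are assumptions,
# not part of the original article.

# Print the number of lines in a log file that contain a fixed string.
# $1 = fixed string to search for, $2 = log file
count_matches() {
    # Report 0 for a missing or unreadable file instead of failing.
    [ -r "$2" ] || { echo 0; return 0; }
    # grep -c prints 0 and exits non-zero when nothing matches.
    grep -F -c -- "$1" "$2" || true
}

# Print the number of seconds since a file was last modified.
# A large value for the hostd log suggests it is not being updated.
# $1 = log file
log_age_seconds() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    echo $((now - mtime))
}

# Example usage (paths are assumed defaults, adjust as needed):
#   count_matches 'hostd detected to be non-responsive' /var/log/vmkernel.log
#   count_matches 'TimeoutException' /var/log/hostd-probe.log
#   log_age_seconds /var/log/hostd.log
```

A non-zero count for either signature, or a hostd log age of many minutes, matches the symptoms listed above. What counts as "stale" is a judgment call and is not defined by this article.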
Running vim-cmd /vmsvc/getallvms status may not give any output.
Restart the hostd service or services.sh.
Restart hostd using the service or hostd command. For more information, see:
# localcli vm process list
# localcli vm process kill -t soft -w <worldID>
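The kill command above takes a type, and the types are meant to be tried in order of increasing severity (soft, then hard, then force). A minimal sketch of that ordering follows. The helper name next_kill_type is hypothetical, and the sketch only prints the next type to try; it does not stop any virtual machine.

```shell
#!/bin/sh
# Sketch only: next_kill_type is a hypothetical helper, not an ESXi command.

# Print the kill type to try after the given one.
# Returns non-zero when there is nothing stronger left to try.
# $1 = kill type that was just attempted (soft or hard)
next_kill_type() {
    case "$1" in
        soft) echo hard ;;
        hard) echo force ;;
        *)    return 1 ;;   # force (or an unknown value): no further escalation
    esac
}

# Example usage: print the command to run if a soft kill did not work.
#   t=$(next_kill_type soft) && echo "localcli vm process kill -t $t -w <worldID>"
```

Keeping the escalation as a printed suggestion rather than an automatic loop leaves the decision to use hard or force with the operator.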
'soft', as above, is the most graceful shutdown. If that doesn't work, use 'hard' instead to perform an immediate shutdown. The option 'force' should be used as a last resort.
Wait for the hostd service to respond properly.
Collect a hostd dump from memory by running this on the host: vmkbacktrace -n hostd -c -w
Confirm that the dump file was created: ls -alrth /var/core/hostd*
The output looks like: -rwx------ 1 root root 32.8M Aug 15 05:10 /var/core/hostd-worker-zdump.001
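When several hostd dumps already exist in the directory, it helps to pick out the most recent one before downloading it. A minimal sketch, assuming the dumps are regular files whose names start with hostd; the function name newest_hostd_core is hypothetical.

```shell
#!/bin/sh
# Sketch only: newest_hostd_core is a hypothetical helper.

# Print the most recently modified file matching hostd* in a directory.
# Prints nothing if no such file exists.
# $1 = directory to search (defaults to /var/core, as used in the article)
newest_hostd_core() {
    dir=${1:-/var/core}
    # ls -t sorts newest first; errors for "no match" are discarded.
    ls -t "$dir"/hostd* 2>/dev/null | head -n 1
}

# Example usage:
#   newest_hostd_core /var/core
```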
Connect to the host with WinSCP, Filezilla, etc., and download the file.
To collect a hostd live core (hostd-worker-zdump.*), run this command: vmkbacktrace -n hostd -c -w
To collect a vmkernel-zdump without affecting the running VMs, run: localcli --plugin-dir /usr/lib/vmware/esxcli/int/ debug livedump perform
esxcfg-dumppart -C -D active
The core dump will then be gathered. This process can take some time, as it does during a PSOD. When the process is completed, you will be returned to the command prompt.
Once the core dump has been collected and the process is finished, gather a vm-support bundle to collect the logging, system state and livecore for root cause analysis.
Open a Broadcom Support case and attach the captured data.
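Before opening the case, it can be useful to confirm that the dumps named in this article are actually present. The following is a minimal sketch: the helper names has_match and check_artifacts are hypothetical, and it assumes that both the hostd-worker-zdump.* and vmkernel-zdump files are written to the same directory (/var/core by default), which may not hold on every host.

```shell
#!/bin/sh
# Sketch only: helper names and the shared-directory assumption are
# illustrative, not taken from the article.

# Succeed if at least one file in a directory matches a filename glob.
# $1 = directory, $2 = filename glob (left unquoted on purpose so it expands)
has_match() {
    for f in "$1"/$2; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Report which expected dump files are present.
# Returns non-zero if any of them is missing.
# $1 = directory to check (defaults to /var/core)
check_artifacts() {
    core_dir=${1:-/var/core}
    missing=0
    for pattern in 'hostd-worker-zdump.*' 'vmkernel-zdump*'; do
        if has_match "$core_dir" "$pattern"; then
            echo "found: $pattern"
        else
            echo "missing: $pattern"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage:
#   check_artifacts /var/core || echo "collect the missing dumps first"
```

The vm-support bundle is not checked here because its output location depends on how the bundle was generated.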