Unexpected ESXi Reboot or Shutdown

Article ID: 317245

Products

VMware vSphere ESXi
VMware vSphere ESXi 7.0
VMware vSphere ESXi 8.0

Issue/Introduction

  • ESXi host abruptly rebooted
  • ESXi host abruptly powered off
  • Unexpected reboot of host
  • ESXi host crashed and rebooted

Environment

VMware vSphere ESXi 9.X
VMware vSphere ESXi 8.X
VMware vSphere ESXi 7.X

Cause

ESXi host reboot or shutdown reasons can be grouped into three broad categories:

  1. User initiated - CLI, UI, DCUI, IPMI, etc.
  2. Kernel crash - PSODs (MCEs, NMIs, software faults, etc.)
  3. Unknown reason to ESXi - Typically related to hardware (power outage, faulty components, etc.)

Resolution

To determine which category a reboot or shutdown falls into, start with vmksummary.log on the ESXi host. ESXi writes a heartbeat entry to this log every hour, at the top of the hour. The log also records information that helps identify the reason for an unexpected reboot or shutdown. Use the following steps to determine which category the unexpected reboot falls into:

  1. User initiated - CLI, UI, DCUI, IPMI, etc.

    If the reboot or shutdown of ESXi was user initiated, the following messages will appear in /var/log/vmksummary.log:

    In the example below, the ESXi host has a user initiated shutdown:

    [YYYY-MM-DDTHH:MM:SS].360Z bootstop[137121]: Host is halting
    [YYYY-MM-DDTHH:MM:SS].360Z Host has booted

    In the example below, the ESXi host has a user initiated reboot:

    [YYYY-MM-DDTHH:MM:SS] bootstop[137121]: Host is rebooting
    [YYYY-MM-DDTHH:MM:SS].360Z Host has booted

    ESXi may also be shut down or rebooted by a user through out-of-band management tools such as iLO or iDRAC. Depending on the hardware setup and the type of shutdown or reboot, ACPI events may be sent to ESXi. In these cases, vmksummary.log will not show a halting or rebooting message. Instead, the following message is logged in /var/run/log/vmkernel.log:

    [YYYY-MM-DDTHH:MM:SS].360Z cpu0:2089432)VMKAcpi: 250: Power button pressed; requesting graceful shutdown and poweroff

    To identify the user that initiated the reboot, see the Additional Information section below.

  2. Kernel crash - PSODs (MCEs, NMIs, software faults, etc.)

    If ESXi has a kernel crash, the following messages will appear in /var/log/vmksummary.log:

    [YYYY-MM-DDTHH:MM:SS].360Z heartbeat[14118500]: up 22d1h40m22s, 10 VMs; [[2109036 vmx 16752988kB] [2115283 vmx 16775168kB] [2112238 vmx 44607488kB]] []
    [YYYY-MM-DDTHH:MM:SS].360Z heartbeat[14118500]: up 22d1h40m22s, 10 VMs; [[2109036 vmx 16752988kB] [2115283 vmx 16775168kB] [2112238 vmx 44607488kB]] []
    [YYYY-MM-DDTHH:MM:SS].360Z bootstop[137121]: file core dump found
    [YYYY-MM-DDTHH:MM:SS].360Z Host has booted

    If the VMware ESXi host has experienced a kernel error, see Interpreting an ESXi host purple diagnostic screen.

  3. Unknown reason to ESXi - Typically related to hardware (power outage, faulty components, etc.)

    If ESXi cannot determine the reason for the shutdown or reboot, and it was neither user initiated nor the result of a PSOD, then only the Host has booted message will appear in vmksummary.log, as shown below:

    [YYYY-MM-DDTHH:MM:SS].244Z heartbeat[14108070]: up 22d0h40m22s, 10 VMs; [[2109036 vmx 16752844kB] [2115283 vmx 16775168kB] [2112238 vmx 44607488kB]] []
    [YYYY-MM-DDTHH:MM:SS].360Z heartbeat[14118500]: up 22d1h40m22s, 10 VMs; [[2109036 vmx 16752988kB] [2115283 vmx 16775168kB] [2112238 vmx 44607488kB]] []
    [YYYY-MM-DDTHH:MM:SS].208Z bootstop[2106381]: Host has booted
    [YYYY-MM-DDTHH:MM:SS].462Z heartbeat[2115051]: up 0d0h39m40s, 0 VM; [[2099566 hostd 75272kB] [2113330 vsanmgmtd 79344kB] [2107596 vmx 131072kB]] []

    If the ESXi host experiences an outage that is not the result of a user initiated reboot or shutdown, or of a kernel error, then the physical hardware may have abruptly restarted on its own. Hardware may reboot abruptly due to power outages, faulty components, or overheating.

    To investigate further, engage the hardware vendor.
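The three checks above can be sketched as a small script. This is a minimal illustration, not a supported tool: the function name classify_last_boot and the category labels are hypothetical, and the sketch assumes that the relevant message appears on the line immediately before the last Host has booted entry, as in the samples in this article. It does not compare timestamps, so a Power button pressed entry in vmkernel.log could belong to an earlier event.

```python
# Message fragments quoted from the vmksummary.log and vmkernel.log samples.
BOOTED = "Host has booted"
MARKERS = [
    ("Host is halting", "user-initiated shutdown"),
    ("Host is rebooting", "user-initiated reboot"),
    ("core dump found", "kernel crash (PSOD)"),
]
POWER_BUTTON = "Power button pressed"


def classify_last_boot(summary_lines, vmkernel_lines=()):
    """Classify the most recent boot recorded in vmksummary.log lines.

    summary_lines and vmkernel_lines are iterables of log lines
    (for example, open file objects). The labels are hypothetical.
    """
    lines = [ln.strip() for ln in summary_lines if ln.strip()]

    # Locate the last "Host has booted" entry.
    boot_idx = max(
        (i for i, ln in enumerate(lines) if BOOTED in ln), default=None
    )
    if boot_idx is None:
        return "no boot recorded"

    # Assumption: the entry directly before the boot message explains it.
    if boot_idx > 0:
        previous = lines[boot_idx - 1]
        for marker, label in MARKERS:
            if marker in previous:
                return label

    # No halting, rebooting, or core dump entry: look for an ACPI
    # power-button event (timestamps are not correlated here).
    if any(POWER_BUTTON in ln for ln in vmkernel_lines):
        return "user-initiated (ACPI power button, out-of-band)"

    return "unknown to ESXi (likely hardware)"
```

On a host, the function could be called with open("/var/log/vmksummary.log") and open("/var/run/log/vmkernel.log") as its two arguments. A result of "unknown to ESXi (likely hardware)" only means that none of the messages above were found, so the hardware vendor should still be engaged to confirm the cause.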

Additional Information

User initiated - CLI, UI, DCUI, IPMI, etc.

To confirm that a reboot was user initiated using the vSphere Client, follow the steps below:
    • Log in to the vSphere Client
    • Select the host that rebooted
    • Select Monitor
    • Select Tasks
 
This information can also be found in the ESXi logs in /var/log/hostd.log:
 
[YYYY-MM-DDTHH:MM:SS] In(###) Hostd[######]: [Originator@#### sub=Vimsvc.TaskManager opID=####-####:####-#### sid=######## user=vpxuser:####-####] Task Created : haTask-ha-host-vim.HostSystem.reboot-##########
[YYYY-MM-DDTHH:MM:SS] In(###) Hostd[######]: -->    eventTypeId = "esx.audit.hostd.host.reboot.reason",
[YYYY-MM-DDTHH:MM:SS] In(###) Hostd[######]: -->          value = "The host is being rebooted through hostd."
[YYYY-MM-DDTHH:MM:SS] In(###) Hostd[######]: [Originator@#### sub=Vimsvc.ha-eventmgr] Event 7142 : The host is being rebooted through hostd. Reason for reboot: The host is being rebooted through hostd., User: vpxuser:<vCenter admin user>.
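To pull the initiating user out of a large hostd.log, the Task Created entry shown above can be matched with a regular expression. The sketch below is an illustration only: the function name find_reboot_tasks is hypothetical, and the pattern is based solely on the sample line in this article, so the exact field layout on a given ESXi release is an assumption.

```python
import re

# Based on the "Task Created" sample above: capture the user= field and the
# reboot task name. The layout of real entries may differ between releases.
REBOOT_TASK = re.compile(
    r"user=(?P<user>[^\]]+)\]\s*Task Created\s*:\s*"
    r"(?P<task>\S*vim\.HostSystem\.reboot\S*)"
)


def find_reboot_tasks(lines):
    """Return (timestamp, user, task) tuples for reboot tasks in hostd.log lines."""
    hits = []
    for line in lines:
        match = REBOOT_TASK.search(line)
        if match:
            # The timestamp is the first whitespace-separated token of the entry.
            timestamp = line.split(None, 1)[0]
            hits.append((timestamp, match.group("user"), match.group("task")))
    return hits
```

On a host, the function could be called with open("/var/log/hostd.log"). Each tuple gives the time the task was created and the user= value from the entry, which in the sample above is the vCenter service account followed by the vCenter user.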


Kernel crash - PSODs (MCEs, NMIs, software faults, etc.)

 
Automated Server Recovery (ASR) is a server BIOS setting that, when enabled, reboots ESXi when an issue is detected (unresponsive OS, malfunctioning hardware, etc.).
If this setting is enabled, a PSOD will not be generated on the ESXi host, because generating it requires time to write to disk and ASR may reset the server first.
It is recommended to leave the host in an unresponsive state for debugging purposes.
 
To leave the ESXi host in an unresponsive state, ASR should be disabled. Read more at HP Server settings for ASR and DELL Server settings for ASR.
 
Steps to disable ASR in DELL iDRAC (check with Dell for the latest steps for this procedure):
  • Log on to the iDRAC with the Admin account
  • Click on the "iDRAC Settings" tab
  • Click on the "Services" tab
  • Go to Automated System Recovery Agent
  • Change the settings to "Disable"

Unknown reason to ESXi - Typically related to hardware (power outage, faulty components, etc.)

If the reboot reason is unknown, it can be beneficial to check the information sent to ESXi from the IPMI controller. To check this information, run the following command in the ESXi shell:

  • localcli hardware ipmi sel list
    Note that this information will also appear in the out-of-band management tool (iLO, iDRAC, etc.) for the ESXi host. Engage the hardware vendor for further clarification on IPMI messages.