Determining why an ESXi host was powered off or restarted


Article ID: 317245


Products

VMware vSphere ESXi

VMware vSphere ESXi 6.0

VMware vSphere ESXi 7.0

VMware vSphere ESXi 8.0

Issue/Introduction

This article provides steps to determine why an ESXi host was powered off or restarted.

Symptoms:

  • An ESXi host is disabled (grayed out) and displays as Not Responding.

  • An ESXi host is disabled (grayed out) and displays as Disconnected.

  • Clients connected to services running in one or more virtual machines are no longer accessible.

  • Applications dependent on services running in one or more virtual machines are reporting errors.

  • One or more virtual machines are no longer responding to network connections.

  • The ESXi host rebooted abruptly without any user intervention.

  • The ESXi host powered off abruptly without any user intervention.
    • Log entries similar to the following appear in /var/log/hostd.log:

      YYYY-MM-DDTHH:MM:SS.810Z In(166) Hostd[2099849]: -->    eventTypeId = "esx.audit.host.poweroff.reason.unavailable",
      YYYY-MM-DDTHH:MM:SS.810Z In(166) Hostd[2099849]: -->    objectId = "ha-host",
      YYYY-MM-DDTHH:MM:SS.810Z In(166) Hostd[2099849]: -->    objectType = "vim.HostSystem",
      YYYY-MM-DDTHH:MM:SS.810Z In(166) Hostd[2099849]: --> }
      YYYY-MM-DDTHH:MM:SS.810Z In(166) Hostd[2099894]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 38 : Host had been powered off. The poweroff was not the result of a kernel error, deliberate reboot, or shut down. This could indicate a hardware issue. Hardware may reboot abruptly due to power outages, faulty components, and heating issues. To investigate further, engage the hardware vendor.
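To check a saved copy of hostd.log for this symptom, a small script can search for the event type shown in the excerpt above. The following is a minimal sketch in Python, not an official tool; the function name is hypothetical and the only string it relies on is the eventTypeId quoted above.

```python
# Event type quoted in the hostd.log excerpt above.
UNEXPLAINED_POWEROFF = "esx.audit.host.poweroff.reason.unavailable"


def find_unexplained_poweroffs(lines):
    """Return (line_number, text) for each line that mentions the event type."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if UNEXPLAINED_POWEROFF in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

The function accepts any iterable of lines, so an open file object for a copy of hostd.log can be passed to it directly.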

Environment

VMware vSphere ESXi 6.x

VMware vSphere ESXi 7.x

VMware vSphere ESXi 8.x

Cause

If ESXi itself encounters a fatal error, its default behavior is to halt the host with a purple diagnostic screen (PSOD): the physical CPU hands control to ESXi, which triggers the PSOD.

If the ESXi host did not trigger a PSOD, the logs and server settings need further investigation to isolate what triggered the incident.

An unexpected reboot or shutdown of the server can be caused by hardware failures, BIOS settings such as Automated Server Recovery (ASR), user interaction, physical infrastructure problems, third-party API calls, and similar events.

This article provides steps to isolate what triggered the abrupt reboot or shutdown of the ESXi host.

Resolution

Note: This article assumes that you have completed the steps described in ESXi hosts do not respond and is grayed out in vCenter
 

  1. To determine the reason for abrupt shut down or reboot of a VMware ESXi host:

    Note: By default, VMware ESXi logs do not persist upon a reboot. If a VMware ESXi host experiences an abrupt reboot due to reasons other than a VMkernel error, the logs do not persist and you do not have access to the logs prior to the reboot to determine the cause. The steps in this section assume that the VMware ESXi host is configured to redirect the logs to a location where the logs persist. For more information on how to configure a VMware ESXi host to redirect the logs to an alternate location, see Configuring syslog on ESXi 
     
    1. If the ESXi host is currently turned off, turn the host back on.
       
    2. Ensure that there are no hardware lights that may indicate a hardware issue. For more information, engage the hardware vendor.
       
    3. Determine where the logs are being redirected to:
       
      1. Open vSphere Client.
      2. Connect to the ESXi host or vCenter Server managing the ESXi host.
      3. Provide the credentials of an administrative user.
      4. Select the ESXi host in the Inventory.
      5. Click the Configuration tab.
      6. Click Advanced Settings.
      7. In the Advanced Settings dialog, verify the location where the log files are being redirected:

        Note: If either of these settings is not properly configured, logs do not persist across a reboot, which may limit the amount of information that can be gathered for troubleshooting.
         
        • Syslog > Local > Syslog.Local.DatastorePath contains the location of the logs if they are redirected to a VMFS volume.
        • Syslog > Remote > Syslog.Remote.Hostname contains the IP address or hostname of the syslog server that houses the logs for this host.
           
    4. Navigate to the location of the log files, and based on the modified date of the files, open the log file using your preferred editor.
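Choosing a log file by modified date can also be scripted. The sketch below, in Python, lists the most recently modified files in a log directory. The function name and its parameters are illustrative assumptions, and the directory passed in is whatever location was identified in step 3.

```python
from pathlib import Path


def newest_logs(log_dir, pattern="*", limit=5):
    """Return up to `limit` files in log_dir matching pattern, newest first."""
    files = [p for p in Path(log_dir).glob(pattern) if p.is_file()]
    # Sort by modification time, most recent first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

For example, newest_logs(log_dir, pattern="hostd*") would restrict the listing to files whose names begin with hostd.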
       
    5. Determine if the ESXi host was deliberately restarted. If an ESXi host was restarted deliberately, the /var/run/log/hostd.log file will contain events similar to these:
       
      • Hostd: [hh:mm:ss.284 27D13B90 info 'TaskManager'] Task Created : haTask-ha-host-vim.HostSystem.reboot-50

        or
         
      • DCUI: reboot

      Note: In ESXi 5.5 and above, these entries will be in /var/run/log/shell.log.

      If your host was deliberately restarted, review the vCenter Server logs to identify any recent tasks that may have caused the host to restart.
       
    6. Determine if the ESXi host was deliberately shut down. If an ESXi host was shut down deliberately, the hostd.log file contains an event similar to:
       
      • Hostd: [<YYYY-MM-DD> <time>.550 2FEDEB90 info 'TaskManager'] Task Created : haTask-ha-host-vim.HostSystem.shutdown-78

        or
         
      • DCUI: poweroff

      If your host was deliberately shut down, review the vCenter Server logs to identify any recent tasks that may have caused the host to power off.


      ESXi 5.5 and above may also include PowerButton Helper events in the vmkernel.log file, similar to:

      [YYYY-MM-DDTHH:MM:SS] cpu6:8222)VMKAcpi: 217: In PowerButton Helper


      ESXi 7.0 and above may indicate power button related events in vmkernel.log and hostd.log files, similar to:

      vmkernel.log
      [YYYY-MM-DDTHH:MM:SS] cpu0:2101545)VMKAcpi: 256: Power button pressed; requesting graceful shutdown and poweroff

      hostd.log

      [YYYY-MM-DDTHH:MM:SS] info hostd[2100455] [Originator@6876 sub=Hostsvc.HaHost] ACPI power event from the vmkernel 
      [YYYY-MM-DDTHH:MM:SS] info hostd[2100455] [Originator@6876 sub=Hostsvc.HaHost] Shutdown, force = true  
      [YYYY-MM-DDTHH:MM:SS] warning hostd[2100455] [Originator@6876 sub=Hostsvc.HaHost] Failed to find activation record, event user unknown.
      [YYYY-MM-DDTHH:MM:SS] info hostd[2100455] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 679147 : Shut down of <ESXhostname> in ha-datacenter: Unknown  
      [YYYY-MM-DDTHH:MM:SS] info hostd[2100455] [Originator@6876 sub=SysCommandPosix] ForkExec(/usr/lib/vmware/vob/bin/addvob) 214058855
      [YYYY-MM-DDTHH:MM:SS] info hostd[2516901] [Originator@6876 sub=Vimsvc.TaskManager opID=2c8f4244 user=vpxuser] Task Created : haTask--vim.event.EventHistoryCollector.readNext-13046911
      [YYYY-MM-DDTHH:MM:SS] info hostd[161861312] [Originator@6876 sub=Vimsvc.TaskManager opID=2c8f4244 user=vpxuser] Task Completed : haTask--vim.event.EventHistoryCollector.readNext-13046911 Status success
      [YYYY-MM-DDTHH:MM:SS] info hostd[2516902] [Originator@6876 sub=Hostsvc.VmkVprobSource] VmkVprobSource::Post event: (vim.event.EventEx) {
      --> key = 65,  
      --> chainId = -1897761432,  
      --> createdTime = "[YYYY-MM-DDTHH:MM:SS]",
      --> userName = "",  
      --> host = (vim.event.HostEventArgument) {  
      --> name = "<ESX host name>",  
      --> host = 'vim.HostSystem:ha-host'  
      --> },  
      --> eventTypeId = "esx.audit.hostd.host.poweroff.reason",  
      --> arguments = (vmodl.KeyAnyValue) [  
      --> (vmodl.KeyAnyValue) {  
      --> key = "1",  
      --> value = " Last reboot reason is unknown. Please check ESXi and the BMC logs for further details. "  
      --> },  
      --> (vmodl.KeyAnyValue) {  
      --> key = "2",  
      --> value = "Unknown"  
      --> }  
      --> ],  
      --> objectId = "ha-host",  
      --> objectType = "vim.HostSystem",  
      --> }  
      [YYYY-MM-DDTHH:MM:SS] info hostd[2516902] [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 679148 : The host is being powered off through hostd. Reason for powering off: Last reboot reason is unknown. Please check ESXi and the BMC logs for further details. , User: Unknown. Please consult vSphere Documentation Center or follow the Ask VMware link for more information.
      [YYYY-MM-DDTHH:MM:SS] info hostd[2100456] [Originator@6876 sub=Vimsvc.TaskManager opID=2c8f4245 user=vpxuser] Task Created : haTask--vim.event.EventHistoryCollector.readNext-13046912
      [YYYY-MM-DDTHH:MM:SS] info hostd[2516901] [Originator@6876 sub=Vimsvc.TaskManager opID=2c8f4245 user=vpxuser] Task Completed : haTask--vim.event.EventHistoryCollector.readNext-13046912 Status success
      [YYYY-MM-DDTHH:MM:SS] info hostd[2100455] [Originator@6876 sub=SysCommandPosix] ForkExec(/sbin/poweroff) 214058856
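The markers described in steps 5 and 6 can be searched for in one pass. The following Python sketch labels a log line as a deliberate reboot, a deliberate shutdown, or a power button event. The regular expressions are derived only from the sample entries above, the labels and the function name are illustrative, and real log lines may differ between ESXi versions.

```python
import re

# Patterns derived from the sample log entries in steps 5 and 6.
MARKERS = [
    ("reboot task", re.compile(r"Task Created : haTask-ha-host-vim\.HostSystem\.reboot-\d+")),
    ("shutdown task", re.compile(r"Task Created : haTask-ha-host-vim\.HostSystem\.shutdown-\d+")),
    ("DCUI reboot", re.compile(r"DCUI: reboot")),
    ("DCUI poweroff", re.compile(r"DCUI: poweroff")),
    ("power button", re.compile(r"VMKAcpi: \d+: (?:In PowerButton Helper|Power button pressed)")),
]


def classify(line):
    """Return the label of the first marker found in the line, or None."""
    for label, pattern in MARKERS:
        if pattern.search(line):
            return label
    return None
```

Lines that return None carry none of these markers. If no line in the relevant time window matches, continue with the core dump and hardware checks in the following steps.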

    7. Verify whether the virtual machine or ESXi host has generated a core dump:
       
      1. Log in to the ESXi Shell. For more information, see Using ESXi Shell in ESXi
         
      2. ESXi hosts do not automatically collect the core dumps. To collect the core dump, manually run the esxcfg-dumppart command. For more information, see Extracting a core dump file from the VMKCore diagnostic partition following a purple diagnostic screen error

        Note: Not configuring a core dump partition could interfere with the analysis of the abrupt reboots. For information on setting up a core dump partition, see Configuring an ESXi/ESX host to capture a VMkernel coredump from a purple diagnostic screen
         
      3. If your VMware ESXi host has experienced a kernel error, see Interpreting an ESX host purple diagnostic screen
         
    8. Check if ESXi is configured to automatically reboot after a purple screen by executing this command:

      esxcfg-advcfg -g /Misc/BlueScreenTimeout

      The default value is 0. If the value is not 0, ESXi reboots automatically after the purple diagnostic screen has been displayed for that number of seconds.
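If the command output is captured during a scripted health check, the value can be interpreted as follows. This Python sketch assumes the command prints a line of the form "Value of BlueScreenTimeout is 0"; that output format is an assumption and should be confirmed on your host. Both function names are hypothetical.

```python
import re


def parse_bluescreen_timeout(output):
    """Extract the integer value from the assumed esxcfg-advcfg -g output."""
    match = re.search(r"Value of BlueScreenTimeout is (\d+)", output)
    if match is None:
        raise ValueError("unexpected output: %r" % output)
    return int(match.group(1))


def reboots_after_purple_screen(timeout):
    """A value of 0 (the default) means the host stays on the purple screen."""
    return timeout != 0
```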

      Note: The server BIOS may also have a feature called Automated Server Recovery (ASR) enabled.

      If ASR is set to Reboot at the BIOS level, the server simply reboots the ESXi host when a crash is detected, and no PSOD is generated on the ESXi host.

      ASR must be disabled in the server BIOS for a PSOD to be triggered.
      For vendor-specific instructions, see HP Server vendor settings for ASR and DELL Server vendor settings for ASR.

      Steps to disable ASR in DELL iDRAC 

      1. Log on to the iDRAC with the Admin account 
      2. Click on "iDRAC Settings" tab 
      3. Click on the "Services" tab
      4. Go to Automated System Recovery Agent 
      5. Change the settings to "Disable"


      Note: The default and Broadcom-recommended setting is to leave the host in an unresponsive state with the purple diagnostic screen displayed on the console to aid in troubleshooting.

      For more details: Configuring an ESX/ESXi host to restart after becoming unresponsive with a purple diagnostic screen

      When the host is rebooted after a crash and if the core dump was successful, the /var/log/vmksummary.log shows that a core dump is found.

      For example:
      <YYYY-MM-DD>T<time>Z bootstop: Host has booted
      <YYYY-MM-DD>T<time>Z bootstop: file core dump found

      Note: This information does not necessarily mean that ESXi restarted automatically, but it does indicate when ESXi crashed.
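The bootstop entries in vmksummary.log can be extracted with a short script. The Python sketch below assumes each entry has the form shown in the example above (a timestamp, then "bootstop:", then a message). The function names are illustrative.

```python
def bootstop_events(lines):
    """Return (timestamp, message) for each bootstop entry."""
    events = []
    for line in lines:
        timestamp, separator, message = line.strip().partition(" bootstop: ")
        if separator:  # Only lines containing " bootstop: " are kept.
            events.append((timestamp, message))
    return events


def core_dump_timestamps(lines):
    """Return timestamps of entries reporting that a core dump was found."""
    return [ts for ts, msg in bootstop_events(lines) if msg == "file core dump found"]
```

An empty result from core_dump_timestamps means no core dump was recorded at boot in the lines supplied, which points away from a kernel error for those boots.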
       
    9. If your VMware ESXi host experiences an outage that is not the result of a kernel error, deliberate reboot, or shut down, then the physical hardware may have abruptly restarted on its own. Hardware may reboot abruptly due to power outages, faulty components, and heating issues. To investigate further, engage the hardware vendor.

      Alternatively, if an administrator has physically turned off or restarted the physical hardware because the console is not responding to user interaction, see Determining why an ESXi/ESX host does not respond to user interaction at the console (1017135).
       
       
    10. If the host is rebooted from the GUI while the Web Client is connected to vCenter Server, hostd.log contains messages similar to the following:

      [YYYY-MM-DDTHH:MM:SS] In(166) Hostd[525413]: [Originator@6876 sub=Vimsvc.TaskManager opID=m046ox1m-35067-auto-r25-h5:70002004-f6-f7-2d21 sid=520de624 user=vpxuser:VSPHERE.LOCAL\Administrator] Task Created : haTask-ha-host-vim.HostSystem.reboot-1668848948
      [YYYY-MM-DDTHH:MM:SS] In(166) Hostd[525377]: -->    eventTypeId = "esx.audit.hostd.host.reboot.reason",
      [YYYY-MM-DDTHH:MM:SS] In(166) Hostd[525377]: -->          value = "The host is being rebooted through hostd."
      [YYYY-MM-DDTHH:MM:SS] In(166) Hostd[525428]: [Originator@6876 sub=Vimsvc.ha-eventmgr] Event 7142 : The host is being rebooted through hostd. Reason for reboot: The host is being rebooted through hostd., User: vpxuser:VSPHERE.LOCAL\Administrator.
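The event line in this excerpt records both the reason and the user. The following Python sketch extracts those fields. The regular expression is modeled on the "Reason for reboot" line above and the "Reason for powering off" line in step 6, and may need adjusting for other ESXi versions. The function name is hypothetical.

```python
import re

EVENT_PATTERN = re.compile(
    r"Event (\d+) : .*?Reason for (reboot|powering off): (.*?)\s*, User: (\S+)"
)


def parse_reason_event(line):
    """Return the event id, action, reason and user from an event line, or None."""
    match = EVENT_PATTERN.search(line)
    if match is None:
        return None
    event_id, action, reason, user = match.groups()
    return {
        "event_id": int(event_id),
        "action": action,
        "reason": reason,
        "user": user.rstrip("."),  # Drop the sentence-ending period.
    }
```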

       
    11. If the host is shut down from the GUI while the Web Client is connected directly to the host, hostd.log contains messages similar to the following:

      [YYYY-MM-DDTHH:MM:SS] info hostd[264466] [Originator@6876 sub=Vimsvc.TaskManager opID=esxui-c891-f8d4 user=root] Task Created : haTask-ha-host-vim.HostSystem.shutdown-1063352434
      -->          obj = 'vim.Task:haTask-ha-host-vim.HostSystem.shutdown-1063352434',
      -->    object = 'vim.Task:haTask-ha-host-vim.HostSystem.shutdown-1063352434',
      -->    eventTypeId = "esx.audit.host.stop.shutdown",
      [YYYY-MM-DDTHH:MM:SS] info hostd[264471] [Originator@6876 sub=Vimsvc.TaskManager opID=esxui-c891-f8d4 user=root] Task Completed : haTask-ha-host-vim.HostSystem.shutdown-1063352434 Status success
      [YYYY-MM-DDTHH:MM:SS] info hostd[264022] [Originator@6876 sub=Solo.VmwareCLI] (vim.EsxCLI.system.shutdown) ha-cli-handler-system-shutdown created
      [YYYY-MM-DDTHH:MM:SS] info hostd[264022] [Originator@6876 sub=Solo.VmwareCLI] CreateDynMoType (Type vim.EsxCLI.system.shutdown) (Wsdl VimEsxCLIsystemshutdown) (Version vim.version.version5).
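In the samples in steps 10 and 11, the Task Created line carries the opID and user of the session that requested the operation. The Python sketch below pulls those fields out so that the initiating account can be identified. The pattern is based only on the two sample lines and the function name is illustrative.

```python
import re

TASK_PATTERN = re.compile(
    r"opID=(?P<op_id>\S+).*?user=(?P<user>[^\]]+)\] "
    r"Task Created : haTask-ha-host-vim\.HostSystem\.(?P<operation>reboot|shutdown)-\d+"
)


def parse_power_task(line):
    """Return a dict with op_id, user and operation, or None if no match."""
    match = TASK_PATTERN.search(line)
    return match.groupdict() if match else None
```

In the samples above, a request made through vCenter Server shows user=vpxuser:VSPHERE.LOCAL\Administrator, while a request made directly on the host shows user=root.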



Additional Information