Troubleshooting a single virtual machine failure on an ESXi host

Article ID: 309782

Products

VMware vSphere ESXi

Issue/Introduction

This article guides readers through troubleshooting a single virtual machine that is failing on an ESXi host. It is best suited to repeated failures whose cause is unknown. If the failure is reproducible, meaning it can be triggered by following a known sequence of steps, readers should follow the instructions at the end of this knowledge base article to gather the support script data and file a support request. If the failure is a one-off, gathering the support script data and filing a support request is still recommended; include as much information as possible about the environment and what was happening at the time of the failure. Virtual machine failures may be caused by factors outside of VMware, and the cause is not always evident from the support script data.

Symptoms include:
  • The guest operating system has terminated unexpectedly
  • The virtual machine is not accessible
  • A blue screen with a Stop error code may be visible on the console
  • An error including the term kernel panic is visible on the console
  • Errors similar to:
    • BAD_POOL_HEADER
    • KMODE_EXCEPTION_NOT_HANDLED
    • PAGE_FAULT_IN_NONPAGED_AREA
    • STOP: 0x00000050 (0xFFFFFFF8,0x00000000,0xF9CF5C88,0x00000000)
    • STOP: 0x00000019 (0x00000000,0xC00E0FF0,0xFFFFEFD4,0xC0000000)
    • Unknown (inaccessible)
    • SCSI: 4506: Cannot find a path to device vmhbax:x:x in a good state
    • WARNING: LVM: 4844: vmhbaH:T:L:P detected as a snapshot device. Disallowing access to the LUN since resignaturing is turned off.
    • Date esx vmkernel: Time cpu3: 10340 SCSI: 5637: status SCSI LUN is in snapshot state, rstatus 0xc0de00 for vmhbax:x:x. residual R 999, CR 8-, ER3
    • Date esx vmkernel: Time cpu3: world ID SCSI 6624: Device vmhbax:x:x. is a deactivated snapsho

 

Resolution

Validate that each troubleshooting step below is true for the environment. Each step will provide instructions or a link to a document, in order to eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Do not skip a step.
  1. Verify that the virtual machine is not in an unresponsive state.

    During an unresponsive state, the operating system seems to be paralyzed, no error messages are displayed, and the screen freezes or the application does not respond to users' actions. Keyboard input or mouse clicking has no effect, regardless of where the cursor is placed, but the operating system is still running. Unlike a failure, sometimes an unresponsive system resolves itself, and the application resumes its normal execution without user involvement.

    A failure is a situation where the operating system has terminated and is no longer running. There may be a diagnostic screen or error message visible in its place.

    Note: There is a difference between a virtual machine failing and the guest operating system failing. If the virtual machine itself fails, it powers off, and vmware-core files may have been created in the virtual machine's directory on the host. The vmware.log file may contain entries similar to:

    Sep 13 19:58:46: vcpu-1| MONITOR PANIC: ASSERT failed
    Sep 13 19:58:46: vcpu-1| Core dump with build build-10104
    Sep 13 19:58:46: vcpu-1| Writing monitor corefile
    "/root/vmware/vm1/vmware-core0.gz"|

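The distinction in the note above can be checked from the ESXi shell. The following is a minimal sketch, not a VMware-provided tool: the function name check_vm_panic is hypothetical, and the search strings are taken only from the sample log entries shown above, so other builds may word these entries differently.

```shell
# Sketch: classify a vmware.log by whether it records a monitor panic.
# check_vm_panic is a hypothetical helper name, not a VMware command.
# Prints exactly one word: monitor-panic, no-monitor-panic, or unreadable.
check_vm_panic() {
    log_file="$1"

    # Bail out early if the log cannot be read (for example, storage is down).
    [ -r "$log_file" ] || { echo "unreadable"; return 0; }

    # The patterns mirror the sample entries in this article.
    if grep -q -E 'MONITOR PANIC|Writing monitor corefile' "$log_file"; then
        echo "monitor-panic"
    else
        echo "no-monitor-panic"
    fi
}
```

For example, running check_vm_panic against the vmware.log in the virtual machine's directory prints monitor-panic when the virtual machine itself failed. A result of no-monitor-panic suggests the failure is more likely inside the guest operating system, although it does not prove it.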
  2. Verify that the guest operating system is fully certified for the ESXi host version.

    If the guest operating system is not listed in the VMware Compatibility Guide for the ESXi host version, the following steps may still help to resolve the issue, but be aware that problems may be encountered when running an uncertified guest operating system.

  3. Verify that access to the storage hosting the virtual machine is available.

    A virtual machine may fail if the LUN on which it is stored becomes unavailable.

    To check this:
    1. Connect to the ESXi host over SSH as root.
    2. Navigate to the working directory of the virtual machine.

      Example:
      cd /vmfs/volumes/46b2f3eb-ced4c7d8-c1d2-111122223333/vm1/

    3. List the directory contents, for example with ls. If the files associated with the virtual machine (VMDK, VMX, NVRAM) are listed, there is working access to the storage hosting the virtual machine.

      If not, refer to Identifying Fibre Channel, iSCSI, and NFS storage issues on ESX/ESXi hosts.
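The storage check in step 3 can be sketched as a small shell function. This is an illustration under stated assumptions: the function name check_vm_files is hypothetical, and it only tests for the three file types this article names (VMX, VMDK, NVRAM) in the directory passed to it.

```shell
# Sketch: confirm the VM directory is reachable and holds the expected files.
# check_vm_files is a hypothetical helper name, not a VMware command.
# Prints: files-present, "missing: <extensions>", or directory-unavailable.
check_vm_files() {
    vm_dir="$1"

    # If the directory itself is gone, the datastore is likely inaccessible.
    if [ ! -d "$vm_dir" ]; then
        echo "directory-unavailable"
        return 0
    fi

    missing=""
    for ext in vmx vmdk nvram; do
        # ls returns non-zero when no file with this extension exists.
        if ! ls "$vm_dir"/*."$ext" >/dev/null 2>&1; then
            missing="$missing $ext"
        fi
    done

    if [ -z "$missing" ]; then
        echo "files-present"
    else
        echo "missing:$missing"
    fi
}
```

For example, calling check_vm_files with the directory from the cd example above prints files-present when all three file types are visible. Any other result is a prompt to continue with the storage troubleshooting article referenced in this step, not a diagnosis in itself.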

  4. Verify that no software changes have been made that may have caused the failure. For more information, see Identifying critical Guest OS failures within virtual machines.

  5. Verify that no hardware changes have been made that may have caused the failure. If the virtual machine's hardware configuration has changed recently, temporarily revert those changes for testing purposes. For more information, see Verifying the Virtual Hardware configuration of a virtual machine.