Virtual machine is unresponsive and Event IDs 11 and 15 occur in the Windows guest operating system Event log

Products

VMware vSphere ESXi

Issue/Introduction

A virtual machine is unresponsive
You see a blue screen error
The ESX host's vmkernel or vmkwarning log shows an alert similar to:

May 6 23:17:55 server1 VMkernel: 0:00:05:12.852 cpu2:1035)ALERT: LVM: 1366: One or more devices not found (file system [MY-VOL1, 48204a32-########-####-##########7])
The Event log of the Windows virtual machine guest operating system shows these errors:

Event Type: Error
Event Source: Disk
Event Category: None
Event ID: 11
Date: 5/6/2008
Time: 3:12:28 PM
User: N/A
Computer: W2K3-VM1
Description:
The driver detected a controller error on \Device\Harddisk1.

Event Type: Error
Event Source: symmpi
Event Category: None
Event ID: 15
Date: 5/6/2008
Time: 3:12:28 PM
User: N/A
Computer: W2K3-VM1
Description:
The device, \Device\Scsi\symmpi1, is not ready for access yet.
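On the host side, the matching symptom is the LVM alert in the vmkernel log. The sketch below is a minimal, hypothetical helper (`find_missing_span_alerts` is not a VMware tool) that scans log lines for that alert and extracts the volume label and UUID. It assumes the alert text matches the example quoted above; the wording may differ between ESX releases.

```python
import re
from typing import Iterable, List, Tuple

# Pattern for the LVM alert quoted above (assumed format; may vary by release).
LVM_MISSING = re.compile(
    r"ALERT: LVM: \d+: One or more devices not found "
    r"\(file system \[([^,\]]+), ([^\]]+)\]\)"
)


def find_missing_span_alerts(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (volume label, UUID) pairs from log lines that carry the LVM alert."""
    hits = []
    for line in lines:
        match = LVM_MISSING.search(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


sample = ("May 6 23:17:55 server1 VMkernel: 0:00:05:12.852 cpu2:1035)ALERT: LVM: 1366: "
          "One or more devices not found (file system [MY-VOL1, 48204a32-########-####-##########7])")
print(find_missing_span_alerts([sample]))
# prints: [('MY-VOL1', '48204a32-########-####-##########7')]
```

On a host you might pass an open file handle, for example `open("/var/log/vmkernel")`; that path is an assumption, so confirm where your version writes VMkernel messages.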

Environment

VMware ESXi 4.0.x Embedded
VMware ESX Server 3.0.x
VMware ESX Server 3.5.x
VMware ESXi 3.5.x Installable
VMware ESXi 3.5.x Embedded
VMware ESX 4.0.x

Resolution

About Event IDs 11 and 15

Event IDs 11 and 15 are documented by Microsoft and are related to a storage device, a storage driver, or both. Event IDs 11 and 15 may occur in a Windows guest operating system when the virtual machine is stored on a spanned VMFS volume that has one or more missing span members.

The virtual machine's VMDK disk files are stored on a VMFS volume. If this VMFS volume is corrupted for any reason, the VMDK files may be corrupted as well. You can check the properties of a spanned VMFS volume in the VMware Infrastructure (VI) Client, or from the command line by running:

# vmkfstools -P /vmfs/volumes/MY-VOL1

The output appears similar to:

VMFS-3.31 file system spanning 2 partitions.
File system label (if any): MY-VOL1
Mode: public
Capacity 25232932864 (24064 file blocks * 1048576), 1657798656 (1581 blocks) avail
UUID: <UUID>
Partitions spanned (on "lvm"):
vmhba1:1:0:1
vmhba1:2:1:1

Note: With ESX/ESXi 4.x and a storage array that supports Network Address Authority (NAA), the output displays the NAA ID of each span member rather than the vmhba device name.

This output indicates a healthy spanned volume.
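If you capture this output for record-keeping or monitoring, it can be read programmatically. The sketch below is a hypothetical parser (`parse_vmkfstools_p` and `VmfsSpanInfo` are illustrative names, not part of any VMware tool) that assumes the text is laid out exactly like the sample above. It also checks the capacity arithmetic: 24064 file blocks of 1048576 bytes each give 25232932864 bytes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class VmfsSpanInfo:
    """Fields read from `vmkfstools -P` output (structure is illustrative)."""
    declared_spans: int
    label: str
    capacity_bytes: int
    total_blocks: int
    block_size: int
    partitions: list = field(default_factory=list)
    offline_warning: bool = False


def parse_vmkfstools_p(text: str) -> VmfsSpanInfo:
    """Parse captured `vmkfstools -P` text laid out like the sample above."""
    spans = int(re.search(r"spanning (\d+) partitions", text).group(1))
    label = re.search(r"File system label \(if any\):[ \t]*(.*)", text).group(1).strip()
    cap = re.search(r"Capacity (\d+) \((\d+) file blocks \* (\d+)\)", text)
    capacity, blocks, block_size = (int(g) for g in cap.groups())

    # Member devices (vmhba paths, or NAA IDs on 4.x) follow the
    # "Partitions spanned" header; a parenthesised line is a warning, not a member.
    partitions = []
    in_list = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Partitions spanned"):
            in_list = True
        elif in_list and line and not line.startswith("("):
            partitions.append(line)

    return VmfsSpanInfo(spans, label, capacity, blocks, block_size,
                        partitions, "may be offline" in text)


HEALTHY = """VMFS-3.31 file system spanning 2 partitions.
File system label (if any): MY-VOL1
Mode: public
Capacity 25232932864 (24064 file blocks * 1048576), 1657798656 (1581 blocks) avail
UUID: <UUID>
Partitions spanned (on "lvm"):
vmhba1:1:0:1
vmhba1:2:1:1
"""

info = parse_vmkfstools_p(HEALTHY)
# Reported capacity is file blocks x block size: 24064 * 1048576 = 25232932864 bytes.
assert info.capacity_bytes == info.total_blocks * info.block_size
print(info.declared_spans, info.partitions, info.offline_warning)
# prints: 2 ['vmhba1:1:0:1', 'vmhba1:2:1:1'] False
```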

If one of the span members is missing, the integrity of the entire VMFS volume is compromised. The output of the vmkfstools -P /vmfs/volumes/MY-VOL1 command for a compromised VMFS volume appears similar to:

VMFS-3.31 file system spanning 1 partitions.
File system label (if any): MY-VOL1
Mode: public
Capacity 25232932864 (24064 file blocks * 1048576), 1657798656 (1581 blocks) avail
UUID: <UUID>
Partitions spanned (on "lvm"):
vmhba1:1:0:1
(One or more partitions spanned by this volume may be offline)

Note: The line (One or more partitions spanned by this volume may be offline) indicates that at least one of the disks/LUNs that make up the datastore is missing. As with the healthy output, on ESX/ESXi 4.x with a storage array that supports Network Address Authority (NAA), the output displays the NAA ID of each span member rather than the vmhba device name.
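One way to tell which member has gone missing is to compare the current output with a member list recorded while the volume was healthy. The sketch below assumes you kept such a list (the KB does not require it; the member names could equally be NAA IDs on 4.x); `span_members` and `check_span` are hypothetical helpers, not VMware tools.

```python
import re

OFFLINE_MARKER = "One or more partitions spanned by this volume may be offline"


def span_members(vmkfstools_output: str) -> list:
    """Return the member devices listed under the 'Partitions spanned' header."""
    members = []
    in_list = False
    for raw in vmkfstools_output.splitlines():
        line = raw.strip()
        if line.startswith("Partitions spanned"):
            in_list = True
        elif in_list and line and not line.startswith("("):
            members.append(line)
    return members


def check_span(baseline_members: list, current_output: str) -> dict:
    """Compare a previously recorded healthy member list with current output."""
    current = span_members(current_output)
    declared = int(re.search(r"spanning (\d+) partitions", current_output).group(1))
    return {
        "declared_spans": declared,
        "missing": [m for m in baseline_members if m not in current],
        "offline_warning": OFFLINE_MARKER in current_output,
    }


BASELINE = ["vmhba1:1:0:1", "vmhba1:2:1:1"]  # recorded while the volume was healthy
COMPROMISED = """VMFS-3.31 file system spanning 1 partitions.
File system label (if any): MY-VOL1
Mode: public
Capacity 25232932864 (24064 file blocks * 1048576), 1657798656 (1581 blocks) avail
UUID: <UUID>
Partitions spanned (on "lvm"):
vmhba1:1:0:1
(One or more partitions spanned by this volume may be offline)
"""

report = check_span(BASELINE, COMPROMISED)
print(report)
# prints: {'declared_spans': 1, 'missing': ['vmhba1:2:1:1'], 'offline_warning': True}
```

The warning line is the signal the KB relies on; the baseline comparison only adds which device to look for when you try to present it back to the host.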

Note: If a span member is located on a LUN that is inaccessible to the ESX host because of Fibre Channel connectivity issues, the host may exhibit similar symptoms. For more information, see Troubleshooting LUN connectivity issues (1003955).

Resolution

To resolve this issue, follow these steps in order. If step 1 does not solve the issue, proceed to step 2, and so on.

Try to find the LUN(s) containing the missing span members and present them back to the ESX host:
1. Rescan for storage and VMFS volumes on all HBAs.
2. To update the primary spanned volume's metadata in memory, run the command:
  # vmkfstools -V
3. Run this command and check the output to see if the volume has all of its span volume members:
  
  # vmkfstools -P /vmfs/volumes/MY-VOL1
  
  Note: Make sure that the LUNs containing the missing span members of the spanned VMFS volume are not detected as snapshots or resignatured, as either may prevent the primary volume from properly reassembling its missing span members.
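The three steps above can be scripted if you repeat them often. The sketch below is a hypothetical wrapper (`rescan_and_verify` is not a VMware tool) that assumes Python 3.7 or later, that `esxcfg-rescan <vmhba>` is the command-line equivalent of the VI Client rescan on your ESX version, and that `esxcfg-rescan` and `vmkfstools` are on the PATH. The HBA names are placeholders, and the command runner is injectable so the logic can be dry-run off-host.

```python
import subprocess
from typing import Callable, Sequence

OFFLINE_MARKER = "One or more partitions spanned by this volume may be offline"


def default_run(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(list(cmd), capture_output=True, text=True, check=True).stdout


def rescan_and_verify(volume_path: str, hbas: Sequence[str],
                      run: Callable[[Sequence[str]], str] = default_run) -> bool:
    """Rescan each HBA, refresh VMFS metadata, then check the span members.

    Returns True if `vmkfstools -P` no longer reports offline span members.
    """
    for hba in hbas:
        run(["esxcfg-rescan", hba])                    # step 1: rescan this HBA (assumed command)
    run(["vmkfstools", "-V"])                          # step 2: refresh VMFS volume metadata
    output = run(["vmkfstools", "-P", volume_path])    # step 3: query the spanned volume
    return OFFLINE_MARKER not in output


# Dry run with a stand-in runner, so the sketch can be exercised without an ESX host.
calls = []


def fake_run(cmd):
    calls.append(" ".join(cmd))
    return "VMFS-3.31 file system spanning 2 partitions.\n" if cmd[1] == "-P" else ""


ok = rescan_and_verify("/vmfs/volumes/MY-VOL1", ["vmhba1", "vmhba2"], run=fake_run)
print(ok, calls)
```

A False result corresponds to the case described next, where the offline warning persists and connectivity to the extent's LUN should be investigated.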
If the output still states that one or more partitions spanned by this volume may be offline, there may be a connectivity problem between the ESX host and the LUN that contains the datastore's extent. To investigate further, see Identifying shared storage issues with ESX 3.x (1003659) for ESX 3.x hosts and Lost connectivity to storage device (1009553) for ESX 4.0 hosts.
If the connectivity issue between the ESX host and the LUN cannot be resolved, or the data on the LUN cannot be recovered, the integrity of the entire VMFS volume is compromised. In that case, consider these steps:
1. Select a datastore that has the capacity to store the virtual machines, or create a new datastore if required. For more information, see the Installing and Setting Up ESXi guide for your version of ESX.
2. Migrate your virtual machines from the compromised spanned datastore to the new or alternate datastore.
3. Power on the virtual machines on the new or alternate datastore, then run the Windows chkdsk utility on all drives inside the guest operating system.
4. After all virtual machines have been moved off the old spanned volume, destroy the old spanned volume and rebuild it if necessary.
  
  Note: The migration of some virtual machines may fail if the virtual machines have disk files that are missing some data blocks. This is expected because these missing data blocks may be on the missing span volume members.
If you are unable to migrate a virtual machine that resides on the compromised datastore, you may need to restore the virtual machine or data from existing backups. Restoring from backup may be required under these circumstances:

The missing LUN cannot be found or has been destroyed.
The information on the extent has been corrupted.
Steps 1 to 3 do not resolve the issue.
You do not have any other recovery options available to you.