Recovering powered on virtual machines in a device that has encountered a PDL condition

Article ID: 308204

Products

VMware vSphere ESXi

Issue/Introduction

This article explains what a Permanent Device Loss (PDL) condition is and provides the steps to power on virtual machines located on a device that has encountered a Permanent or Transient PDL condition.

Symptoms:

A device with powered on virtual machines has encountered a PDL condition.

Environment

VMware vSphere ESXi 5.5
VMware vSphere ESXi 5.1
VMware vSphere ESXi 5.0

Resolution

Permanent PDL results when you modify the storage array configuration by removing, masking, or unmasking LUNs, or by bringing down a path or a component in the path. It may also be caused by storage manager actions such as array lease rotation, rebalancing capacity or workload, retiring storage, or maintaining components in the path.

Transient PDL results from changes in the storage that are not triggered by administrative actions. These events are unusual and are not part of routine operations. Examples include a dead interface (such as a network switch, HBA, or array) or a storage network reconfiguration.

To recover powered-on virtual machines on a device/LUN that has been affected by a PDL condition:

Run this command to list the virtual machines and identify the virtual machines that are to be powered on:

esxcli vm process list

You see output similar to:

RH4U5-101VM
World ID: 8339
Process ID: 0
VMX Cartel ID: vm_cartel_ID

Where vm_cartel_ID is the cartel ID of the virtual machine to be powered on.

Power off the virtual machine from the vSphere Client or the vSphere Web Client. If the virtual machine is unresponsive, run this command to kill the virtual machine process:

kill -9 vm_cartel_ID

Run this command to rescan the adapter and bring the PDL device back online:

esxcli storage core adapter rescan -A vmhbax

For example:

esxcli storage core adapter rescan -A vmhba1
esxcli storage core adapter rescan --adapter vmhba1

Note: Alternatively, you can also use the VMware vSphere Client to bring the device online. To bring the devices online using the vSphere Client, navigate to Host > Configuration > Storage > Datastores and click Rescan All.
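
The cartel ID lookup in the first step can be scripted. The following is a minimal sketch and not part of the original procedure: cartel_id_for_vm is a hypothetical helper name, and it assumes the output has the shape shown above (a virtual machine name line followed by "key: value" lines) and that virtual machine names contain no colon.

```shell
# Hypothetical helper (assumption, not a VMware tool).
# Reads `esxcli vm process list` style output on stdin and prints the
# VMX Cartel ID of the virtual machine whose name line equals $1.
cartel_id_for_vm() {
    awk -v vm="$1" '
        # Assumption: any line without a colon is a VM name line (or blank).
        !/:/ {
            gsub(/^[ \t]+|[ \t]+$/, "")   # trim surrounding whitespace
            current = $0
            next
        }
        # Within the matching VM block, print the last field of the cartel line.
        /VMX Cartel ID:/ && current == vm { print $NF }
    '
}

# Example use on a host (not executed here):
#   esxcli vm process list | cartel_id_for_vm RH4U5-101VM
```

Check the printed value against the full listing before passing it to the kill command, because killing the wrong process can affect other virtual machines on the host.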

Additional Information

You can also use these steps to list the open files on the datastore you are trying to recover, so that you know the virtual machines or processes to kill:

Run this command to obtain the datastore UUID:

esxcfg-scsidevs -m

You see output similar to:

naa.60060160729025007628b54969f4e211:1 /vmfs/devices/disks/naa.60060160729025007628b54969f4e211:1 ########-####-##########7d 0 VMFS5-Datastore1

In this output, ########-####-##########7d is the UUID for datastore VMFS5-Datastore1.

Run this command against the datastore UUID:

lsof | grep -i datastore_UUID

You see output similar to:

1982587 vmx 12 35 /vmfs/volumes/########-####-##########7d/vMotion-test3/vMotion-test3-000001-delta.vmdk
1982587 vmx 12 36 /vmfs/volumes/########-####-##########7d/vMotion-test3/vMotion-test3-flat.vmdk

The process ID in this output is 1982587.
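
The two lookups above can be combined in a small script. This is a sketch under stated assumptions rather than a supported procedure: datastore_uuid_for_name and ids_holding_datastore are hypothetical helper names, the field positions are taken from the sample output above (UUID in the third column and datastore name in the fifth column of esxcfg-scsidevs -m, process ID in the first column of lsof), and the datastore name is assumed to contain no spaces.

```shell
# Hypothetical helpers (assumptions, not VMware tools).

# Reads `esxcfg-scsidevs -m` style output on stdin and prints the UUID
# (third field) of the datastore whose name (fifth field) equals $1.
datastore_uuid_for_name() {
    awk -v name="$1" '$5 == name { print $3 }'
}

# Reads `lsof` style output on stdin and prints each distinct first-column
# ID once for lines that mention the datastore UUID given in $1.
ids_holding_datastore() {
    grep -i -F -- "$1" | awk '{ print $1 }' | sort -u
}

# Example use on a host (not executed here):
#   uuid=$(esxcfg-scsidevs -m | datastore_uuid_for_name VMFS5-Datastore1)
#   lsof | ids_holding_datastore "$uuid"
```

Printing each ID only once makes it easier to review which processes hold files on the datastore before deciding whether to kill them.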

Note: Killing non-virtual machine processes may adversely affect the ESXi host state. If hostd or other ESXi host-specific processes are listed as holding a lock on the datastore, rebooting the ESXi host is the safer option.
