Connect to a datastore after losing connectivity from the ESXi hosts

Article ID: 411220

Products

VMware vCenter Server

Issue/Introduction

SCSI command failures seen in vmkernel.log can include aborts, timeouts, state-in-doubt events, and reservation failures.
The datastore is visible in vCenter Server and in the ESXi host web client but cannot be browsed or accessed.
The VMs in the inventory may be inaccessible and may not be operational, even if pings to them succeed.

The following are typical troubleshooting steps taken to regain connectivity to the datastore. In some cases these steps restore datastore access, but they can also fail to do so.

  • Reboot each of the ESXi hosts in the cluster with visibility to the datastore.
  • Unmount the datastore and remove/recreate initiator groups on the backend storage array before remounting on the hosts.
  • Restart the FC fabric switches.
  • Fail over the storage array controller to the standby controller, or reboot the controllers.
  • Issue a lunreset to the device. In this scenario the reset fails.
  • Check whether the device is seen as a snapshot LUN, as per KB: Troubleshooting LUNs detected as snapshot LUNs.

A large number of LUN reservation conflicts (D:0x18) are seen in vmkernel.log on the hosts.
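To gauge how many reservation conflicts each device is reporting, the log can be scanned for the D:0x18 device status. The following is a minimal Python sketch, not a supported tool. It assumes the failure lines name the device as `to dev "<id>"` before the H:/D:/P: status triple; the function name and the sample lines are hypothetical and simplified, so adjust the pattern to match the actual log output.

```python
import re
from collections import Counter

# Assumed line shape (simplified): ... to dev "<device id>" failed H:0x0 D:0x18 P:0x0 ...
# D:0x18 is the SCSI device status for a reservation conflict.
_CONFLICT = re.compile(r'to dev "(?P<dev>[^"]+)".*?\bD:0x18\b')


def count_reservation_conflicts(lines):
    """Return a Counter mapping device ID to the number of D:0x18 lines seen."""
    counts = Counter()
    for line in lines:
        match = _CONFLICT.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Synthetic, simplified sample lines for illustration only.
SAMPLE = [
    'ScsiDeviceIO: Cmd 0x2a to dev "naa.example01" failed H:0x0 D:0x18 P:0x0',
    'ScsiDeviceIO: Cmd 0x28 to dev "naa.example01" failed H:0x0 D:0x18 P:0x0',
    'ScsiDeviceIO: Cmd 0x28 to dev "naa.example02" failed H:0x0 D:0x2 P:0x0',
]

if __name__ == "__main__":
    for dev, n in count_reservation_conflicts(SAMPLE).most_common():
        print(f"{dev}: {n} reservation conflicts")
```

To run it against a real log, pass an open file object instead of `SAMPLE`, for example `count_reservation_conflicts(open("vmkernel.log"))`.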

Environment

VMware vSphere ESXi (All Versions)

Cause

A stale lock or other issue within the storage array is affecting the datastore's backing LUN.

Resolution

Steps to attempt reconnection with the unavailable datastore:

  • Unmount the datastore from the ESXi hosts in the cluster with visibility to the datastore.
  • Detach the device from each of the hosts that are part of the initiator group, as described in Detach a LUN device from ESXi hosts.
  • Remove the initiator groups from the backend storage array.
  • Perform a rescan on the hosts to verify that the datastore is not visible and the device is detached.
  • Once the backing device has been successfully detached, then the initiator groups can be recreated.
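The order of the steps above matters, so it can help to write the plan out per host before touching anything. The Python sketch below only builds and prints such a plan; it does not execute anything. The host names, datastore label, and device ID are hypothetical placeholders. The esxcli invocations are the commonly documented unmount, detach, rescan, and detached-list commands, but they should be confirmed against the Detach a LUN device from ESXi hosts article for the ESXi version in use. The array-side step is vendor-specific and appears only as a text reminder.

```python
import shlex


def detach_plan(hosts, datastore_label, device_id):
    """Build an ordered list of (where, action) pairs; nothing is executed."""
    label = shlex.quote(datastore_label)
    dev = shlex.quote(device_id)
    plan = []
    # 1. Unmount the datastore from every host that can see it.
    for host in hosts:
        plan.append((host, f"esxcli storage filesystem unmount -l {label}"))
    # 2. Detach the backing device from every host.
    for host in hosts:
        plan.append((host, f"esxcli storage core device set --state=off -d {dev}"))
    # 3. Array-side work, done with the storage vendor's own tooling.
    plan.append(("storage array", "remove the initiator groups (vendor-specific)"))
    # 4. Rescan, then list detached devices to confirm the state on each host.
    for host in hosts:
        plan.append((host, "esxcli storage core adapter rescan --all"))
        plan.append((host, "esxcli storage core device detached list"))
    return plan


if __name__ == "__main__":
    # Hypothetical hosts, label, and device ID for illustration.
    for where, action in detach_plan(["esx01", "esx02"], "DS01", "naa.example01"):
        print(f"{where}: {action}")
```

Completing each phase on all hosts before starting the next reflects the requirement that no host still holds the device when the initiator groups are removed.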

The key is to verify that none of the hosts still has the backing device attached. If any host does, removing and recreating the initiator groups may not release the stale locks. After the initiator groups are recreated, the datastore can be mounted on the hosts again and normal access to the data should be restored. If needed, contact the storage vendor for help with the initiator group work.
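The verification described above amounts to a simple gate: the initiator groups are recreated only when no host reports the device as attached. A minimal Python sketch of that check follows. It assumes the per-host list of attached devices has already been collected by some other means; the function name, host names, and device IDs are hypothetical.

```python
def hosts_still_attached(device_id, attached_by_host):
    """Return the sorted names of hosts whose attached-device set contains device_id."""
    return sorted(
        host for host, devices in attached_by_host.items() if device_id in devices
    )


if __name__ == "__main__":
    # Hypothetical inventory: host name -> set of device IDs currently attached.
    inventory = {
        "esx01": {"naa.example02"},
        "esx02": {"naa.example01", "naa.example02"},
    }
    blockers = hosts_still_attached("naa.example01", inventory)
    if blockers:
        print("Do not recreate initiator groups yet; still attached on:", ", ".join(blockers))
    else:
        print("Device is detached everywhere; initiator groups can be recreated.")
```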