
ESXi host non-responsive due to RDM reservation conflicts


Article ID: 382177


Products

VMware vSphere ESXi

Issue/Introduction

The ESXi host has become non-responsive in vCenter, and the logs show "0x18" (reservation conflict) SCSI status codes for the RDM LUNs.
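A quick way to confirm the symptom from the ESXi shell (a minimal sketch, assuming SSH or console access; the D:0x18 device-status field in vmkernel.log denotes a reservation conflict):

    # Count reservation-conflict completions recorded in the vmkernel log
    grep -c "D:0x18" /var/log/vmkernel.log

    # Inspect the most recent conflicts to identify the affected devices
    grep "D:0x18" /var/log/vmkernel.log | tail -20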


Environment

ESXi 7.x

ESXi 8.x

 

Cause

The issue occurs when virtual machines participating in a clustering solution, such as WSFC or Red Hat High Availability Cluster, use shared RDMs with SCSI reservations across hosts, and a virtual machine on another host is the active cluster node holding the SCSI reservation. When this happens, hostd can become exhausted, causing the ESXi host to go into a non-responsive state in vCenter.
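When the host is already unmanageable from vCenter, a hedged first check from the ESXi shell is whether the hostd management daemon is still responding (a sketch assuming SSH or console access; the agents may stall again until the reserved LUNs are marked perennially reserved as described below):

    # Check whether hostd is still running
    /etc/init.d/hostd status

    # Restart the management agents if hostd is hung; they can hang again
    # while rescanning reserved RDM LUNs until the resolution below is applied
    /etc/init.d/hostd restart
    /etc/init.d/vpxa restart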


Resolution


Mark the LUNs as perennially reserved:

  1. Determine which RDM LUNs are part of the clustering solution (WSFC, Red Hat High Availability Cluster, etc.). From the vSphere Client, select a virtual machine that has a mapping to the clustered RDM devices.
  2. Edit your virtual machine settings and navigate to your Mapped RAW LUNs. In this example, Hard disk 2:


     
  3. The Physical disk field identifies the device in use as the RDM by its VML ID.

    Take note of the VML ID, which is a globally unique identifier for your shared device.
     
  4. Identify the naa.id for this VML ID using this command:  esxcli storage core device list

    For example:

    esxcli storage core device list

    naa.6589cfc000000a17ac02aae02067e747
       Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000a17ac02aae02067e747)
       Has Settable Display Name: true
       Size: 40960
       Device Type: Direct-Access
       Multipath Plugin: NMP
       Devfs Path: /vmfs/devices/disks/naa.6589cfc000000a17ac02aae02067e747
       Vendor: FreeNAS
       Model: iSCSI Disk
       Revision: 0123
       SCSI Level: 6
       Is Pseudo: false
       Status: degraded
       Is RDM Capable: true
       Is Local: false
       Is Removable: false
       Is SSD: false
       Is VVOL PE: false
       Is Offline: false
       Is Perennially Reserved: false
       Queue Full Sample Size: 0
       Queue Full Threshold: 0
       Thin Provisioning Status: unknown
       Attached Filters:
       VAAI Status: supported
       Other UIDs: vml.0100010000303035303536################################
       Is Shared Clusterwide: true
       Is SAS: false
       Is USB: false
       Is Boot Device: false
       Device Max Queue Depth: 128
       No of outstanding IOs with competing worlds: 32
       Drive Type: unknown
       RAID Level: unknown
       Number of Physical Drives: unknown
       Protection Enabled: false
       PI Activated: false
       PI Type: 0
       PI Protection Mask: NO PROTECTION
       Supported Guard Types: NO GUARD SUPPORT
       DIX Enabled: false
       DIX Guard Type: NO GUARD SUPPORT
       Emulated DIX/DIF Enabled: false

  5. Use the esxcli command to mark the device as perennially reserved:

    esxcli storage core device setconfig -d naa.id --perennially-reserved=true

    For example:

    esxcli storage core device setconfig -d naa.6589cfc000000a17ac02aae02067e747 --perennially-reserved=true

    Note: For vSphere 7.x, see the Change Perennial Reservation Settings section of the vSphere Storage Guide.
     
  6. To verify that the device is perennially reserved, run this command:

    esxcli storage core device list -d naa.id

    In the output of the esxcli command, search for the entry Is Perennially Reserved: true. This shows that the device is marked as perennially reserved.

    For example:

    esxcli storage core device list -d naa.6589cfc000000a17ac02aae02067e747

    naa.6589cfc000000a17ac02aae02067e747
       Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000a17ac02aae02067e747)
       Has Settable Display Name: true
       Size: 40960
       Device Type: Direct-Access
       Multipath Plugin: NMP
       Devfs Path: /vmfs/devices/disks/naa.6589cfc000000a17ac02aae02067e747
       Vendor: FreeNAS
       Model: iSCSI Disk
       Revision: 0123
       SCSI Level: 6
       Is Pseudo: false
       Status: degraded
       Is RDM Capable: true
       Is Local: false
       Is Removable: false
       Is SSD: false
       Is VVOL PE: false
       Is Offline: false
       Is Perennially Reserved: true
       Queue Full Sample Size: 0
       Queue Full Threshold: 0
       Thin Provisioning Status: unknown
       Attached Filters:
       VAAI Status: supported
       Other UIDs: vml.0100010000303035303536################################
       Is Shared Clusterwide: true
       Is SAS: false
       Is USB: false
       Is Boot Device: false
       Device Max Queue Depth: 128
       No of outstanding IOs with competing worlds: 32
       Drive Type: unknown
       RAID Level: unknown
       Number of Physical Drives: unknown
       Protection Enabled: false
       PI Activated: false
       PI Type: 0
       PI Protection Mask: NO PROTECTION
       Supported Guard Types: NO GUARD SUPPORT
       DIX Enabled: false
       DIX Guard Type: NO GUARD SUPPORT
       Emulated DIX/DIF Enabled: false

     
  7. Repeat the procedure for each Mapped RAW LUN that is participating in the clustering solution (WSFC, Red Hat High Availability Cluster, etc.); a scripted sketch for handling several LUNs in one pass follows this list.

    Note: The configuration is permanently stored with the ESXi host and persists across restarts. To remove the perennially reserved flag, run this command:

    esxcli storage core device setconfig -d naa.id --perennially-reserved=false
  8. If the host does not become responsive again after the LUNs are marked perennially reserved, reboot the host to bring it back into vCenter.
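For hosts with many clustered RDMs, the per-device commands above can be scripted. A minimal sketch for the ESXi shell; the naa IDs below are placeholders and must be replaced with the devices identified in step 4:

    # Hypothetical device list: substitute the naa IDs of your clustered RDM LUNs
    for DEVICE in naa.6589cfc000000a17ac02aae02067e747 naa.000000000000000000000000000000a1; do
        # Mark the LUN as perennially reserved
        esxcli storage core device setconfig -d "$DEVICE" --perennially-reserved=true
        # Confirm the flag took effect
        esxcli storage core device list -d "$DEVICE" | grep "Is Perennially Reserved"
    done

Because the configuration is stored with each ESXi host, repeat the marking on every host that has visibility to the clustered LUNs.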

Additional Information

RDMs that are not perennially reserved can also cause long boot times. 
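To audit which devices on a host already carry the flag, the same esxcli listing can be filtered (a hedged one-liner, assuming devices use naa./eui./t10. identifiers as in the output above):

    # Show each device identifier together with its perennially-reserved state
    esxcli storage core device list | grep -E "^(naa|eui|t10)\.|Is Perennially Reserved"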

Related article: ESXi host takes a long time to start during rescan of RDM LUNs