To identify the specific device that caused the failure:
4. You can also run vdq -iH to list the disk mappings on the host and find the failed disk. If a disk is listed by a UUID instead of its device identifier, vSAN has failed the disk out, as shown below:
[root@esx01:~] vdq -iH
Mappings:
DiskMapping[0]:
SSD: naa.58ce########fec5
MD: naa.58ce########a7f9
MD: naa.58ce#######bbd1
MD: naa.58ce#######02a5
MD: naa.58ce########9d69
MD: naa.58ce########aaf5
MD: naa.58ce########a7e5
MD: ########-########-####-####-####-########226a
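On a host with many disk groups, it can help to filter the mapping list down to the entries that are shown as a UUID. The following is a minimal sketch, not a VMware-provided tool. The function name list_failed_vsan_disks is hypothetical, and the filter relies on a heuristic assumption: device identifiers such as naa.* contain a dot, while UUID entries contain dashes and no dot.

```shell
# Print SSD/MD entries whose identifier looks like a UUID rather than a
# device identifier. Reads "vdq -iH" output on stdin.
# Assumption (heuristic): device identifiers contain a dot (naa.*, t10.*, ...),
# UUID entries contain dashes and no dot.
list_failed_vsan_disks() {
    awk '($1 == "SSD:" || $1 == "MD:") && $2 !~ /\./ && $2 ~ /-/ { print $1, $2 }'
}
```

Usage on the host would be: vdq -iH | list_failed_vsan_disks. No output means that every disk is still listed by its device identifier.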
5. To identify the display name of the disk, search the vmkernel log for the UUID. This works only if the failure is recent enough that the event is still present in /var/log/vmkernel.log:
grep ########-########-####-####-####-########226a /var/log/vmkernel.log
You should see output similar to the following:
2021-01-09T05:45:41.638Z cpu0:7053521)LSOM: LSOMLogDiskEvent:7509: Disk Event permanent error propagated for MD ########-########-####-####-####-########226a (naa.58cexxxxxxxxaad9:2)
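The device identifier appears in parentheses at the end of the LSOM disk event, followed by a colon and a number. The sketch below pulls out only the identifier so that it can be passed to the path and location commands. The function name extract_failed_device is hypothetical, and the pattern assumes the log line has the same shape as the example above.

```shell
# Extract the device identifier from LSOMLogDiskEvent lines read on stdin.
# Assumption: the identifier is the last parenthesised field and is
# followed by ":<number>", as in the example log line.
extract_failed_device() {
    sed -n 's/.*LSOMLogDiskEvent.*(\([^():]*\):[0-9]*).*/\1/p'
}
```

Usage on the host would be: grep <UUID> /var/log/vmkernel.log | extract_failed_device. The same identifier may be printed more than once if the event was logged repeatedly.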
If necessary, we can get the path information about the failed device to further assist with identification.
From the ESXi Shell, run this command:
# esxcfg-mpath -bd <naa identifier device>
For the example in the Resolution section, the command and example output are:
# esxcfg-mpath -bd naa.58ce########aad9
naa.58cexxxxxxxxaad9 : VMware Serial Attached SCSI Disk (naa.58cexxxxxxxxaad9)
vmhba1:C0:T1:L0 LUN:0 state:active sas Adapter: 5005########8c11 Target: 5000########02af
The device is target 1 (the T1 field of the runtime name vmhba1:C0:T1:L0) on vmhba1.
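The runtime name in the path output encodes the adapter, channel, target, and LUN, separated by colons. As an illustration only, the following sketch splits a runtime name into those four fields. The function name parse_runtime_name is hypothetical.

```shell
# Split a runtime name such as vmhba1:C0:T1:L0 into its four fields.
# The body runs in a subshell so the IFS and globbing changes stay local.
parse_runtime_name() (
    IFS=:
    set -f            # disable globbing while splitting
    set -- $1         # split the argument on ':'
    [ $# -eq 4 ] || return 1
    # Strip the C/T/L prefixes to leave only the numbers.
    printf 'adapter=%s channel=%s target=%s lun=%s\n' "$1" "${2#C}" "${3#T}" "${4#L}"
)
```

For the path shown above, parse_runtime_name vmhba1:C0:T1:L0 prints adapter=vmhba1 channel=0 target=1 lun=0.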
We can also get the physical location of the device.
From the ESXi Shell, run these commands:
# esxcli storage core device physical get -d <naa identifier device>
# esxcli storage core device raid list -d <naa identifier device>
The commands and example output are:
# esxcli storage core device physical get -d naa.58ce########aad9
Physical Location: enclosure 2, slot 5
Or
# esxcli storage core device raid list -d naa.58ce########aad9
Physical Location: enclosure 2, slot 5
Note: The above commands may not work with certain drivers because the vSAN disk serviceability plugin does not support every driver. The currently supported drivers are:
hpsa
nhpsa
iavmd
nvme_pcie
lsi_mr3
lsi_msgpt3
lsi_msgpt35
smartpqi
If your driver is not listed, work with your hardware vendor to open an engineering-to-engineering case so we can work together to update the plugin code to interface with that driver.
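Before running the location commands, you can compare the driver of the storage adapter against the list above. The sketch below hard-codes the driver names from this article, so it needs updating if the supported list changes. The function name is_lsu_supported_driver is hypothetical.

```shell
# Return 0 if the given driver name is in the supported list from this
# article, 1 otherwise. The list is copied from the article and is not
# queried from the host.
is_lsu_supported_driver() {
    case "$1" in
        hpsa|nhpsa|iavmd|nvme_pcie|lsi_mr3|lsi_msgpt3|lsi_msgpt35|smartpqi)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

The driver in use is typically shown in the Driver column of esxcli storage core adapter list. For example, is_lsu_supported_driver lsi_mr3 && echo supported prints supported.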
If your driver is not listed above, these commands may return one of the errors below. The errors indicate that the plugin cannot interact with the device, either to retrieve the required information or to turn the LED on or off:
# esxcli storage core device physical get -d naa.6589cfc########93491
Unable to get location for device naa.6589cfc########93491: No LSU plugin can manage this device.
# esxcli storage core device raid list -d naa.5000########4c2f
Unable to get location for device naa.5000########4c2f: Can not manage device!
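When these commands are scripted across several hosts, it can be useful to tell a real location apart from the unsupported-driver errors. The following sketch classifies the command output based only on the strings shown in this article. The function name report_location and its return codes are hypothetical, and real output may contain additional fields.

```shell
# Classify the output of the location commands, read on stdin.
# Returns 0 and prints the location, 2 for the unsupported-driver errors
# quoted in this article, 1 for anything else.
report_location() {
    output=$(cat)
    case "$output" in
        *"No LSU plugin can manage this device"*|*"Can not manage device"*)
            echo "unsupported: driver not handled by the disk serviceability plugin"
            return 2 ;;
        *"Physical Location:"*)
            printf '%s\n' "$output" | sed -n 's/.*Physical Location: *//p'
            return 0 ;;
        *)
            echo "unrecognized output"
            return 1 ;;
    esac
}
```

Usage on the host would be: esxcli storage core device physical get -d <naa identifier device> 2>&1 | report_location. A return code of 2 is the cue to try the other command or to follow up with the hardware vendor.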