Sometimes, during a disk failure or another hardware event, the CLI is needed to remove, recreate, mount, or dismount disk groups. This document explains how to work with a vSAN disk group via the CLI.
####-##-##T05:55:20.854Z cpu16:2097804)ScsiDeviceIO: 4167: Cmd(0x45bfbd2cd688) 0x28, CmdSN 0xb1b9f4 from world 0 to dev "naa.################" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x1 Medium Error, LBA: 605786112
####-##-##T05:55:22.557Z cpu6:2097805)ScsiDeviceIO: 4167: Cmd(0x45bfbfb1f9c8) 0x28, CmdSN 0xb1ba7b from world 0 to dev "naa.################" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x3 0x11 0x1 Medium Error, LBA: 605786112
Before following the action plan below, check the ESXi host logs for failures reported against the faulty drive. If vSAN has acknowledged the device as faulty or marked it as offline, proceed with the steps below.
If the problem turns out not to be a hardware issue, check the driver and the firmware of the faulty drive.
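The log check described above can be sketched as a small shell function. This is a minimal sketch, not an official tool: the function name count_medium_errors is hypothetical, the default path /var/log/vmkernel.log is an assumption about where the host keeps its kernel log, and the filter assumes entries follow the ScsiDeviceIO format of the sample lines, where sense key 0x3 denotes a SCSI Medium Error.

```sh
# Count SCSI Medium Error entries (sense key 0x3) per device in a vmkernel log.
# $1: path to the log file (assumed default: /var/log/vmkernel.log).
# Prints one line per device: "<device> <count>".
count_medium_errors() {
    log="${1:-/var/log/vmkernel.log}"
    # Keep only failed commands whose sense data starts with sense key 0x3,
    # pull out the quoted device name, then count occurrences per device.
    grep 'Valid sense data: 0x3 ' "$log" \
        | sed -n 's/.* to dev "\([^"]*\)".*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Calling count_medium_errors with no argument scans the assumed default log; passing a path scans a copied or rotated log instead. A device that appears repeatedly is a candidate for the hardware checks above, though the count alone does not prove a hardware fault.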
If the Deduplication & Compression feature is enabled on the affected disk group:
If the Deduplication & Compression feature is not enabled:
esxcli commands:

esxcli system maintenanceMode set --enable true -m ensureObjectAccessibility
esxcli system maintenanceMode set --enable true -m evacuateAllData
esxcli system maintenanceMode set --enable true -m noAction

esxcli vsan storage list

naa.123456XXXXXXXXXXX:
   Device: naa.123456XXXXXXXXXXX
   Display Name: naa.123456XXXXXXXXXXX
   Is SSD: true
   VSAN UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx8fa3
   VSAN Disk Group UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxd008e
   VSAN Disk Group Name: naa.50000XXXXX1245
   Used by this host: true
   In CMMDS: true
   On-disk format version: 5
   Deduplication: true
   Compression: true
   Checksum: 5356031598619392290
   Checksum OK: true
   Is Capacity Tier: true
   Encryption: false
   DiskKeyLoaded: false

For a cache disk, the VSAN UUID and VSAN Disk Group UUID fields will match and the output shows:
   Is Capacity Tier: false

esxcli vsan storage remove -u <VSAN Disk Group UUID>

esxcli vsan storage list

Use -d (or -u if absent) on the disk you want to remove:
esxcli vsan storage remove -d <naa.xxxxxxx>
esxcli vsan storage remove -u <UUID of the absent capacity disk to remove>

esxcli vsan storage add -s naa.xxxxxx -d naa.xxxxxxx -d naa.xxxxxxxxxx -d naa.xxxxxxxxxxxx
esxcli vsan storage add -s naa.xxxxxx -d naa.xxxxxxx

Run the esxcli vsan storage list command to see the new disk group and verify that all disks are reporting True in the "In CMMDS:" field output.

esxcli storage core adapter rescan --all
vdq -iq | less

esxcli vsan storage tag add -d naa.xxxxxx -t capacityFlash
esxcli vsan storage tag add -s t10.NVMe____INTEL_SSDPEDMD800G4_____
vdq -iq command output

esxcli vsan storage remove -d naa.xxxxxxx
esxcli vsan storage add -s naa.xxxxxx -d naa.xxxxxxx

If the disk group remove command fails and a reboot is not wanted for any reason, here is a workaround:
1. Physically unplug the cache disk from the host.
2. Run vdq -iH to get the disk group mappings and copy the cache disk UUID.
3. Run the following command:
esxcli vsan storage remove -u <cache disk uuid>
4. Plug the cache disk back in. The disk group might reappear at this point because of metadata on the cache drive. Run the remove command against it again; this time it should succeed.
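The cache disk UUID used by the remove command can also be picked out of the esxcli vsan storage list output, because a cache disk's VSAN UUID matches its VSAN Disk Group UUID. The following is a minimal sketch under stated assumptions: the function name list_cache_disks is hypothetical, and the filter assumes the "Field: value" layout of the sample output in this document, with the VSAN UUID line printed before the VSAN Disk Group UUID line.

```sh
# Print "<device> <uuid>" for every cache disk found in
# `esxcli vsan storage list` output supplied on stdin.
# A disk is treated as a cache disk when its VSAN UUID equals
# its VSAN Disk Group UUID.
list_cache_disks() {
    awk -F': ' '
        /^ *Device: /               { dev = $2 }   # remember current device name
        /^ *VSAN UUID: /            { uuid = $2 }  # remember the disk UUID
        /^ *VSAN Disk Group UUID: / { if (uuid == $2) print dev, uuid }
    '
}
```

Typical use would be esxcli vsan storage list | list_cache_disks, run while the cache disk is still listed by the host. The UUID in the second column is the value to pass to esxcli vsan storage remove -u.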