Host PSODs on Boot after removing an SSD Drive from the host

Article ID: 315545

Products

VMware vSAN

Issue/Introduction

This article provides steps to correct the host's disk group configuration so that the host can boot and rejoin the vSAN cluster.

Symptoms:
  • The ESXi host fails with a purple diagnostic screen (PSOD) on boot after a failed SSD cache or capacity drive is removed from a vSAN disk group.


Environment

VMware vSAN 6.1.x
VMware vSAN 6.7.x
VMware vSAN 6.x
VMware vSAN 6.5.x
VMware vSAN 6.0.x

Cause

The drive was physically removed or replaced without first being removed logically from the vSAN disk group.

Resolution

To resolve the issue:

1. Disable vSAN at host boot.
Reboot the ESXi host.
During the pre-boot splash screen, press SHIFT+O to edit the boot options.
In the resulting screen, move the cursor to the end of the boot line.
To disable the vSAN kernel modules, add a space at the end of the boot line and then append the following option:

jumpstart.disable=vsan,lsom,plog,virsto,cmmds

Press Enter to resume booting.
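
The boot option in step 1 is a comma-separated list of module names with no spaces between them, appended to the existing boot line after a single space. The following is a minimal sketch of that format only; the helper names and the placeholder boot line are hypothetical and are not part of any VMware tooling.

```python
# Module names as listed in the boot option in step 1.
VSAN_MODULES = ["vsan", "lsom", "plog", "virsto", "cmmds"]


def build_jumpstart_option(modules):
    """Return a jumpstart.disable option for the given module names.

    Empty entries are dropped so the result has no stray commas.
    """
    names = [m.strip() for m in modules if m.strip()]
    if not names:
        raise ValueError("at least one module name is required")
    return "jumpstart.disable=" + ",".join(names)


def append_boot_option(boot_line, option):
    """Append an option to a boot line, separated by exactly one space."""
    return boot_line.rstrip() + " " + option


option = build_jumpstart_option(VSAN_MODULES)
# "<existing boot options>" is a placeholder for whatever the host shows.
print(append_boot_option("<existing boot options>", option))
```

An option typed at the SHIFT+O prompt applies only to the current boot, so the helper is useful mainly for checking the string before entering it by hand.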

2. Allow the host to finish booting.

3. Remove the partitions from the disks in the failed disk group.

In vCenter, go to Hosts and Clusters > select the affected host > Configure > Storage > Storage Devices.

For each disk in the failed disk group, use All Actions > Erase Partitions.

Be absolutely certain which disks to erase. On a host with multiple disk groups, it is easy to erase a disk that belongs to the wrong disk group by mistake.
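
One way to reduce the risk of erasing the wrong disk is to build the list of devices from a saved disk inventory rather than from memory. The sketch below assumes you have text output in the style of `esxcli vsan storage list` that was captured while vSAN was still running on the host. The field names `Device` and `VSAN Disk Group UUID` are assumed to match that output, and the device names and UUIDs in the sample are invented for illustration. Verify the result against your own host before erasing anything.

```python
# Illustrative sample only: device names and UUIDs are hypothetical.
SAMPLE_OUTPUT = """\
naa.hypothetical0001
   Device: naa.hypothetical0001
   Is SSD: true
   VSAN Disk Group UUID: 52aaaaaa-hypothetical-group-1
naa.hypothetical0002
   Device: naa.hypothetical0002
   Is SSD: false
   VSAN Disk Group UUID: 52aaaaaa-hypothetical-group-1
naa.hypothetical0003
   Device: naa.hypothetical0003
   Is SSD: true
   VSAN Disk Group UUID: 52bbbbbb-hypothetical-group-2
"""


def parse_devices(text):
    """Parse 'Key: Value' records; an unindented line starts a new record."""
    devices = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            current = {}
            devices.append(current)
            continue
        key, sep, value = line.strip().partition(":")
        if sep and current is not None:
            current[key.strip()] = value.strip()
    return devices


def devices_in_group(devices, group_uuid):
    """Return the device names that belong to the given disk group UUID."""
    return [
        d["Device"]
        for d in devices
        if d.get("VSAN Disk Group UUID") == group_uuid and "Device" in d
    ]


devices = parse_devices(SAMPLE_OUTPUT)
for name in devices_in_group(devices, "52aaaaaa-hypothetical-group-1"):
    print(name)
```

Filtering on the disk group UUID, rather than on device order or size, keeps disks from other disk groups on the same host out of the erase list.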

4. Reboot the host. The vSAN service is restored on reboot. Keep the host in maintenance mode.

5. The disks are now eligible to join vSAN as part of a new disk group.

6. Ensure that the logical vSAN disk group is removed.
Go to Hosts and Clusters > select the vSAN cluster inventory object > Configure > vSAN > Disk Management.
Locate the host and the disk group identified as failed, then remove the disk group using the No Data Migration option.

7. If the replacement disk is already installed in the host, proceed to create the disk group. If the disk has not yet been replaced, install the new disk first.

8. Recreate the disk group and take the host out of maintenance mode.

9. Check that the vSAN datastore capacity has increased to reflect the re-created disk group.

Additional Information

Impact/Risks:
Potential data unavailability or data loss. In an all-flash configuration with deduplication enabled, improperly removing a drive causes the data on the entire disk group to be lost.