How to manually remove and recreate a vSAN disk group using esxcli

Article ID: 315532

Products

VMware vSAN

Issue/Introduction

This article provides steps to manually remove and recreate a vSAN disk group using the ESXi Command Line Interface (esxcli). This is applicable when vCenter Server is inaccessible or an error in the vSphere Web Client prevents you from accessing the Disk Management view.

Environment

VMware vSAN Original Storage Architecture (OSA)
VMware vSAN 6.x
VMware vSAN 7.x
VMware vSAN 8.x 

Resolution

To remove and recreate a disk group using esxcli commands:
 
Note: These steps can be data-destructive if not followed carefully.
  1. Log in to the ESXi host that owns the disk group as the root user using SSH.
  2. Run one of the following commands to put the host into Maintenance Mode. There are three options:

    Note: VMware recommends using the ensureObjectAccessibility option. Failure to use either the ensureObjectAccessibility mode or the evacuateAllData mode may result in data loss.
     
    • Recommended:
      • Ensure accessibility of data:
        esxcli system maintenanceMode set --enable true -m ensureObjectAccessibility
      • Evacuate data:
        esxcli system maintenanceMode set --enable true -m evacuateAllData
         
    • Not recommended:
      • Use this option only when recommended by VMware Support or when addressing a failed disk scenario. Ensure accessibility and full data migration cannot be used with a failed disk.
      • Don't evacuate data:
        esxcli system maintenanceMode set --enable true -m noAction
         
  3. Record the cache and capacity disk UUIDs in the existing disk group by running this command:
    esxcli vsan storage list

    Example output of a capacity tier device:
    naa.123456XXXXXXXXXXX:
    Device: naa.123456XXXXXXXXXXX
    Display Name: naa.123456XXXXXXXXXXX
    Is SSD: true
    VSAN UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx8fa3
    VSAN Disk Group UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxd008e
    VSAN Disk Group Name: naa.50000XXXXX1245
    Used by this host: true
    In CMMDS: true
    On-disk format version: 5
    Deduplication: true
    Compression: true
    Checksum: 5356031598619392290
    Checksum OK: true
    Is Capacity Tier: true
    Encryption: false
    DiskKeyLoaded: false

    Note: For a cache disk:
    • The VSAN UUID and VSAN Disk Group UUID fields match.
    • The output reports Is Capacity Tier: false.
       
  4. Remove the disk group by running this command:
    esxcli vsan storage remove -u <VSAN Disk Group UUID>

    Note: Always double check the disk group UUID with the command:
    esxcli vsan storage list

    Note: To remove only a single capacity disk from an existing disk group with Deduplication turned off, use -d with the device name of the disk, or -u with the vSAN UUID of the disk if the disk is absent:
    esxcli vsan storage remove -d <naa.xxxxxxx>
    esxcli vsan storage remove -u <UUID of the absent capacity disk to remove>

    If the command fails, reboot the host and try again.
     
  5. If you have replaced physical disks, see the Additional Information section.
     
  6. Create the disk group, using this command:
    esxcli vsan storage add -s naa.xxxxxx -d naa.xxxxxxx -d naa.xxxxxxxxxx -d naa.xxxxxxxxxxxx

    Note: To add only a single capacity disk to an existing disk group with Deduplication turned off, use -s with the cache disk of the existing disk group and -d with the disk you want to add:
    esxcli vsan storage add -s naa.xxxxxx -d naa.xxxxxxx 

    Where naa.xxxxxx is the NAA ID of the disk device and the disk devices are identified as per these options:
     
    • -s indicates a cache disk.
    • -d indicates a capacity disk.
       
  7. Run the esxcli vsan storage list command to see the new disk group, and verify that all disks report true in the "In CMMDS:" field.
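
The UUID check in step 3 and the verification in step 7 can be partly scripted. The following is a minimal sketch, not a VMware-supplied tool. It assumes that awk is available in the shell and that the output of esxcli vsan storage list uses the field names and field order shown in the step 3 example. The function names summarize_vsan_disks and disks_missing_from_cmmds are hypothetical.

```shell
#!/bin/sh
# Sketch only: both functions read `esxcli vsan storage list` output on stdin.
# Assumption: fields appear in the order shown in the step 3 example, with
# "Is Capacity Tier" after the UUID fields and the "In CMMDS" field.

# Print one line per disk:
#   <device> <cache|capacity> <VSAN UUID> <VSAN Disk Group UUID> <In CMMDS>
summarize_vsan_disks() {
  awk -F': ' '
    { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }   # trim indentation and line ends
    $1 == "Device"               { dev = $2 }
    $1 == "VSAN UUID"            { uuid = $2 }
    $1 == "VSAN Disk Group UUID" { dg = $2 }
    $1 == "In CMMDS"             { cmmds = $2 }
    $1 == "Is Capacity Tier" {
      role = ($2 == "true") ? "capacity" : "cache"
      print dev, role, uuid, dg, cmmds
    }
  '
}

# Print the devices whose "In CMMDS" value is not true (step 7).
# Returns a non-zero status when at least one such device is found.
disks_missing_from_cmmds() {
  summarize_vsan_disks |
    awk '$5 != "true" { print $1; bad = 1 } END { exit bad + 0 }'
}
```

For example, esxcli vsan storage list | summarize_vsan_disks prints one line per disk. On the line whose role is cache, the VSAN UUID and the VSAN Disk Group UUID are the same value, which is the UUID to pass to esxcli vsan storage remove -u in step 4.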


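Because these steps can be data-destructive, it can help to print the exact command sequence for steps 2, 4, and 6 and review it before running anything. The following is a minimal sketch under that assumption: it only echoes the commands from this article and never runs them. The function name plan_disk_group_recreate and its argument order are hypothetical.

```shell
#!/bin/sh
# Sketch only: print (do not run) the commands for steps 2, 4, and 6.
# Usage: plan_disk_group_recreate MODE DG_UUID CACHE_DEV CAPACITY_DEV...
plan_disk_group_recreate() {
  if [ "$#" -lt 4 ]; then
    echo "usage: plan_disk_group_recreate MODE DG_UUID CACHE_DEV CAPACITY_DEV..." >&2
    return 2
  fi
  mode=$1
  dg_uuid=$2
  cache=$3
  shift 3   # the remaining arguments are the capacity devices

  # Accept only the three maintenance mode options listed in step 2.
  case "$mode" in
    ensureObjectAccessibility|evacuateAllData|noAction) ;;
    *) echo "unknown maintenance mode: $mode" >&2; return 2 ;;
  esac

  echo "esxcli system maintenanceMode set --enable true -m $mode"
  echo "esxcli vsan storage remove -u $dg_uuid"

  # Build the add command: one -s for the cache disk, one -d per capacity disk.
  add_cmd="esxcli vsan storage add -s $cache"
  for dev in "$@"; do
    add_cmd="$add_cmd -d $dev"
  done
  echo "$add_cmd"
}
```

Calling plan_disk_group_recreate ensureObjectAccessibility followed by the disk group UUID, the cache device, and the capacity devices recorded in step 3 prints three lines that can be compared against the esxcli vsan storage list output before they are run by hand.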

Additional Information

  • If you are replacing physical disks, additional steps are required:
1. VMware recommends placing the node into Maintenance Mode, as detailed in step 2 of the Resolution section, before triggering a power off or performing any host maintenance. vSAN disks are hot swappable in the following circumstances:
 
a) Hybrid configuration, and the controller supports hot swapping disks
 
b) All-flash configuration with Deduplication and Compression disabled, and the controller supports hot swapping disks

If you are unsure whether the controller supports hot swapping disks, or if Deduplication and Compression is enabled, treat hot swapping as unsupported: put the node into Maintenance Mode with Ensure Accessibility, power off the node, and replace the disk.

Note: A disk group with Deduplication and Compression enabled, or a disk group whose failed disk is the cache tier disk, must be deleted before the failed disk is replaced. Follow the steps in the Resolution section above to replace the failed disk. If you can hot swap the failed disk, be sure to rescan the HBA so that the new disk is detected and presented to ESXi for use.
 
2. Log in to the node via SSH as root and run the below command to rescan all HBAs:
 
esxcli storage core adapter rescan --all
 
3. Verify that all disks are presented through the controller by running this command:
 
vdq -iq | less

This command lists all disks (naa.xxxxx) together with the tags that identify SSD disks and capacity disks.
 
4. Tag the appropriate disk as a new capacity disk by running this command:
 
esxcli vsan storage tag add -d naa.xxxxxx -t capacityFlash
 
Note: This is only required for All Flash environments.
 
5. Tag the SSD disk as the cache disk by running this command:
 
esxcli vsan storage tag add -s t10.NVMe____INTEL_SSDPEDMD800G4_____
 
Note: Record the exact device name from the vdq -iq command output.
 
6. Remove the failed disk from the disk group by running this command:
 
esxcli vsan storage remove -d naa.xxxxxxx
 
7. Add the new capacity tier disk by running this command:
 
esxcli vsan storage add -s naa.xxxxxx -d naa.xxxxxxx
 
Note: The -s switch is only required in the add command if multiple disk groups are present to distinguish which disk group to add the capacity tier disk to. You can also use more than one -d for multiple capacity tier disks to be added to the disk group.
  • This article is also applicable when the disk was not removed from the disk group before the physical disk was replaced.
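
The capacity disk replacement steps above can be previewed in the same way before any command is run. The following is a minimal sketch, assuming a single failed capacity disk in a disk group that does not need to be deleted first. It only echoes commands that appear in this article. The function name plan_capacity_disk_swap and the hybrid and allflash keywords are hypothetical, and the cache disk tagging step is left out.

```shell
#!/bin/sh
# Sketch only: print (do not run) the commands for replacing one capacity disk.
# Usage: plan_capacity_disk_swap hybrid|allflash CACHE_DEV FAILED_DEV NEW_DEV
plan_capacity_disk_swap() {
  if [ "$#" -ne 4 ]; then
    echo "usage: plan_capacity_disk_swap hybrid|allflash CACHE_DEV FAILED_DEV NEW_DEV" >&2
    return 2
  fi
  kind=$1
  cache=$2
  failed=$3
  new=$4

  case "$kind" in
    hybrid|allflash) ;;
    *) echo "unknown configuration: $kind" >&2; return 2 ;;
  esac

  echo "esxcli storage core adapter rescan --all"   # detect the new disk
  echo "vdq -iq"                                    # confirm the disk is presented

  # Tagging the new disk as capacityFlash is only required for all-flash.
  if [ "$kind" = "allflash" ]; then
    echo "esxcli vsan storage tag add -d $new -t capacityFlash"
  fi

  echo "esxcli vsan storage remove -d $failed"
  echo "esxcli vsan storage add -s $cache -d $new"
}
```

The device names passed in should be the exact names reported by vdq -iq, and the printed sequence should be reviewed against the notes above before it is run by hand.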