Unable to delete Absent Disk with UUID via vSphere UI or command line.

Article ID: 326475

Updated On:

Products

VMware vSAN

Issue/Introduction

Symptoms:

vSAN Disk Management shows the disk as an Absent vSAN Disk.

Removing the absent disk via the vSphere UI or the command line fails with the error "A general system error occurred: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information"
 
Example :
Name,Drive Type,Disk Tier,Capacity,Virtual SAN Health Status,State,Transport Type,Adapter
Absent VSAN Disk (VSAN UUID:12345678-1234-1234-1234-19ooo265####),HDD,Capacity,0.00 B,--,Dead or Error,

 
The disk has been replaced, but the failed disk entry still appears in the disk group list as a UUID when running the vdq -iH command on the ESXi host.
 
# vdq -iH
Mappings:
   DiskMapping[0]:
           SSD:  naa.60000000000000000000000000000000
            MD:  naa.60000000000000000000000000000001
            MD:  naa.60000000000000000000000000000002
            MD:  naa.60000000000000000000000000000003
            MD:  naa.60000000000000000000000000000004
            MD:  naa.60000000000000000000000000000005
            MD:  naa.60000000000000000000000000000006
            MD:  12345678-1234-1234-1234-19ooo265#### ------> Failed disk entry 
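
The check above can be sketched as a small filter. This is a minimal sketch, not a supported tool: the function name find_uuid_entries is hypothetical, it assumes awk is available in the shell where it runs, and it assumes a stale entry shows up as a bare hexadecimal UUID where a device name such as naa.* would normally appear. It reads vdq output on stdin, so it can be tried on saved output first.

```shell
#!/bin/sh
# Hypothetical helper: print SSD/MD identifiers from `vdq -iH` output that
# look like a bare UUID (five dash-separated hex groups) rather than a
# device name such as naa.6000... Anything after the identifier on the
# line is ignored, because only the second field is inspected.
find_uuid_entries() {
    awk '($1 == "MD:" || $1 == "SSD:") &&
         $2 ~ /^[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+-[0-9a-fA-F]+$/ {
             print $2
         }'
}

# Usage on an ESXi host (assumption: run from the ESXi shell):
#   vdq -iH | find_uuid_entries
```

An empty result suggests that every disk in the mapping is still listed under a device name.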
 

Running cmmds-tool find -u <DISK UUID> -f json reports the absent disk's type as "DISK_INCOMING". Because the customer has already replaced the physical disk, this is a stale entry left in the CMMDS data, which can be confirmed with the following command.
 
#  cmmds-tool find -u 12345678-1234-1234-1234-19ooo265xxxx -f json
{
 "entries":
[
 {
   "uuid": "12345678-1234-1234-1234-19ooo265####",     >> Verify the disk NAA ID for its health
   "owner": "00000000-0000-0000-0000-000000000000",
   "health": "Unhealthy",
   "revision": "42",
   "type": "DISK_INCOMING",                            >> a failed disk with this type indicates a stale entry
   "flag": "0",
   "minHostVersion": "0",
   "md5sum": "71d234da97c86bde01f267bf33a81306",
   "valueLen": "8",
   "content": "[[ ]]",
   "errorStr": "(null)"
 }
]
}

 
DISK_INCOMING means that the vSAN Data Components are being relocated to this disk.
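
To read the "type" and "health" values out of the JSON shown above without scanning it by eye, something like the following could be used. It is a rough sketch: cmmds_field is a hypothetical helper name, it relies only on sed and head, and it assumes each field sits on its own line as in the sample output. It is a line-based match, not a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: print the first value of a quoted string field
# (for example type or health) from cmmds-tool JSON read on stdin.
# $1 is the field name.
cmmds_field() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage on an ESXi host:
#   cmmds-tool find -u <DISK UUID> -f json | cmmds_field type
#   cmmds-tool find -u <DISK UUID> -f json | cmmds_field health
```

A result of DISK_INCOMING for type together with Unhealthy for health matches the stale entry described here.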

Removing the Absent vSAN Disk via the vSphere Web Client fails with the following error:
 
"A general system error occurred: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information"
 
Removing the Absent vSAN Disk via the command line fails with the following error:
 
# esxcli vsan storage remove -u 12345678-1234-1234-1234-19ooo265xxxx
Unable to remove device: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information

 
vmkernel.log shows the following warnings:
 
2019-04-23T02:53:57.703Z cpu18:36099 opID=70f0a7c5)WARNING: PLOG: PLOG_ExecVSIOp:1552: Disk 12345678-1234-1234-1234-19ooo265#### not found in plog list
2019-04-23T02:54:54.468Z cpu15:37184 opID=2519065f)WARNING: PLOG: PLOG_ExecVSIOp:1552: Disk 12345678-1234-1234-1234-19ooo265#### not found in plog list
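
The warnings above can be pulled out of the log with a short filter. This is only a sketch: plog_missing_disks is a hypothetical name, and the pattern assumes the message wording matches the two sample lines exactly. The log path in the usage comment is the usual ESXi location and should be adjusted if logs are redirected.

```shell
#!/bin/sh
# Hypothetical helper: list each disk UUID that PLOG reported as
# "not found in plog list", once per UUID, from log text read on stdin.
plog_missing_disks() {
    sed -n 's/.*PLOG_ExecVSIOp.*Disk \([^ ]*\) not found in plog list.*/\1/p' | sort -u
}

# Usage on an ESXi host (path is an assumption about the local log setup):
#   plog_missing_disks < /var/log/vmkernel.log
```

If the UUID printed here matches the Absent vSAN Disk UUID, the failed removal and the log warning refer to the same stale entry.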


Cause

This issue occurs when the failed vSAN disk is physically replaced but the stale disk entry is not flushed from the vSAN CMMDS data.

This typically happens when the customer replaces the failed disk before removing it from the vSAN disk group. If deduplication is enabled, or if the failed disk is a cache tier disk, it happens when the disk is replaced before the entire disk group is removed.

Resolution

This procedure must be performed only in the presence of a VMware GSS vSAN TSE.

  • As a precaution, make sure you have a backup of the data.
  • Put the host in Maintenance Mode with Ensure Accessibility before removing the Disk Group, so that no data is lost and the second mirror copy is healthy.
  • Select the vSAN disk group (DG) that has the ghost disk entry and perform a full data migration of the healthy disks in the DG, leaving only the vSAN SSD and the ghost vSAN disk entries.

    Note: Ensure that the vSAN datastore has at least 30% free space before performing a full data migration.
    Example:
    # vdq -iH
    Mappings:
       DiskMapping[0]:
               SSD:  naa.60000000000000000000000000000000
                MD:  52b3f7d8-9f71-17cb-XXXX-19a9f265abfb

  • Now delete the disk group with no data migration. You should then be able to delete and recreate the disk group (DG).
  • Alternatively, use the command below to delete the unknown/failed disk entry from the command line:
    esxcli vsan storage remove -u <UUID of the absent capacity disk to remove>
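
The 30% free-space note in the steps above amounts to a simple arithmetic check, sketched below. The function name has_free_headroom is hypothetical. The capacity and free-space figures are assumed to be read by the operator from the vSAN capacity view; the sketch does not query vSAN itself. Both figures must be whole numbers in the same unit (for example GiB).

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when free space is at least
# min_pct percent of total capacity. Integer arithmetic only.
#   $1 total capacity, $2 free space (same unit), $3 minimum percent (default 30)
has_free_headroom() {
    capacity=$1
    free=$2
    min_pct=${3:-30}
    [ "$capacity" -gt 0 ] && [ $((free * 100)) -ge $((capacity * min_pct)) ]
}

# Usage with operator-supplied figures, here 20000 GiB total and 7000 GiB free:
#   if has_free_headroom 20000 7000; then
#       echo "free space is at or above 30%"
#   else
#       echo "free space is below 30%, do not start a full data migration"
#   fi
```

Comparing free * 100 against capacity * min_pct avoids the rounding that an integer division would introduce near the threshold.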

 

Additional Information

More information on command line instructions to manage the disks.