To troubleshoot the General Virtual SAN error status when upgrading VMware vSAN, identify the affected objects and perform the corrective action suggested in this article.
Symptoms
When upgrading vSAN, the On Disk Format Conversion task fails at 10%.
In the vSphere Web Client, you see an error similar to:
A general system error occurred: Failed to realign following Virtual SAN objects: <uuid list>, due to being locked or lack of vmdk descriptor file which requires manual fix
The Convert disk format for vSAN task fails with a General Virtual SAN error status.
Note: For additional symptoms and log entries, see the Additional Information section.
Environment
VMware vSAN 6.2.x
Cause
When upgrading the on-disk format, during the 10% - 15% phase, vSAN realigns objects to prepare them for new features. The process is performed in two steps:
In the first step, vSAN realigns objects and their components to have a 1 MB address space. The process fails in this step if the cluster is unstable or if there is not enough disk space.
In the second step, vSAN realigns vsanSparse objects to be 4k aligned. The process fails if there are objects that cannot be upgraded to version 2.5.
An object will fail to upgrade under these conditions:
The object is left behind and is no longer referenced by anything.
The disk chain is not complete or is corrupted.
Note: For an example scenario when the objects fail, see the Additional Information section.
Resolution
Caution: Removing disks and objects is risky because the objects may still be in use. Always double-check before you delete any object. If you are unsure about any of the steps detailed in this article, contact VMware support.
To resolve this issue, identify the orphaned objects and take appropriate action to satisfy the upgrade conditions.
Ensure the initial realignment is complete:
To rule out failures during the first step of the on-disk upgrade, review the vmware-vsan-health-service.log file.
Note: The vmware-vsan-health-service.log file is located in these directories:
Windows vCenter Server: %Programdata%\VMware\vCenterServer\logs\vsan-health\
vCenter Server Appliance: /storage/log/vmware/vsan-health/
During object realignment, you see entries similar to:
2016-01-26T21:54:43.650Z INFO vsan-health[Thread-19] [VsanRealignClusterLib::QueryUnalignedStatus] Found 6 objects which aren't MB aligned
2016-01-26T21:54:43.650Z INFO vsan-health[Thread-19] [VsanRealignClusterLib::CheckForUnalignedObjects] Fixing MB alignment on 2deaa756-0d63-0f53-690e-020003c607e5
. . .
The first step of object realignment is complete when you see this entry:
[VsanRealignClusterLib::CheckForUnalignedObjects] All Objects now MB aligned
This means there are no known issues at this point, provided the cluster is stable and there is space available. If you do not see this line in the vmware-vsan-health-service.log file, review the errors returned after the alignment output.
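The log check described above can be sketched in Python. This is a minimal illustration, not part of the VMware tooling: the function name `summarize_alignment` is hypothetical, and the two message strings are taken from the log excerpts shown in this article, so they may differ in other builds.

```python
import re

# Marker strings copied from the log excerpts in this article (assumed stable).
ALIGNED_MARKER = "All Objects now MB aligned"
FIXING_RE = re.compile(r"Fixing MB alignment on (\S+)")


def summarize_alignment(log_text):
    """Summarize the first realignment step from health-service log text.

    Returns a dict with:
      complete       True if the completion marker appears in the log
      objects_fixed  UUIDs named in "Fixing MB alignment on" entries
    """
    return {
        "complete": ALIGNED_MARKER in log_text,
        "objects_fixed": FIXING_RE.findall(log_text),
    }


# Usage (hypothetical): read the log from the vCenter Server Appliance path
# given in this article, then inspect the summary.
#
#   path = "/storage/log/vmware/vsan-health/vmware-vsan-health-service.log"
#   with open(path, errors="replace") as handle:
#       print(summarize_alignment(handle.read()))
```

If `complete` is false, the errors that follow the alignment entries in the log still need to be reviewed manually.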
Review the Error stack of the failed on-disk upgrade and record the affected UUIDs in a text file. You will cross-reference this list of UUIDs with the script output to confirm that the issue is resolved.
The Error stack reports the error:
Failed to realign following virtual SAN objects:<UUID> due to being locked or lack of vmdk descriptor file, which requires manual fix.
Copy the UUID(s) following the Failed to realign following Virtual SAN objects: string and save them to a text file.
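If the UUID list is long, copying it by hand is error-prone. The following is a minimal Python sketch for extracting the UUIDs from the error text. The function name `extract_failed_uuids` is hypothetical, and the pattern assumes the 8-4-4-4-12 hexadecimal UUID layout seen in the unmasked log entries in this article.

```python
import re

# Prefix copied from the error message; matched case-insensitively because
# the article shows both "Virtual SAN" and "virtual SAN".
PREFIX_RE = re.compile(
    r"Failed to realign following virtual SAN objects:", re.IGNORECASE
)
# Assumed UUID layout: 8-4-4-4-12 hexadecimal digits.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def extract_failed_uuids(error_text):
    """Return the unique UUIDs listed after the error prefix, in order."""
    match = PREFIX_RE.search(error_text)
    if match is None:
        return []
    uuids = []
    for uuid in UUID_RE.findall(error_text[match.end():]):
        uuid = uuid.lower()
        if uuid not in uuids:  # drop repeats, keep first-seen order
            uuids.append(uuid)
    return uuids


# Usage (hypothetical file name): save one UUID per line for later review.
#
#   with open("failed_uuids.txt", "w") as handle:
#       handle.write("\n".join(extract_failed_uuids(error_text)) + "\n")
```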
Download and save the attached zip file 2144881_VsanRealign.zip to the Windows machine you use to access the vSphere Client.
Unzip 2144881_VsanRealign.zip and then use the Datastore browser to copy the VsanRealign.py script to a shared datastore on your ESXi host.
VMware recommends that you copy the script to a datastore first and then copy it to /var/tmp using the command line.
In the ESXi shell session, change to the directory where you saved the VsanRealign.py script.
For example:
cd /var/tmp/
Run this command to start the script:
python VsanRealign.py precheck
Note: The namespace scan can take a long time.
The script returns a list of vSAN objects with a problem and the recommended actions to allow the disk format upgrade to complete.
You see output similar to:
Finished scanning, compiling results
-------------------------------------------------------------------
The following objects were missing descriptor files. The recorded path doesn't exist and no other reference to the object was found.
-------------------------------------------------------------------
This will create descriptor files automatically for all disks under lostAndFound in the VSAN datastore.
Other objects that aren't disks missing a descriptor will be removed permanently.
NOTE: Recovered disks will not have any snapshot chain information in them. Any snapshots deltas will not be correctly recovered.
39aef356-####-####-####-########26c
Object UUID: 4698f356-####-####-####-########26c
Recorded Path: /vmfs/volumes/vsan:523e23728ba31d24-84ab9fc2821d6bdf/c695f356-####-####-####-########26c/linux-vm08-25632575.vswp
Recorded VM: linux-vm08
Object Class: vmswap
Object Size: 536870912
Parent directory exists
AutoFix: Will remove object
Object UUID: 8efdf456-####-####-####-########63b
Recorded Path: /vmfs/volumes/vsan:5228981f7117d6eb-94106da8ad4a6377/88fdf456-####-####-####-########63b/linux-vm-a.vmdk
Recorded VM: None
Object Class: vdisk
Object Size: 2147483648
Parent directory exists
AutoFix: Will create new descriptor
Descriptors will be created in /vmfs/volumes/vsan:5228981f7117d6eb-94106da8ad4a6377/lostAndFound
NOTE: Recovered disks will not have any snapshot chain information in them. Any snapshots deltas will not be correctly recovered.
Create 'linux-vm-a-8efdf456-####-####-####-########63b.vmdk' for 8efdf456-c408-a163-2f8e-02001645d63b
Remove 4698f356-####-####-####-########26c type: vmswap vm name: linux-vm08
When prompted for a decision to proceed with the AutoFix suggestions, enter yes to apply the AutoFix actions. In this case, the descriptor files for vdisk objects are recreated. The vmswap objects, and all other objects missing a descriptor that are not virtual disks, are permanently removed because they contain no useful data.
Note: If you enter no, the AutoFix actions are not applied, and you will need to take manual action.
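To cross-reference the UUIDs saved from the Error stack with the script report, a short Python sketch such as the following may help. It is not part of VsanRealign.py. The function names are hypothetical, and the parser assumes the report uses the "Object UUID:" and "AutoFix:" labels exactly as shown in the sample output in this article.

```python
import re

# Each report entry is assumed to start with "Object UUID:" and run until
# the next entry (or the end of the text).
ENTRY_RE = re.compile(r"Object UUID:\s*(\S+)(.*?)(?=Object UUID:|\Z)", re.DOTALL)
AUTOFIX_RE = re.compile(r"AutoFix:\s*([^\n]*)")


def parse_precheck(output):
    """Map each reported object UUID to its AutoFix action (or None)."""
    actions = {}
    for uuid, body in ENTRY_RE.findall(output):
        found = AUTOFIX_RE.search(body)
        actions[uuid.lower()] = found.group(1).strip() if found else None
    return actions


def cross_reference(failed_uuids, actions):
    """Split the failed UUIDs into those the report covers and those it omits."""
    failed = {uuid.lower() for uuid in failed_uuids}
    reported = set(actions)
    covered = {uuid: actions[uuid] for uuid in sorted(failed & reported)}
    not_reported = sorted(failed - reported)
    return covered, not_reported
```

UUIDs that end up in `not_reported` failed to realign but were not listed by the script, so they are the ones that may still need manual investigation.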
Review the report provided by the script for objects that are in use by Changed Block Tracking (CBT), because these objects may cause an issue with the on-disk upgrade.
You see a report similar to:
The following objects are in use by Change Block Tracking and may encounter issues during upgrade. Rerun this script with the 'fixcbt' option if upgrade fails.
If action is required, resolve the CBT issues before you continue. If no action is required, continue with the next step.
After the AutoFix actions are applied, examine the lostAndFound directory on the vSAN datastore. Attach each orphaned disk to a non-production virtual machine to check file integrity and to determine whether the virtual disk is still needed in your environment.
Note: Realignment of these objects is still required. Run the on-disk upgrade process again to proceed.
Additional Information
Additional symptoms and log entries
In the error stack, you see entries similar to:
Failed to realign following Virtual SAN objects: 1e58f256-####-####-####-########4e0, f44ff256-####-####-####-########a28, 8959f256-####-####-####-########4e0, db50f256-####-####-####-########4e0, 2358f256-####-####-####-########4e0, 1c58f256-####-####-####-########a28, 7559f256-####-####-####-########4e0, e850f256-####-####-####-########c91, 5858f256-####-####-####-########4e0, 0a51f256-####-####-####-########a28, 1e58f256-####-####-####-########a28, dd50f256-####-####-####-########03b, ec50f256-####-####-####-########a28, 534ff256-####-####-####-########a28, de57f256-####-####-####-########c91, 1658f256-####-####-####-########03b, due to being locked or lack of vmdk descriptor file, which requires manual fix.
In the vmware-vsan-health-service.log file, you see entries similar to:
Note: The vmware-vsan-health-service.log file is located in these directories:
vCenter Server on Windows: %Programdata%\VMware\vCenterServer\logs\vsan-health\vmware-vsan-health-service.log
vCenter Server Appliance: /storage/log/vmware/vsan-health/vmware-vsan-health-service.log
2016-03-24T11:03:00.148Z DEBUG vsan-health[Thread-1223] [VsanRealignClusterLib::RealignClusterV3] Finished namespaces
2016-03-24T11:03:00.620Z INFO vsan-health[Thread-1223] [VsanRealignClusterLib::RealignClusterV3] After namespace realign 22 objects need realign, previously 22
2016-03-24T11:03:00.621Z INFO vsan-health[Thread-1223] [VsanRealignClusterLib::RealignClusterV3] Made no progress. 22 objects still need realigning
2016-03-24T11:03:00.621Z INFO vsan-health[Thread-1223] [VsanRealignClusterLib::RealignClusterV3] (str) [ '6e94f356-####-####-####-########d31', 'fa90f356-####-####-####-########d31', .... 'e78ff356-####-####-####-########26c', '818df356-####-####-####-########26c' ]
2016-03-24T11:03:00.623Z ERROR vsan-health[Thread-1223] [VsanVcDiskFormatConverterImpl::_Run] Failed to migrate vsanSparse objects.
2016-03-24T11:03:00.623Z ERROR vsan-health[Thread-1223] [VsanVcDiskFormatConverterImpl::_Run] Made no progress
Traceback (most recent call last):
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanVcDiskFormatConverterImpl.py", line 1633, in _Run
    self._HandleUserCancellation)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanRealignClusterLib.py", line 335, in RealignClusterV3
    uuidRemaining=objectsNeedingRealign)
RealignFailed: Made no progress
Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
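To spot this symptom in a large log file, the "After namespace realign" entries can be scanned with a short Python sketch like the one below. The function names are hypothetical. Reading the first number as the objects remaining after the pass and the second as the count before it is an interpretation of the log wording, not documented behavior.

```python
import re

# Pattern copied from the log excerpt in this article.
PROGRESS_RE = re.compile(
    r"After namespace realign (\d+) objects need realign, previously (\d+)"
)


def realign_passes(log_text):
    """Return (remaining, previous) object counts for each realign pass."""
    return [
        (int(remaining), int(previous))
        for remaining, previous in PROGRESS_RE.findall(log_text)
    ]


def is_stalled(log_text):
    """True if the last logged pass left objects behind without reducing the count."""
    passes = realign_passes(log_text)
    if not passes:
        return False
    remaining, previous = passes[-1]
    return remaining > 0 and remaining >= previous
```

A stalled result corresponds to the "Made no progress" entries shown above and indicates that the objects listed in the log need the manual steps in the Resolution section.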