vSAN on disk upgrade fails at 10%

Article ID: 315510

Products

VMware vSAN

Issue/Introduction

To troubleshoot the General Virtual SAN error status when upgrading VMware vSAN, identify the affected objects and perform the corrective action suggested in this article.


Symptoms:
  • When upgrading vSAN, the On Disk Format Conversion task fails at 10%.
  • In the vSphere Web Client, you see an error similar to:

    A general system error occurred: Failed to realign following Virtual SAN objects: <uuid list>, due to being locked or lack of vmdk descriptor file which requires manual fix
     
  • The Convert disk format for vSAN task fails with a General Virtual SAN error status.

Note: For additional symptoms and log entries, see the Additional Information section.

Environment

VMware vSAN 6.2.x

Cause

When upgrading the on-disk format, during the 10% - 15% phase, vSAN realigns objects to prepare them for new features. The process is performed in two steps:
  • In the first step, vSAN realigns objects and their components to have a 1 MB address space. The process fails in this step if the cluster is unstable or if there is not enough disk space.
  • In the second step, vSAN realigns vsanSparse objects to be 4k aligned. The process fails if there are objects that cannot be upgraded to version 2.5.

    An object will fail to upgrade under these conditions:
     
    • The object is left behind and is no longer referenced by anything.
    • The disk chain is not complete or is corrupted.

      Note: For an example scenario when the objects fail, see the Additional Information section.
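
The two alignment targets described above come down to simple modular arithmetic. The following is a minimal sketch of that arithmetic only, not vSAN's implementation. It assumes "1 MB" means 1024 * 1024 bytes and "4k" means 4096 bytes, and the helper names is_aligned and round_up are hypothetical.

```python
# Illustrative only: generic alignment arithmetic, not vSAN code.
MIB = 1024 * 1024  # assumption: "1 MB" is a binary megabyte
FOUR_K = 4 * 1024  # assumption: "4k" is 4096 bytes


def is_aligned(value, boundary):
    """Return True when value is an exact multiple of boundary."""
    return value % boundary == 0


def round_up(value, boundary):
    """Return the smallest multiple of boundary that is >= value."""
    return -(-value // boundary) * boundary
```

For example, is_aligned(536870912, MIB) is true for a 512 MB swap object, while round_up(1000, FOUR_K) gives the next 4k boundary.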

Resolution

Caution: Removing disks and objects is risky because it is possible that the objects are still in use. Always double check before you delete any object. If you are unsure about any of the steps detailed in this article, contact VMware support.
 
To resolve this issue, identify the orphaned objects and take appropriate action to satisfy the upgrade conditions.
 
  1. Ensure the initial realignment is complete:
     
    • To rule out failures during the first step of the on-disk upgrade, review the vmware-vsan-health-service.log file.

      Note: The vmware-vsan-health-service.log file is located in these directories:
       
      • Windows vCenter Server: %Programdata%\VMware\vCenterServer\logs\vsan-health\
      • vCenter Server Appliance: /storage/log/vmware/vsan-health/

        During object realignment, you see entries similar to:

        2016-01-26T21:54:43.650Z INFO vsan-health[Thread-19] [VsanRealignClusterLib::QueryUnalignedStatus] Found 6 objects which aren't MB aligned
        2016-01-26T21:54:43.650Z INFO vsan-health[Thread-19] [VsanRealignClusterLib::CheckForUnalignedObjects] Fixing MB alignment on 2deaa756-0d63-0f53-690e-020003c607e5
        .
        .
        .
    • The first step of object realignment is complete when you see this entry:

      [VsanRealignClusterLib::CheckForUnalignedObjects] All Objects now MB aligned

      This means there are no known issues at this point if the cluster is stable and there is space available.
      If you do not see this line in the vmware-vsan-health-service.log file, review the errors logged after the alignment output.
       
  2. Review the Error stack of the failed on-disk upgrade and record the affected UUIDs in a text file. You will cross-reference this list with the script output to confirm that the issue is resolved.

    The Error stack reports the error:

    Failed to realign following virtual SAN objects:<UUID> due to being locked or lack of vmdk descriptor file, which requires manual fix.

    Copy the UUID(s) following the Failed to realign following Virtual SAN objects: string and save them to a text file.

     
  3. Download and save the attached zip file 2144881_VsanRealign.zip to the Windows machine you use to access the vSphere Client.
     
  4. Unzip 2144881_VsanRealign.zip and then use the Datastore browser to copy the VsanRealign.py script to a shared datastore on your ESXi host.
     
  5. VMware recommends copying the script to a datastore first and then copying it to /var/tmp from the command line.

    For example:

    cp /vmfs/volumes/VMFS1/VsanRealign.py /var/tmp/VsanRealign.py
     
  6. In the ESXi Shell session, change to the directory where you saved the VsanRealign.py script.

    For example:

    cd /var/tmp/
  7. Run this command to start the script:

    python VsanRealign.py precheck

    Note: The namespace scan can take a long time.

    The script returns a list of vSAN objects with a problem and the recommended actions to allow the disk format upgrade to complete.

    You see output similar to:

    Finished scanning, compiling results
    -------------------------------------------------------------------
    The following objects were missing descriptor files.
    The recorded path doesn't exist and no other reference to the object was found.
    -------------------------------------------------------------------
    This will create descriptor files automatically for all disks under
    lostAndFound in the VSAN datastore.

    Other objects that aren't disks missing a descriptor will be removed permanently.

    NOTE: Recovered disks will not have any snapshot chain information in them.
    Any snapshots deltas will not be correctly recovered.

    39aef356-####-####-####-########26c
    Object UUID: 4698f356-####-####-####-########26c
    Recorded Path: /vmfs/volumes/vsan:523e23728ba31d24-84ab9fc2821d6bdf/c695f356-####-####-####-########26c/linux-vm08-25632575.vswp
    Recorded VM: linux-vm08
    Object Class: vmswap
    Object Size: 536870912
    Parent directory exists
    AutoFix: Will remove object

    Object UUID: 8efdf456-####-####-####-########63b
    Recorded Path: /vmfs/volumes/vsan:5228981f7117d6eb-94106da8ad4a6377/88fdf456-####-####-####-########63b/linux-vm-a.vmdk
    Recorded VM: None
    Object Class: vdisk
    Object Size: 2147483648
    Parent directory exists
    AutoFix: Will create new descriptor

    Descriptors will be created in /vmfs/volumes/vsan:5228981f7117d6eb-94106da8ad4a6377/lostAndFound

    NOTE: Recovered disks will not have any snapshot chain information in them.
    Any snapshots deltas will not be correctly recovered.

    Create 'linux-vm-a-8efdf456-####-####-####-########63b.vmdk' for 8efdf456-c408-a163-2f8e-02001645d63b
    Remove 4698f356-####-####-####-########26c type: vmswap vm name: linux-vm08



    When prompted to proceed with the AutoFix suggestions, enter yes to apply the AutoFix actions. In this case, the descriptor files for vdisk objects are recreated. The vmswap objects, and any other non-disk objects that are missing a descriptor, are permanently removed because they contain no useful data.

    Note: If you enter no, the AutoFix actions are not applied, and you will need to take manual action.
     
  8. Review the report provided by the script for objects that are in use by Changed Block Tracking (CBT) and may cause an issue with the on-disk upgrade.

    You see a report similar to:

    The following objects are in use by Change Block Tracking and may encounter issues during upgrade.
    Rerun this script with the 'fixcbt' option if upgrade fails.


    If action is required, resolve the CBT issues before proceeding to step 9. If no action is required, proceed with step 9.
     
  9. As shown in the step 7 output, recovered descriptors are created in the lostAndFound directory on the vSAN datastore. Examine this directory, then attach the orphaned disks to a non-production virtual machine to check file integrity and determine whether each virtual disk is still needed in your environment.

    Note: Realignment of these objects is still required. Run the on-disk upgrade process again to proceed.
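
The cross-reference between the UUIDs saved in step 2 and the script report from step 7 can be sketched in Python. This is a minimal illustration, not part of VsanRealign.py. It assumes the error text and the precheck output have been saved as plain text, that unmasked UUIDs use the 8-4-4-4-12 hexadecimal layout seen in the log samples, and that the report lists each object on an "Object UUID:" line. The function names are hypothetical.

```python
import re

# 8-4-4-4-12 hexadecimal layout, as in 2deaa756-0d63-0f53-690e-020003c607e5.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def failed_uuids(error_text):
    """Collect every UUID named in the 'Failed to realign' error text."""
    return {u.lower() for u in UUID_RE.findall(error_text)}


def reported_uuids(precheck_text):
    """Collect UUIDs from 'Object UUID:' lines only, ignoring paths."""
    found = set()
    for line in precheck_text.splitlines():
        if "Object UUID:" in line:
            found.update(u.lower() for u in UUID_RE.findall(line))
    return found


def cross_reference(error_text, precheck_text):
    """Return (covered, uncovered) sorted lists of failed UUIDs."""
    failed = failed_uuids(error_text)
    reported = reported_uuids(precheck_text)
    return sorted(failed & reported), sorted(failed - reported)
```

Any UUID in the second list failed to realign but does not appear in the script report, so it still needs manual investigation.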

Additional Information

Additional symptoms and log entries
  • In the error stack, you see entries similar to:

    Failed to realign following Virtual SAN objects: 1e58f256-####-####-####-########4e0, f44ff256-####-####-####-########a28, 8959f256-####-####-####-########4e0, db50f256-####-####-####-########4e0, 2358f256-####-####-####-########4e0, 1c58f256-####-####-####-########a28, 7559f256-####-####-####-########4e0, e850f256-####-####-####-########c91, 5858f256-####-####-####-########4e0, 0a51f256-####-####-####-########a28, 1e58f256-####-####-####-########a28, dd50f256-####-####-####-########03b, ec50f256-####-####-####-########a28, 534ff256-####-####-####-########a28, de57f256-####-####-####-########c91, 1658f256-####-####-####-########03b, due to being locked or lack of vmdk descriptor file, which requires manual fix.
     
  • In the vmware-vsan-health-service.log file, you see entries similar to:

    Note: The vmware-vsan-health-service.log file is located in these directories:
     
    • vCenter Server on Windows: %Programdata%\VMware\vCenterServer\logs\vsan-health\vmware-vsan-health-service.log
    • vCenter Server Appliance: /storage/log/vmware/vsan-health/vmware-vsan-health-service.log

2016-03-24T11:03:00.148Z DEBUG vsan-health[Thread-1223] [VsanRealignClusterLib::RealignClusterV3] Finished namespaces
2016-03-24T11:03:00.620Z INFO vsan-health[Thread-1223] [VsanRealignClusterLib::RealignClusterV3] After namespace realign 22 objects need realign, previously 22
2016-03-24T11:03:00.621Z INFO vsan-health[Thread-1223] [VsanRealignClusterLib::RealignClusterV3] Made no progress. 22 objects still need realigning
2016-03-24T11:03:00.621Z INFO vsan-health[Thread-1223] [VsanRealignClusterLib::RealignClusterV3] (str) [
'6e94f356-####-####-####-########d31',
'fa90f356-####-####-####-########d31',
....
'e78ff356-####-####-####-########26c',
'818df356-####-####-####-########26c'
]
2016-03-24T11:03:00.623Z ERROR vsan-health[Thread-1223] [VsanVcDiskFormatConverterImpl::_Run] Failed to migrate vsanSparse objects.
2016-03-24T11:03:00.623Z ERROR vsan-health[Thread-1223] [VsanVcDiskFormatConverterImpl::_Run] Made no progress
Traceback (most recent call last):
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanVcDiskFormatConverterImpl.py", line 1633, in _Run
    self._HandleUserCancellation)
  File "/usr/lib/vmware-vpx/vsan-health/pyMoVsan/VsanRealignClusterLib.py", line 335, in RealignClusterV3
    uuidRemaining=objectsNeedingRealign)
RealignFailed: Made no progress

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
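
The log markers quoted in this article can be checked with a short script instead of by eye. The following is a minimal sketch that assumes the message strings match the examples above exactly. The function name summarize_realign and the shape of the returned dictionary are hypothetical.

```python
import re

# Matches entries such as:
# "Made no progress. 22 objects still need realigning"
NO_PROGRESS_RE = re.compile(
    r"Made no progress\. (\d+) objects still need realigning"
)


def summarize_realign(lines):
    """Summarize realignment markers found in health service log lines."""
    summary = {
        "fixing": 0,          # count of "Fixing MB alignment on" entries
        "mb_aligned": False,  # True once the first step reports completion
        "stalled": False,     # True if a "Made no progress" entry appears
        "remaining": None,    # object count from the last stalled entry
    }
    for line in lines:
        if "Fixing MB alignment on" in line:
            summary["fixing"] += 1
        if "All Objects now MB aligned" in line:
            summary["mb_aligned"] = True
        match = NO_PROGRESS_RE.search(line)
        if match:
            summary["stalled"] = True
            summary["remaining"] = int(match.group(1))
    return summary
```

On a vCenter Server Appliance, the input could be the lines of /storage/log/vmware/vsan-health/vmware-vsan-health-service.log. A result with stalled set to True corresponds to the "Made no progress" failure shown above.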

Monitor the upgrade status.
 

Using ESXi Shell in ESXi 5.x and 6.x
How to file a Support Request in the Broadcom Portal



Attachments

2144881_VsanRealign.zip
VsanRealign_65_plus.py