Troubleshooting VxRail Manager Upgrade fails at VXRAIL_UPGRADE_PRECHECK due to duplicate upgrades triggered

Article ID: 313499

Products

VMware Cloud Foundation 5.x

Issue/Introduction

  • The VxRail Manager upgrade fails with a NullPointerException while polling for the upgrade status.
  • The LCM debug log (/var/log/vmware/vcf/lcm/lcm-debug.log) shows the following error before the upgrade status is set to COMPLETED_WITH_FAILURE:
    Error occurred while upgrading VxRail component
    java.lang.NullPointerException: null
            at java.base/java.util.Comparator.lambda$comparingLong$6043328a$1(Comparator.java:511)
            at java.base/java.util.Collections$ReverseComparator2.compare(Collections.java:5278)
            at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
            at java.base/java.util.TimSort.sort(TimSort.java:220)
            at java.base/java.util.Arrays.sort(Arrays.java:1515)
            at java.base/java.util.ArrayList.sort(ArrayList.java:1750)
            at java.base/java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:392)
            at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
            at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
            at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
            at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
            at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
            at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
            at com.vmware.evo.sddc.lcm.primitive.impl.vxm.VxmPrimitiveImpl.pollUpgradeStatus(VxmPrimitiveImpl.java:265)
            at com.vmware.evo.sddc.lcm.primitive.impl.vxm.VxmPrimitiveImpl.postUpgrade(VxmPrimitiveImpl.java:214)
            at com.vmware.evo.sddc.lcm.orch.PrimitiveServiceImpl.postUpgradeAsync(PrimitiveServiceImpl.java:331)
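
To confirm that a failed upgrade matches this signature, the log can be scanned for the error message followed by the NullPointerException and the pollUpgradeStatus frame. The Python sketch below is one way to do that. The function name find_signature and the 40-line look-ahead window are illustrative choices and are not part of any VMware tooling.

```python
from typing import Iterable, List

# Strings taken from the log excerpt in this article.
ERROR_MARKER = "Error occurred while upgrading VxRail component"
NPE_MARKER = "java.lang.NullPointerException"
FRAME_MARKER = "VxmPrimitiveImpl.pollUpgradeStatus"


def find_signature(lines: Iterable[str], window: int = 40) -> List[int]:
    """Return 1-based line numbers where the error message is followed,
    within `window` lines, by both the NPE and the pollUpgradeStatus frame.

    `window` is an assumed look-ahead size, not a documented value.
    """
    buffered = list(lines)
    hits = []
    for index, line in enumerate(buffered):
        if ERROR_MARKER not in line:
            continue
        following = buffered[index + 1 : index + 1 + window]
        has_npe = any(NPE_MARKER in entry for entry in following)
        has_frame = any(FRAME_MARKER in entry for entry in following)
        if has_npe and has_frame:
            hits.append(index + 1)
    return hits
```

Passing an open file handle for /var/log/vmware/vcf/lcm/lcm-debug.log returns the line numbers at which the signature starts. An empty list suggests that the upgrade failed for a different reason.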

Environment

VMware Cloud Foundation 5.x

Cause

  • Lag in the SDDC Manager UI can cause users to unintentionally trigger the same VxRail Manager upgrade multiple times. This registers duplicate upgrades for the same resource, which leads to race conditions.
  • The failure occurs when the start time of one of the duplicate upgrades is not set while the upgrade is being registered.
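
The stack trace is consistent with this cause: the exception is raised from Comparator.comparingLong under a reverse-order sort inside pollUpgradeStatus, which is what one would expect if the registered upgrades are sorted newest-first by start time and one of them has none. A minimal Python analogue of that failure mode follows. The UpgradeRecord class and its fields are hypothetical and do not reflect the LCM database schema, and Python raises TypeError where the Java code raises NullPointerException.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UpgradeRecord:
    """Hypothetical stand-in for a registered upgrade (not the LCM schema)."""
    upgrade_id: str
    start_time: Optional[int]  # epoch milliseconds; None models the unset start time


def newest_first(upgrades: List[UpgradeRecord]) -> List[UpgradeRecord]:
    # Assumption: the poller orders upgrades by start time, newest first.
    return sorted(upgrades, key=lambda u: u.start_time, reverse=True)


def without_missing_start(upgrades: List[UpgradeRecord]) -> List[UpgradeRecord]:
    # In-memory imitation of removing the entry that has no start time.
    return [u for u in upgrades if u.start_time is not None]


duplicates = [
    UpgradeRecord("upgrade-a", 1700000000000),
    UpgradeRecord("upgrade-b", None),  # duplicate registered without a start time
]

try:
    newest_first(duplicates)
    sort_failed = False
except TypeError:
    # Comparing None with an int fails, as the null start time does in Java.
    sort_failed = True

# Once the entry without a start time is gone, the sort succeeds.
cleaned = newest_first(without_missing_start(duplicates))
```

The filter only imitates the effect of removing the bad entry. In the actual workaround, the attached script deletes the entry from the upgrade database.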

Resolution

This issue is resolved in VCF 5.1.

To work around the issue, follow the steps below to delete the upgrade database entry that is missing its start time:

  1. Take a snapshot of the SDDC Manager VM.

  2. SSH to the SDDC Manager as the vcf user, then su to root.

  3. Copy or download the script "cleanup_upgrades_invalid_start_time.py" attached to this KB to /home/vcf/.

  4. Get the bundle ID that the script requires from the SDDC Manager UI by clicking "View Details" on the failed VxRail Manager upgrade under "Available Updates" for the domain.

  5. Run the script, passing the bundle ID from step 4 as an argument:

    ./cleanup_upgrades_invalid_start_time.py <Bundle ID fetched from Step 4>


    Sample output

    --------------------------------------------------------------------------
    LOG FILE : /var/log/vmware/vcf/lcm/cleanup_upgrades_invalid_start_time.log
    --------------------------------------------------------------------------
    YYYY-MM-DDTHH:MM,607 [INFO] root: Performing cleanup for upgrade entry with invalid Start Time for bundle with IDs : ['<Bundle ID fetched from Step 4>']
    YYYY-MM-DDTHH:MM,687 [INFO] root: Upgrade Element entry cleanup complete.

     

  6. Log into the SDDC Manager UI and re-trigger the VxRail Manager upgrade.


 

Attachments

cleanup_upgrades_invalid_start_time.py