Troubleshooting VxRail Manager Upgrade fails at VXRAIL_UPGRADE_PRECHECK due to duplicate upgrades triggered

Article ID: 313499

Products

VMware Cloud Foundation

Issue/Introduction

This article guides customers and GSS through the workaround steps and the script required to recover from a VxRail Manager upgrade failure in which duplicate triggered upgrades cause a NullPointerException.


Symptoms:

The VxRail Manager upgrade fails with a NullPointerException while polling for the upgrade status. The LCM debug log (/var/log/vmware/vcf/lcm/lcm-debug.log) shows the following before the upgrade status is set to COMPLETED_WITH_FAILURE:

Error occurred while upgrading VxRail component
java.lang.NullPointerException: null
        at java.base/java.util.Comparator.lambda$comparingLong$6043328a$1(Comparator.java:511)
        at java.base/java.util.Collections$ReverseComparator2.compare(Collections.java:5278)
        at java.base/java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
        at java.base/java.util.TimSort.sort(TimSort.java:220)
        at java.base/java.util.Arrays.sort(Arrays.java:1515)
        at java.base/java.util.ArrayList.sort(ArrayList.java:1750)
        at java.base/java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:392)
        at java.base/java.util.stream.Sink$ChainedReference.end(Sink.java:258)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at com.vmware.evo.sddc.lcm.primitive.impl.vxm.VxmPrimitiveImpl.pollUpgradeStatus(VxmPrimitiveImpl.java:265)
        at com.vmware.evo.sddc.lcm.primitive.impl.vxm.VxmPrimitiveImpl.postUpgrade(VxmPrimitiveImpl.java:214)
        at com.vmware.evo.sddc.lcm.orch.PrimitiveServiceImpl.postUpgradeAsync(PrimitiveServiceImpl.java:331)
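
To confirm that a failed upgrade matches this signature, the log can be searched for the error message followed by the NullPointerException and the pollUpgradeStatus frame. The following is a minimal sketch, not part of the attached script; the function names and the 40-line search window are assumptions chosen for illustration, while the log path and marker strings come from the trace above.

```python
from pathlib import Path

# Log location and marker strings are taken from this article.
LCM_DEBUG_LOG = "/var/log/vmware/vcf/lcm/lcm-debug.log"
ERROR_MARKER = "Error occurred while upgrading VxRail component"
NPE_MARKER = "java.lang.NullPointerException"
FRAME_MARKER = "VxmPrimitiveImpl.pollUpgradeStatus"


def find_signature(lines, window=40):
    """Return 1-based line numbers where the error marker is followed,
    within `window` lines, by both the NPE and the pollUpgradeStatus frame.

    `window` is an assumed value; the trace shown above fits well within it.
    """
    hits = []
    for i, line in enumerate(lines):
        if ERROR_MARKER not in line:
            continue
        following = lines[i + 1 : i + 1 + window]
        has_npe = any(NPE_MARKER in entry for entry in following)
        has_frame = any(FRAME_MARKER in entry for entry in following)
        if has_npe and has_frame:
            hits.append(i + 1)
    return hits


def scan_log(path=LCM_DEBUG_LOG):
    """Read the log file and return the matching line numbers."""
    text = Path(path).read_text(errors="replace")
    return find_signature(text.splitlines())
```

On SDDC Manager, calling `scan_log()` as a user with read access to the log would return the line numbers of any matching occurrences; an empty list suggests the failure has a different cause.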


Environment

VMware Cloud Foundation 5.1
VMware Cloud Foundation 5.0.0.1
VMware Cloud Foundation 5.0

Cause

Lag in the SDDC Manager UI may cause users to unintentionally trigger the same VxRail Manager upgrade more than once. This registers duplicate upgrades for the same resource, which are then subject to race conditions. The failure occurs when the start time of one of the duplicate upgrades is not set while the upgrade is being registered; sorting the upgrades during status polling then fails with the NullPointerException shown above.
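
The failure mode can be illustrated with a small Python analogue. This is only a sketch: LCM is a Java service, and the trace suggests a descending sort on a long-valued key, which cannot handle a null value. The record fields, IDs, and timestamp below are hypothetical and do not reflect the actual LCM database schema.

```python
# Hypothetical upgrade records for the same resource. Field names and values
# are illustrative only; they are not the real LCM schema.
upgrades = [
    {"upgrade_id": "a", "resource": "vxrail-manager", "start_time": 1701730696000},
    {"upgrade_id": "b", "resource": "vxrail-manager", "start_time": None},
]


def latest_first(records):
    """Sort records by start time, newest first (analogue of the Java sort)."""
    return sorted(records, key=lambda r: r["start_time"], reverse=True)


def sort_fails(records):
    """Return True if sorting raises because a start time is missing."""
    try:
        latest_first(records)
    except TypeError:
        # Python raises TypeError when comparing None with an int; the Java
        # equivalent is the NullPointerException seen in the trace.
        return True
    return False


def missing_start_time(records):
    """Return the IDs of records whose start time was never set."""
    return [r["upgrade_id"] for r in records if r["start_time"] is None]
```

Here `sort_fails(upgrades)` is true, and `missing_start_time(upgrades)` identifies record "b" as the kind of entry the workaround removes. Once that entry is gone, the sort succeeds.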

Resolution

There is currently no resolution for this issue.

Workaround:

To work around the issue, follow the steps below to delete the upgrade database entry that is missing its start time:

  1. SSH into SDDC Manager.
    ssh vcf@<SDDC_MANAGER_IP>

  2. Switch to the root user.
    su

  3. Copy or download the script "cleanup_upgrades_invalid_start_time.py" attached to this KB to /home/vcf/.

  4. Obtain the bundle ID that the script requires: in the SDDC Manager UI, click "View Details" for the failed VxRail Manager upgrade under "Available Updates" for the domain.

  5. Run the script, passing the bundle ID from step 4 as an argument, as shown below:

    root@sddc-manager [ /home/vcf ]# ./cleanup_upgrades_invalid_start_time.py <Bundle ID fetched from Step 4>
    --------------------------------------------------------------------------
    LOG FILE : /var/log/vmware/vcf/lcm/cleanup_upgrades_invalid_start_time.log
    --------------------------------------------------------------------------
    2023-12-04 22:58:16,607 [INFO] root: Performing cleanup for upgrade entry with invalid Start Time for bundle with IDs : ['<Bundle ID fetched from Step 4>']
    2023-12-04 22:58:16,687 [INFO] root: Upgrade Element entry cleanup complete.

  6. Log in to the SDDC Manager UI and re-trigger the VxRail Manager upgrade.
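
Before re-triggering the upgrade in step 6, it can be useful to confirm that the cleanup script reported success. The following is a minimal sketch of such a check; the function name is hypothetical, and it assumes that the script's log always contains the bundle ID followed by the completion message, as in the sample output in step 5.

```python
from pathlib import Path

# Log path and completion message are taken from the sample output in step 5.
CLEANUP_LOG = "/var/log/vmware/vcf/lcm/cleanup_upgrades_invalid_start_time.log"
DONE_MESSAGE = "Upgrade Element entry cleanup complete."


def cleanup_completed(log_text, bundle_id):
    """Return True if the log mentions the bundle ID and, on a later line,
    the completion message."""
    seen_bundle = False
    for line in log_text.splitlines():
        if bundle_id in line:
            seen_bundle = True
        elif seen_bundle and DONE_MESSAGE in line:
            return True
    return False


def check_cleanup_log(bundle_id, path=CLEANUP_LOG):
    """Read the cleanup log from disk and run the check above."""
    return cleanup_completed(Path(path).read_text(errors="replace"), bundle_id)
```

On SDDC Manager, `check_cleanup_log("<Bundle ID fetched from Step 4>")` would return True once the cleanup has been logged as complete. If it returns False, review the log file before retrying the upgrade.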


 


Attachments

cleanup_upgrades_invalid_start_time.py