SDDC Manager vCenter upgrade fails at pre-validation stage

Article ID: 390255

Products

VMware SDDC Manager / VCF Installer

Issue/Introduction

vCenter upgrades initiated from SDDC Manager fail during the initial pre-validation checks.

  • The SDDC Manager UI displays the following error message:

Vcenter Upgrade and interop compatibility checks failed.","metadata":"VCENTER upgrade bundle is not compatible with your existing version 8.0.2.00400-23929136. Refer to the https://kb.vmware.com/s/article/91728 to address incompatibilities and proceed with the upgrade.
Please retry the upgrade once the upgrade is available again.\nContact GSS for assistance. Reference Token:

  • The /var/log/vmware/vcf/lcm/lcm-debug.log file on the SDDC Manager reports a NullPointerException when attempting to fetch the ESXi version of the hosts in a cluster:

YYYY-MM-DDTHH:MM:SS.234+0000 INFO  [vcf_lcm,0000000000000000,0000,upgradeId=d124f037-997f-4470-####-693df4e2025f,resourceType=VCENTER,resourceId=########-####-####-####-############,bundleElementId=3eab1127-####-yyyy-a2fa-df3163f8be1a] [c.v.e.s.c.c.v.vsphere.VsphereClient,Upgrade-2] Successfully logged in to https://vCenter_FQDN.net:443/sdk
YYYY-MM-DDTHH:MM:SS.595+0000 ERROR [vcf_lcm,0000000000000000,0000,upgradeId=d124f037-997f-4470-####-693df4e2025f,resourceType=VCENTER,resourceId=########-####-####-####-############,bundleElementId=3eab1127-####-yyyy-a2fa-df3163f8be1a] [c.v.e.s.l.p.impl.esx.EsxUtils,Upgrade-2] Exception while fetching the current version for hosts belonging to vcenter with cluster MOID: domain-cx
YYYY-MM-DDTHH:MM:SS.599+0000 ERROR [vcf_lcm,0000000000000000,0000,upgradeId=d124f037-997f-4470-####-693df4e2025f,resourceType=VCENTER,resourceId=########-####-####-####-############,bundleElementId=3eab1127-####-yyyy-a2fa-df3163f8be1a] [c.v.e.s.l.p.i.v.VcenterUpgradeCompatibilityPrimitiveHelper,Upgrade-2] Failed to calculate vcenter upgrade compatibility com.vmware.evo.sddc.lcm.model.error.LcmException: null
        at com.vmware.evo.sddc.lcm.primitive.impl.esx.EsxUtils.getEsxVersion(EsxUtils.java:244)
        at com.vmware.evo.sddc.lcm.primitive.impl.esx.EsxUtils.getEsxVersionsOfCluster(EsxUtils.java:211)
        at com.vmware.evo.sddc.lcm.primitive.util.CompatibilityVersionHelper.getEsxVersionsForVcenterAndCluster(CompatibilityVersionHelper.java:108)
        at com.vmware.evo.sddc.lcm.primitive.impl.vcenter.VcenterUpgradeCompatibilityHelper.getDistinctInteropCombinationForPostUpgradeInteropVerification(VcenterUpgradeCompatibilityHelper.java:163)
        at com.vmware.evo.sddc.lcm.primitive.impl.vcenter.VcenterUpgradeCompatibilityHelper.getInterOpCompatibility(VcenterUpgradeCompatibilityHelper.java:96)
        at com.vmware.evo.sddc.lcm.primitive.impl.vcenter.VcenterUpgradeCompatibilityHelper.isVcenterUpgradeVvsCompatible(VcenterUpgradeCompatibilityHelper.java:51)
        at com.vmware.evo.sddc.lcm.primitive.impl.vcenter.VcenterUpgradeCompatibilityPrimitiveHelper.verifyUpgradeVvsCompatibility(VcenterUpgradeCompatibilityPrimitiveHelper.java:58)
        at com.vmware.evo.sddc.lcm.primitive.impl.vcenter.VCenterUpdater.update(VCenterUpdater.java:183)
        at com.vmware.evo.sddc.lcm.primitive.impl.vcenter.VCenterPatchUpgradeImpl.postUpgradeHelper(VCenterPatchUpgradeImpl.java:240)
        at com.vmware.evo.sddc.lcm.primitive.impl.vcenter.VCenterPatchUpgradeImpl.postUpgradeWithState(VCenterPatchUpgradeImpl.java:149)
        at com.vmware.evo.sddc.lcm.primitive.impl.vcenter.VCenterPatchUpgradeImpl.postUpgrade(VCenterPatchUpgradeImpl.java:141)
        at com.vmware.evo.sddc.lcm.orch.PrimitiveServiceImpl.postUpgradeAsync(PrimitiveServiceImpl.java:289)
        at com.vmware.evo.sddc.lcm.orch.PrimitiveServiceImpl.lambda$postUpgrade$0(PrimitiveServiceImpl.java:165)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.NullPointerException: Cannot invoke "com.vmware.vim.binding.vim.host.ConfigInfo.getProduct()" because the return value of "com.vmware.vim.binding.vim.HostSystem.getConfig()" is null
        at com.vmware.evo.sddc.lcm.primitive.impl.esx.EsxUtils.lambda$getEsxVersion$0(EsxUtils.java:238)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
        at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
        at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
        at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
        at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
        at com.vmware.evo.sddc.lcm.primitive.impl.esx.EsxUtils.getEsxVersion(EsxUtils.java:233)

  • When checking the vSphere Client, one or more ESXi hosts within the target cluster display a state of "Not Responding" or "Disconnected"
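The log symptom above can be checked from the SDDC Manager shell. The following is a minimal sketch, not part of the official procedure: the function name lcm_failed_cluster_moids is hypothetical, and it only assumes that the error line ends with the cluster MOID, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct cluster MOIDs named in the
# "Exception while fetching the current version" errors of an LCM log.
# Argument 1: path to the lcm-debug.log file on the SDDC Manager.
lcm_failed_cluster_moids() {
    log_file="$1"
    grep 'Exception while fetching the current version for hosts belonging to vcenter with cluster MOID:' "$log_file" \
        | sed 's/.*cluster MOID: *//' \
        | awk '{print $1}' \
        | sort -u
}

# Example: lcm_failed_cluster_moids <path to lcm-debug.log>
```

Each MOID printed corresponds to a cluster that contains at least one host whose configuration could not be read.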

Environment

VMware Cloud Foundation 5.x

Cause

The LCM service fails with a NullPointerException because it cannot retrieve configuration data from ESXi hosts in a "Not Responding" or "Disconnected" state.

 

Resolution

To resolve this issue, use one of the following workarounds.

Workaround 1: Bypass Pre-validation in SDDC Manager

  1. SSH into the SDDC Manager appliance as the vcf user and use su - to switch to the root user
  2. Verify the current compatibility precheck flag by running the following command:
    • grep "lcm.enable.vvs.compatibility.upgrade.precheck" /opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties
    • The value should return as true
  3. Create a backup of the properties file:
    • cp /opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties /opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties.bck
  4. Open the properties file in a text editor: 
    • vi /opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties 
  5. Locate the compatibility check property and set it to false
    • vcf.compatibility.controllers.compatibilityCheckEnabled=false 
  6. Save and close the file
    • :wq!
  7. Return to the SDDC Manager UI and restart the vCenter Server upgrade
  8. Once the upgrade completes successfully, edit the application-prod.properties file again and revert the property to true to ensure future validations function correctly. 
    • vi /opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties 
    • vcf.compatibility.controllers.compatibilityCheckEnabled=true
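For administrators who prefer not to edit the file interactively, the backup and edit steps above can be sketched as a small shell function. This is an illustration under stated assumptions, not a supported tool: the function name set_lcm_property is hypothetical, the property is assumed to already exist in the file (the function refuses to add it), and the value is assumed to be a simple token such as true or false.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a properties file.
# A one-time backup (<file>.bck) is created if none exists yet, so a later
# revert does not overwrite the original backup.
set_lcm_property() {
    props_file="$1"; key="$2"; value="$3"
    # Escape dots so the key is matched literally, not as a regex wildcard.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    grep -q "^${key_re}=" "$props_file" || {
        echo "property $key not found in $props_file" >&2
        return 1
    }
    [ -e "${props_file}.bck" ] || cp -p "$props_file" "${props_file}.bck" || return 1
    tmp_file=$(mktemp) || return 1
    # Rewrite through a temporary file; the original keeps its owner and mode.
    sed "s|^${key_re}=.*|${key}=${value}|" "$props_file" > "$tmp_file" \
        && cat "$tmp_file" > "$props_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}

# Example, run as root on the SDDC Manager:
#   set_lcm_property /opt/vmware/vcf/lcm/lcm-app/conf/application-prod.properties \
#       vcf.compatibility.controllers.compatibilityCheckEnabled false
```

Calling the same function with true after the upgrade performs the revert described in step 8.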

Workaround 2: Relocate the Unresponsive Host

  1. Identify the MOID of the affected cluster (domain-cx) from the lcm-debug.log error: Exception while fetching the current version for hosts belonging to vcenter with cluster MOID: domain-cx
  2. Log in to the vSphere Client and select each cluster in the inventory in turn. The MOID of the selected cluster appears in the browser URL; find the cluster whose URL contains the matching domain-cx value.
  3. Within the identified cluster, locate any ESXi hosts in a "Not Responding" state or displaying incomplete resource information. 
  4. Attempt to resolve the host connectivity issue with vCenter Server to return it to a healthy state. 
  5. If the host issue cannot be resolved immediately, right-click the affected host, select Connection > Disconnect, and then move the host out of the cluster into the parent Datacenter object. This excludes the host from the cluster-level LCM validation.
  6. Return to the SDDC Manager UI and restart the vCenter Server upgrade. The process should now bypass the problematic host and progress to the installation stage. 
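When several clusters have to be compared against the MOID from the log, the URL check in step 2 can be sketched as a small shell helper. The function name cluster_moid_from_url is hypothetical, and the sketch assumes the vSphere Client URL embeds the object reference in the form ClusterComputeResource:domain-c<number>; verify this against a URL copied from your own environment.

```shell
#!/bin/sh
# Hypothetical helper: extract the cluster MOID (domain-c<number>) from a
# vSphere Client URL copied from the browser address bar.
# Prints nothing if the URL does not reference a cluster object.
cluster_moid_from_url() {
    printf '%s\n' "$1" \
        | grep -o 'ClusterComputeResource:domain-c[0-9][0-9]*' \
        | sed 's/^ClusterComputeResource://'
}

# Example (the URL shape is an assumption):
#   cluster_moid_from_url 'https://<vCenter_FQDN>/ui/app/cluster;nav=h/urn:vmomi:ClusterComputeResource:domain-c8:<instance-uuid>/summary'
```

Compare the printed value with the MOID reported in the lcm-debug.log error to confirm that the selected cluster is the affected one.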

Note: Do not remove the host from the vCenter inventory if you intend to reuse it after it is repaired. If the host is removed, it must be cleaned up from the SDDC Manager inventory, rebuilt, and then re-commissioned.