NSX-T Upgrade from SDDC manager fails at NSX_UPGRADE_STAGE_CLUSTER_PRECHECK

Article ID: 379200

Updated On:

Products

VMware SDDC Manager

VMware Cloud Foundation 5.x

Issue/Introduction

  • NSX upgrade fails or shows drift errors in SDDC Manager after you restart an NSX upgrade directly through NSX Manager instead of SDDC Manager.
  • This occurs when the original upgrade fails partway through (such as with a JVM error) and you retry the upgrade from the NSX Manager UI. The out-of-band upgrade creates a partial upgrade state with mixed component versions. SDDC Manager cannot reconcile its inventory with the actual NSX state.
  • NSX shows as upgraded, but the SDDC Manager UI does not reflect the new version.
  • You see an error in the SDDC Manager UI under Updates similar to:
    Retrieving configuration updates failed. Unable to compute applicability for drift RemoveNfsDatastoreConfigDrift. Because configuration realized check failed on resources [########-####-####-####-############]. Please check logs and fix the failures of drift configuration realized checks. Then restart the service to trigger the configuration realized checks again and re-try the API. If this does not resolve the issue, please contact GSS.
    
  • When retrying the upgrade, you see an error similar to:
    Message: com.vmware.vcf.error.runtime.nsxt.already.upgraded
    Remediation Message: NSX cluster is already upgraded. Download a new bundle (if available). Retry the upgrade, once available.
    
  • In /var/log/vmware/vcf/domainmanager/domainmanager.log, you see errors similar to:
    NSX version 4.2.3.0.0-24866349 is not supported for addVi
    
  • In /var/log/vmware/vcf/lcm/lcm-debug.log, you see errors similar to:
    [vcf_lcm,0000000000000000,0000,upgradeId=####,resourceType=NSX_T_PARALLEL_CLUSTER,resourceId=af-mgt-NSX_FQDN:_ParallelClusterUpgradeElement,bundleElementId=####] [c.v.e.s.l.p.i.nsxt.NsxtUpgradeUtil,Upgrade-2] Setting Upgrade Error for stage NSX_UPGRADE_STAGE_EDGE_POSTCHECK, error description Check overall transport node status: [Overall status of the edge transport node #### is DOWN.]:, remediation Check for errors in the LCM log files at ###.###.###.###:/var/log/vmware/vcf/lcm, and address those errors. Please run the upgrade precheck and restart the upgrade.
    
    ERROR [vcf_lcm,0000000000000000,0000,upgradeId=####,resourceType=NSX_T_PARALLEL_CLUSTER,resourceId:_ParallelClusterUpgradeElement,bundleElementId=yyyy] [c.v.e.s.l.p.i.n.s.NsxtEdgeClusterParallelUpgradeStageRunner,Upgrade-2] upgrade error for resource { "errorType": "RECOVERABLE", "stage": "NSX_UPGRADE_STAGE_EDGE_POSTCHECK", "errorCode": "com.vmware.vcf.error.runtime.nsxt.edge.cluster.postcheck.failed", "errorDescription": "Check overall transport node status: [Overall status of the edge transport node #### is DOWN.]:", "metadata": "Check for errors in the LCM log files at ###.###.###.###:/var/log/vmware/vcf/lcm, and address those errors. Please run the upgrade precheck and restart the upgrade.", "metadataAttributes": { "LCM_LOG_LOCATION": "/var/log/vmware/vcf/lcm", "LCM_HOST_ADDRESS": "###.###.###.###" }, "referenceToken": "ABC" }
    
  • SDDC Manager blocks further upgrades until the version inconsistency is resolved.

Additional symptoms reported:

  • The NSX upgrade fails with a JVM error and is then restarted from NSX Manager.
  • SDDC Manager does not allow the upgrade to continue after the restart.
  • Related KBs for the error do not seem to apply to your situation.
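
To check quickly whether the log symptoms listed above are present, the two log files can be searched for the quoted strings. The following is a minimal sketch, not an official tool: the function name scan_log and the DOMAINMANAGER_LOG and LCM_LOG variables are illustrative, and the search strings are copied from the log excerpts in this article.

```shell
#!/usr/bin/env bash
# Sketch: search SDDC Manager logs for the error signatures quoted in this article.
# scan_log, DOMAINMANAGER_LOG and LCM_LOG are illustrative names, not VCF tooling.

# Prints every signature found in the given log file.
# Returns 0 if at least one signature matched, 1 otherwise.
scan_log() {
    local log_file="$1"; shift
    local found=1
    local pattern
    for pattern in "$@"; do
        # -F treats the signature as a fixed string, not a regular expression.
        if grep -qF -- "$pattern" "$log_file" 2>/dev/null; then
            echo "FOUND in $log_file: $pattern"
            found=0
        fi
    done
    return "$found"
}

# Log locations as given in the symptoms above; override via the environment if needed.
DOMAINMANAGER_LOG="${DOMAINMANAGER_LOG:-/var/log/vmware/vcf/domainmanager/domainmanager.log}"
LCM_LOG="${LCM_LOG:-/var/log/vmware/vcf/lcm/lcm-debug.log}"

scan_log "$DOMAINMANAGER_LOG" "is not supported for addVi" \
    || echo "no match in $DOMAINMANAGER_LOG"
scan_log "$LCM_LOG" \
    "NSX_UPGRADE_STAGE_EDGE_POSTCHECK" \
    "com.vmware.vcf.error.runtime.nsxt.edge.cluster.postcheck.failed" \
    || echo "no match in $LCM_LOG"
```

A match only indicates that a similar error was logged at some point; compare the timestamps of the matching lines with the time of the failed upgrade.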

Environment

  • VMware Cloud Foundation 5.2
  • VMware NSX

 

Cause

This issue occurs when SDDC Manager fails to record the updated target NSX version in its inventory.

Resolution

To work around the issue, update the SDDC Manager inventory with the correct NSX-T Manager version.

Workaround:

Follow the steps below to update the SDDC Manager inventory for each upgraded NSX-T instance:

  1. Take a snapshot of the SDDC Manager VM before proceeding.
  2. SSH to SDDC Manager as the vcf user, then su to root.
  3. Get the inventory IDs of the VCF-deployed NSX-T clusters associated with the domains:
    # curl -v -k http://127.0.0.1:7100/inventory/nsxt | json_pp

    Sample output:

    {
       "clusterIpAddress" : "###.###.###.###",
       "shared" : false,
       "status" : "ACTIVE",
       "version" : "<current NSX-T Version>",
       "domainIds" : [
          "########-####-####-####-########c33d"
       ],
       "clusterFqdn" : "vip-nsx-mgmt.example.com",
       "id" : "<nsxt-entity-id>",
       "nsxtClusterDetails" : [
          {
             "ipAddress" : "###.###.###.###",
             "vmName" : "nsx-mgmt-1",
             "fqdn" : "nsx-mgmt-1.example.com",
             "id" : "########-####-####-####-########1b90"
          }
       ]
    }
  4. In the output, note the <nsxt-entity-id> for the upgraded NSX-T Manager instance. This is the cluster-level "id" value, not an "id" inside nsxtClusterDetails.

  5. For each NSX-T entity ID (NSX-T cluster ID), set the version to the correct NSX-T version. Before doing so, log in to the NSX-T cluster IP and confirm that this NSX-T cluster has actually been upgraded.

    curl -v -k http://127.0.0.1:7100/inventory/entities/<nsxt-entity-id> -X PATCH -d '{"type": "NSXT_CLUSTER","status": "ACTIVE","version":"<correct version>"}' -H 'Content-Type: application/json'

    NOTE: Do not update any IDs in the cluster details.
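
When several NSX-T instances were upgraded, the PATCH request from step 5 has to be repeated once per entity ID. The sketch below wraps that same request in a loop. It is illustrative only: the names build_payload, patch_nsxt_version and DRY_RUN are assumptions introduced here and are not part of VCF, and the endpoint and JSON body are copied from step 5. By default it only prints the curl command it would run; set DRY_RUN=0 to send the request.

```shell
#!/usr/bin/env bash
# Sketch: repeat the step 5 PATCH for several NSX-T entity IDs.
# build_payload, patch_nsxt_version and DRY_RUN are illustrative names, not VCF tooling.

INVENTORY_URL="http://127.0.0.1:7100/inventory"

# Builds the JSON body from step 5 for the given version string.
build_payload() {
    local version="$1"
    printf '{"type": "NSXT_CLUSTER","status": "ACTIVE","version":"%s"}' "$version"
}

# Prints the PATCH request for one entity ID, or sends it when DRY_RUN=0.
patch_nsxt_version() {
    local entity_id="$1" version="$2"
    local payload
    payload="$(build_payload "$version")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "curl -k ${INVENTORY_URL}/entities/${entity_id} -X PATCH -d '${payload}' -H 'Content-Type: application/json'"
    else
        curl -k "${INVENTORY_URL}/entities/${entity_id}" -X PATCH \
            -d "$payload" -H 'Content-Type: application/json'
    fi
}

# Usage: <script> <correct version> <nsxt-entity-id> [<nsxt-entity-id> ...]
if [ "$#" -ge 2 ]; then
    version="$1"; shift
    for entity_id in "$@"; do
        patch_nsxt_version "$entity_id" "$version"
    done
fi
```

After the update, re-running the command from step 3 and checking that the "version" field shows the expected value is a reasonable way to confirm that the change was applied.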