VCF pre-check on ESXi cluster fails with error "Failed to load NSX Cluster from the Inventory"

Products

VMware SDDC Manager

Issue/Introduction

When applying ESXi cluster patch updates through the SDDC Manager UI, the wizard cannot proceed because of the following error:
Cluster image hardware compatibility and compliance check finished on May 19, 2025, 4:13:50 PM and encountered 1 error

On SDDC Manager, the log file /var/log/vmware/vcf/lcm/lcm-debug.log shows entries similar to:

YYYY-MM-DDThh:mm:ss ERROR [vcf_lcm,682b1a32c0a48bd11bcb517ade6d6386,5035,auditId=515422f1-5d52-4f2d-867d-0ed2def8714d,resourceType=NSX_T_MANAGER,resourceId=<NSXT Manager FQDN>,name=<NSXT Manager FQDN>] [c.v.v.c.n.s.c.c.ComplexHelpers,vac-scheduler-1] Exception occurred during NSX API invocation
java.util.concurrent.ExecutionException: com.vmware.vapi.std.errors.InternalServerError: InternalServerError (com.vmware.vapi.std.errors.internal_server_error) => {
messages = [],
data = struct => {error_message=upstream connect error or disconnect/reset before headers. reset reason: connection failure, error_code=98, module_name=common-service},
errorType = INTERNAL_SERVER_ERROR
}
YYYY-MM-DDThh:mm:ss ERROR [vcf_lcm,682b1a32c0a48bd11bcb517ade6d6386,5035,auditId=515422f1-5d52-4f2d-867d-0ed2def8714d,resourceType=NSX_T_MANAGER,resourceId=<NSXT Manager FQDN>,name=<NSXT Manager FQDN>] [c.v.e.s.l.p.impl.nsxt.NsxtAuditImpl,vac-scheduler-1] Error auditing NSX Cluster <NSXT Manager FQDN> with exception {}
com.vmware.evo.sddc.lcm.model.error.LcmException: Failed to load NSX Cluster from the Inventory
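
To confirm that an SDDC Manager appliance is hitting this signature, the log can be searched for the exception text. The following is a minimal sketch and not part of the original procedure: the function name count_nsx_inventory_errors is hypothetical, and the default log path is an assumption that should be adjusted to wherever lcm-debug.log lives on the appliance.

```shell
# Count log lines that contain the error signature from this article.
# $1: path to the LCM debug log (the default path is an assumption).
count_nsx_inventory_errors() {
    grep -c -F -- "Failed to load NSX Cluster from the Inventory" \
        "${1:-/var/log/vmware/vcf/lcm/lcm-debug.log}"
}
```

A count greater than 0 suggests the same failure, although the surrounding entries should still be compared with the excerpt above before drawing a conclusion.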

In the SDDC Manager database, the entry for the cluster in the nsxt table shows an error in its audit status:

"auditError": { +
| "errorCode": "Failed to load NSX Cluster from the Inventory", +
| "errorDetails": "error_message : Failed to load NSX Cluster from the Inventory, httpStatus : , error_code : 0"

Query to check the NSX details in the SDDC Manager platform database:

/usr/pgsql/13/bin/psql -h localhost -U postgres -d platform -c "\x" -c "select id,status,version,cluster_fqdn,configuration from nsxt where cluster_fqdn='nsxt-example.com'"

Sample output:

id | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx5e4c
status | ACTIVE
version | 4.2.0.0.0-24105817
cluster_fqdn | nsxt-example.com
configuration | { +
| "domainIds": [ +
| "yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyy342c" +
| ], +
| "vcfId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx5e4c", +
| "nsxtComputeManagers": {}, +
| "nsxtHostClusters": {}, +
| "nsxtEdgeClusters": {}, +
| "managerIpsFqdnMap": { +
| "<nsxt manager ip>": "nsxt1.example.com", +
| "<nsxt manager ip>": "nsxt2.example.com", +
| "<nsxt manager ip>": "nsxt3.example.com" +
| }, +
| "resourceMapper": {}, +
| "auditSucceeded": false, +
| "auditError": { +
| "errorCode": "Failed to load NSX Cluster from the Inventory", +
| "errorDetails": "error_message : Failed to load NSX Cluster from the Inventory, httpStatus : , error_code : 0"+
| }, +
| "resourceId": "nsxt-example.com", +
| "resourceName": "nsxt-example.com", +
| "version": { +
| "version": "4.2.0.0.0-24105817" +
| }, +
| "upgradeAvailable": false +
| }
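
Because the configuration column is long, it can help to reduce the query output to the audit-related fields only. The sketch below assumes the output has the same layout as the sample above; the function name nsxt_audit_summary is hypothetical and is not part of SDDC Manager.

```shell
# Read expanded psql output on stdin and keep only the audit fields.
# Assumes each JSON key/value pair sits on one line, as in the sample output.
nsxt_audit_summary() {
    grep -E -o '"(auditSucceeded|errorCode)": *("[^"]*"|true|false)'
}

# Usage: pipe the psql query shown above into the filter, for example
#   <psql query from this article> | nsxt_audit_summary
```

For the sample output, this should leave just the auditSucceeded value and the errorCode string, which is enough to tell whether the last audit of the NSX cluster failed.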

On the NSX Manager nodes, the log file /var/log/upgrade-coordinator/upgrade-coordinator.log indicates that calls to upgrade-related APIs fail with an internal server error because the upgrade-coordinator (UC) service is down on the NSX Managers:

YYYY-MM-DDThh:mm:ss WARN tx-tracer-poller UfoTxnTracingService 75424 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="upgrade-coordinator"] UfoTxnTracingService[id=1a6f0c81-7643-685e-7ddf-3760774571ae]: long running tx has been running for seconds=7593399, numTxnAccess=1
YYYY-MM-DDThh:mm:ss WARN tx-tracer-poller UfoTxnTracingService 75424 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" subcomp="upgrade-coordinator"] UfoTxnTracingService[id=ae167c90-b5a6-6638-d1c3-f0da1812afec]: long running tx has been running for seconds=7486010, numTxnAccess=1
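
The seconds value in these warnings shows how long a transaction has been stuck; 7593399 seconds is roughly 88 days. As a rough sketch, the longest-running transaction reported in the log can be extracted as follows. The function name max_long_running_tx_seconds is hypothetical, and the pattern assumes the message format shown in the excerpt above.

```shell
# Print the largest "seconds=" value among long-running transaction warnings.
# $1: path to the upgrade-coordinator log (defaults to the path named above).
max_long_running_tx_seconds() {
    grep -F 'long running tx has been running for seconds=' \
        "${1:-/var/log/upgrade-coordinator/upgrade-coordinator.log}" \
        | sed 's/.*seconds=\([0-9][0-9]*\).*/\1/' \
        | sort -n \
        | tail -n 1
}
```

An empty result means no such warnings were found in that file, in which case this article may not apply.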

Environment

VCF 5.x

Cause

The upgrade-coordinator service on the NSX Manager nodes becomes unresponsive because of persistent long-running transaction exceptions.
As a result, SDDC Manager cannot load the NSX cluster from its inventory, and the vLCM pre-check fails.

Resolution

Restart the install-upgrade service on all three NSX Manager nodes by running the "restart service install-upgrade" command on each node.
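
Since the restart has to be repeated on every node, it may help to write out the per-node commands first. The helper below is a hypothetical sketch that only prints commands and executes nothing. It assumes that SSH is enabled for the admin user on each NSX Manager and that the NSX CLI accepts a command passed over SSH; the "get service install-upgrade" line is included as a way to check the service state afterwards. The node names in the usage line are the placeholder FQDNs from the sample output.

```shell
# Print (do not run) the restart and status commands for each NSX Manager.
# Arguments: one FQDN or IP address per NSX Manager node.
print_restart_plan() {
    for node in "$@"; do
        printf 'ssh admin@%s "restart service install-upgrade"\n' "$node"
        printf 'ssh admin@%s "get service install-upgrade"\n' "$node"
    done
}

# Usage:
#   print_restart_plan nsxt1.example.com nsxt2.example.com nsxt3.example.com
```

Reviewing the printed list before running anything makes it easier to confirm that all three nodes are covered.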

Additional Information

Enable ssh root access for NSX appliances