NSX-T Install on a host in a vLCM enabled cluster fails with "Setting NSX depot(s) on Compute Manager"

Article ID: 317796


Products

VMware NSX

Issue/Introduction

  • Adding a Transport Node Profile (TNP) to a vLCM enabled vSphere cluster fails.
  • In the NSX-T UI under System - Fabric - Compute Managers, the compute manager status is green and connected, and both "Enable Trust" and "Create Service Account" are enabled.
  • Disabling the ESXi host firewall does not resolve the issue.
  • Rebooting the ESXi host does not resolve the issue.
  • Clicking the "Failed" error on the vLCM cluster in System - Fabric - Nodes - Host Transport Nodes shows the following error; note that the RESOLVE button is grayed out when you select the checkbox:
NSX Install on a host in a vLCM-enabled cluster fails with "Setting NSX depot(s) on Compute Manager: <CM_id> failed with error: null. Retry Transport Node Collection at cluster."
 
  • Clicking the "NSX Install Failed" error on the host in the vLCM cluster shows the "Installation Progress" stopped at "Preparing Installation".  
  • The APIs below show that the compute manager service is up and the service account is enabled:

GET https://{{MPIP}}/api/v1/fabric/compute-managers/<CM-id>

GET https://{{MPIP}}/api/v1/fabric/compute-managers/<CM-id>/status

  • Running the following POST API to get a session ID does not return one for the vCenter service account:

curl -k -i -X POST -H 'X-NSX-Username:admin' http://localhost:7443/cm-inventory/api/v1/fabric/compute-managers/<compute-manager-UUID>?action=get-vapi-session-id

HTTP/1.1 200

   ...

{
... --->>> session_id is missing here; see the workaround below for the correct response.
}

  • After running the above API call, the UI may show an error for the compute manager.
  • In the /var/log/cm-inventory/cm-inventory.log file, you see entries similar to:

2022-05-09T17:34:12.482Z ERROR http-nio-127.0.0.1-7443-exec-3 LcmRestClient 8736 FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP31815" level="ERROR" reqId="22d477fb-####-####-####-1ed73018967e" subcomp="cm-inventory" username="admin"] Error in rest call url= //rest/com/vmware/cis/session , method= POST , response= {"type":"com.vmware.vapi.std.errors.unauthenticated","value":{"error_type":"UNAUTHENTICATED","messages":

...
2022-05-09T17:34:12.483Z WARN http-nio-127.0.0.1-7443-exec-3 VcConnection 8736 SYSTEM [nsx@6876 comp="nsx-manager" level="WARNING" reqId="22d477fb-####-####-####-1ed73018967e" subcomp="cm-inventory" username="admin"] Error occurred while getting vapi session Id for cm <compute-manager>

com.vmware.nsx.management.lcm.common.exception.LcmRestException: org.springframework.web.client.HttpClientErrorException$Unauthorized: 401 Unauthorized: [{"type":"com.vmware.vapi.std.errors.unauthenticated","value":{"error_type":"UNAUTHENTICATED","messages":[{"args":[],"default_message":"Authentication required.","id":"com.vmware.vapi.endpoint.method.a... (1036 bytes)]

  • In the /var/log/proton/nsxapi.log file, you see entries similar to:

2022-04-13T15:27:03.588Z INFO ActivityWorkerPool-1-17 TransportNodeCollectionVlcmActivity 4039 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] TNC realization failed at cluster level API call with error null

2022-04-13T15:27:03.595Z INFO ActivityWorkerPool-1-17 TransportNodeDesiredStateErrorServiceImpl 4039 FABRIC [nsx@6876 comp="nsx-manager" level="INFO" subcomp="manager"] Updated TransportNodeDesiredStateError object [TransportNodeDesiredStateError [id =TransportNodeDesiredStateError/<cm uuid>:domain-c118:VLCM_ERROR_AT_CLUSTER_LEVEL,computeCollectionId = <cm uuid>:domain-c118,desiredStateErrorMessage = 26195: Setting NSX depot(s) on Compute Manager: <cm uuid> failed with error: null. Retry Transport Node Collection at cluster.]]
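The log entries above can be located quickly with a small helper. This is a hedged sketch, not an official tool: the function name `log_has_expired_sa` and the grep patterns are assumptions based on the sample log lines shown in this article, and may need adjusting for other NSX-T builds.

```shell
# Hedged sketch: check an NSX Manager log file for the expired-service-account
# signature described above (401/UNAUTHENTICATED vapi session errors).
# Patterns are taken from the sample log excerpts in this article.
log_has_expired_sa() {
  # $1: path to a log file, e.g. /var/log/cm-inventory/cm-inventory.log
  # Returns 0 if the signature is present, 1 otherwise.
  grep -qE 'UNAUTHENTICATED|Error occurred while getting vapi session' "$1"
}

# Example (run as root on the NSX Manager):
# log_has_expired_sa /var/log/cm-inventory/cm-inventory.log && \
#   echo "service account likely expired"
```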

Environment

VMware NSX-T Data Center

Cause

The service account automatically created in vCenter when the compute manager is registered with NSX-T has expired.
Currently, there is no automatic mechanism to renew this account.

 

Resolution

This issue is resolved in NSX-T 3.2.2; see Download Broadcom products and software.


Workaround:

  1. Log in to the NSX-T Manager UI.
  2. Go to System - Fabric - Compute Managers, then select and edit the compute manager.
  3. Click Edit next to "FQDN or IP address" and re-enter the vSphere username and password used to register with NSX-T.

This will re-create the service account and trigger a full inventory sync from vCenter.
Note that this sync can take up to an hour, depending on the size of your environment.

Note: The maximum password lifetime policy in vCenter can be edited manually under Administration - Configuration - Local Accounts - Edit.


After following the workaround steps, run the API again to confirm it now returns the session ID; the response will look like the sample below.

curl -k -i -X POST -H 'X-NSX-Username:admin' http://localhost:7443/cm-inventory/api/v1/fabric/compute-managers/<cm uuid>?action=get-vapi-session-id

 

HTTP/1.1 200

...

{

  "session_id" : "################94540af5826208f06",

  "thumbprint" : "A6:40:##:##:##:##:##:##:##:##:14:87:5D:38:Bf:84:D0:FD:07:7A:A9:45:01:88:66:F8:##:##:##:##:##:##",

  "credential_type" : "SessionLoginCredential"

}
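To distinguish the healthy response above from the failing one in the Issue section, it is enough to check the body for a session_id field. The helper below is an illustrative sketch: it only inspects a response-body string, and the curl invocation is commented out because the endpoint is local to the NSX Manager.

```shell
# Hedged sketch: check whether the get-vapi-session-id response body
# contains a session_id field (present in the healthy response shown above,
# absent in the failing one).
has_session_id() {
  # $1: JSON body returned by the get-vapi-session-id POST
  printf '%s' "$1" | grep -q '"session_id"'
}

# Example (run as root on the NSX Manager; <cm uuid> as in this article):
# body=$(curl -k -s -X POST -H 'X-NSX-Username:admin' \
#   "http://localhost:7443/cm-inventory/api/v1/fabric/compute-managers/<cm uuid>?action=get-vapi-session-id")
# has_session_id "$body" && echo "service account OK" \
#   || echo "re-register the compute manager"
```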

If the above steps do not resolve the error, restart the proton service on all NSX Managers.
Restarting proton on all the managers clears the cache and forces the NSX Manager to make a new API call for the vAPI token.
Repeat the following steps on each NSX Manager:
Log in to the NSX-T Manager as admin and check that the cluster status is healthy:

get cluster status

Then, from the admin CLI, type st en and enter the root password to switch to the root shell.
Run the following command to restart the proton service:

/etc/init.d/proton restart

Wait for the proton service to come up; check its status using:

/etc/init.d/proton status

Run the command to restart the upgrade-coordinator service:

/etc/init.d/upgrade-coordinator restart

Wait for the upgrade-coordinator service to come up; check its status using:

/etc/init.d/upgrade-coordinator status

Check that the NSX cluster is in a STABLE state:

get cluster status
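The wait-and-check steps above can be sketched as a small polling loop. This is a hedged sketch under two assumptions: that each init script's status output contains the word "running" when the service is up (adjust the pattern to what /etc/init.d/proton status actually prints on your build), and the helper name `wait_for_service` is hypothetical.

```shell
# Hedged sketch: poll a status command until its output reports the service
# as running, or give up after a number of attempts. The "running" match
# string is an assumption about the init script's output.
wait_for_service() {
  # $1: status command to run; $2: max attempts (default 30)
  status_cmd="$1"; tries="${2:-30}"; i=0
  while [ "$i" -lt "$tries" ]; do
    if eval "$status_cmd" 2>/dev/null | grep -qi 'running'; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}

# Example (as root on each NSX Manager):
# /etc/init.d/proton restart && wait_for_service '/etc/init.d/proton status'
# /etc/init.d/upgrade-coordinator restart && \
#   wait_for_service '/etc/init.d/upgrade-coordinator status'
```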

After restarting the services on all three NSX-T Managers, click the RESOLVE button on the host showing the "NSX Install Failed" error in System - Fabric - Nodes - Host Transport Nodes.