After performing a vCenter SSO domain repoint in an NSX networking setup, the vSphere Supervisor cluster is stuck in an Error, Configuring, or Removing state.
In the vSphere web client, under Workload Management, the Supervisor cluster shows a Configuring, Removing, or Error state.
While connected to the vCenter Server Appliance (VCSA), the wcpsvc log shows multiple errors similar to the following:
cat /var/log/wcp/wcpsvc.log
Sending HTTP request 'GET' to NSX managers for Principal Identities
Error sending HTTP request to NSX Manager
http request failed. URL: http://localhost:1080/external-cert/http1/<NSX manager IP>/443/api/v1/trust-management/token-principal-identities.
Status Code: 403. Status: 403 Forbidden
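To check quickly whether a log contains the errors shown above, a minimal sketch such as the following may help. The function name `summarize_wcp_403` is hypothetical and not part of any VMware tooling; it only counts lines matching the strings from the excerpt above and defaults to the log path used in the `cat` command.

```shell
# Hypothetical helper (not a VMware tool): count lines in a wcpsvc log that
# match the strings from the excerpt above.
# Argument 1: log file path (defaults to the path used in this article).
summarize_wcp_403() {
    log="${1:-/var/log/wcp/wcpsvc.log}"
    # grep -c prints the number of matching lines; "|| true" keeps the
    # helper from failing when there are no matches.
    forbidden=$(grep -c 'Status Code: 403' "$log" || true)
    pi_calls=$(grep -c 'token-principal-identities' "$log" || true)
    echo "403_lines=${forbidden} principal_identity_lines=${pi_calls}"
}
```

Running `summarize_wcp_403` with no argument on the VCSA reads the default log path. Non-zero counts for both values are consistent with the symptom described here, but they do not confirm the cause on their own.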
The wcpsvc log may also show only entries for the older/previous LOCAL domain, indicating that the SSO domain has not been updated:
INFO AuthorizationService.AuditLog opId=<opId>] Action performed by principal(name=<old LOCAL domain>\vpxd-extension-<id>,isGroup=false):Add global access [ Principal=Name=<old LOCAL domain>\NsxAdministrators,isGroup=true,roles=[<id>],propagating=true ]
While connected to the Supervisor cluster context, all NSX-NCP pods are in a CrashLoopBackOff state. The exact status values may vary depending on the state of the container restarts:
kubectl get pods -n vmware-system-nsx
NAME READY STATUS RESTARTS
<nsx-ncp-pod-name-a> 0/2 CrashLoopBackOff ###(MmSSs ago)
<nsx-ncp-pod-name-b> 1/2 CrashLoopBackOff ###(MmSSs ago)
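The pod check above can be narrowed to the affected pods with a small filter. This is a sketch only: `crashloop_pods` is a hypothetical helper name, and it assumes the default `kubectl get pods` table layout, where STATUS is the third column.

```shell
# Hypothetical helper: print the names of pods whose STATUS column is
# CrashLoopBackOff. Reads default `kubectl get pods` table output on stdin
# (columns: NAME READY STATUS ...), skipping the header row.
crashloop_pods() {
    awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Usage (from the Supervisor cluster context):
#   kubectl get pods -n vmware-system-nsx | crashloop_pods
```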
This issue can also cause deactivation of a Supervisor cluster to become stuck in the Removing state with the following error:
Unable to disable cluster domain-c<id>. Err failed cleaning NSX-T resources due to failure to fetch supervisor <supervisor cluster id> principal identity: failed to get Principal Identity 'wcp-cluster-user-domain-c<id>-<supervisor cluster id>' for Supervisor '<supervisor cluster id>': error listing Principal Identities from NSX managers: error listing Principal Identities: GET http request failed. URL: http://localhost:1080/external-cert/http1/<NSX Manager IP>/443/api/v1/trust-management/token-principal-identities. Status Code: 403. Status: 403 Forbidden
vSphere Supervisor 8.0
vSphere Supervisor 9.0
NSX-T 4.X
After the repoint, the new SSO domain must be updated manually for the NSX-NCP pods in the Supervisor cluster.
NsxAdministrators
NsxViAdministrators
NsxAuditors
| Role name | Description | Privileges |
| NSX Administrator | Allows vSphere user to view and modify NSX configuration | NSX - Modify NSX configuration |
| NSX Auditor | Allows vSphere user to view NSX configuration | NSX - Read NSX Configuration |
| NSX VI Administrator | Allows vSphere user to manage NSX | NSX - Modify NSX configuration |
| User/Group | Role | Defined in | Propagate to children |
| VSPHERE.LOCAL\NsxAdministrators | NSX Administrator | Global Permission | √ |
| VSPHERE.LOCAL\NsxAuditors | NSX Auditor | Global Permission | √ |
| VSPHERE.LOCAL\NsxViAdministrators | NSX VI Administrator | Global Permission | √ |
service-control --restart wcp
kubectl get configmap nsx-ncp-config -n vmware-system-nsx -o yaml > nsx-ncp-config-backup.yaml
kubectl edit configmap nsx-ncp-config -n vmware-system-nsx
apiVersion: v1
data:
...
}\nvc_endpoint = <vcenter.FQDN>\nsso_domain = <repointed sso domain>\nhttps_port = <port>\n\n"
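To confirm which SSO domain a saved copy of the ConfigMap contains, before or after editing, something like the following sketch could be used. `show_sso_domain` is a hypothetical helper name. It assumes the escaped form shown above, where settings are separated by a literal `\n`, and it will miss the value if the YAML output wraps the line between `sso_domain` and `=`.

```shell
# Hypothetical helper: print the sso_domain value(s) found in a saved
# ConfigMap YAML file, such as the backup taken earlier.
show_sso_domain() {
    # Step 1: turn each literal backslash-n sequence into a real newline.
    # Step 2: print whatever follows "sso_domain =" on its own line.
    awk '{ gsub(/\\n/, "\n"); print }' "$1" |
        sed -n 's/^[[:space:]]*sso_domain[[:space:]]*=[[:space:]]*//p'
}

# Usage:
#   show_sso_domain nsx-ncp-config-backup.yaml
```

Against the backup file, the output should be the old domain. Against a fresh export taken after the edit, it should be the repointed domain.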
kubectl rollout restart deploy -n vmware-system-nsx nsx-ncp
kubectl get pods -n vmware-system-nsx
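The final pod check can be turned into a pass/fail test with a short filter. This is a sketch under assumptions: `all_pods_ready` is a hypothetical helper name, it expects the default `kubectl get pods` table layout, and it treats any pod that is not Running with all containers ready as a failure, including pods in a Completed state.

```shell
# Hypothetical helper: exit 0 only if at least one pod row is present and
# every row shows STATUS "Running" with READY counts that match (e.g. 2/2).
# Reads default `kubectl get pods` table output on stdin.
all_pods_ready() {
    awk 'NR > 1 {
             split($2, r, "/")            # READY column, e.g. "1/2"
             if ($3 != "Running" || r[1] != r[2]) bad++
             seen++
         }
         END { exit (seen > 0 && bad == 0) ? 0 : 1 }'
}

# Usage (from the Supervisor cluster context):
#   kubectl get pods -n vmware-system-nsx | all_pods_ready && echo "NCP pods ready"
```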