Symptoms:
Events:
Type     Reason     Age                    From     Message
----     ------     ----                   ----     -------
Warning  Unhealthy  6m58s (x235 over 9h)   kubelet  Liveness probe failed: CLI server is not ready
Warning  BackOff    112s (x509 over 9h)    kubelet  Back-off restarting failed container nsx-ncp in pod nsx-ncp-xxxxxxxxxxx-xxxx_vmware-system-nsx(xxx-xxxxx-xxxxx-xxxx-xxx)
[ncp MainThread W] vmware_nsxlib.v3.cluster [7f1af4b7c730] Request failed due to: Certificate not trusted
[ncp MainThread W] vmware_nsxlib.v3.cluster [7f1af4b7c730] Request failed due to an exception that calls for regeneration. Re-generating pool.
[ncp MainThread I] nsx_ujo.ncp.vc.session Refreshing token and re-instantiating TESSession
[ncp MainThread I] nsx_ujo.ncp.vc.session VC credentials were not changed
[ncp MainThread I] nsx_ujo.ncp.vc.session Successfully retrieved JWT token:
OR
kubectl logs -n vmware-system-nsx -l component=nsx-ncp -c nsx-operator --follow shows the following:
YYYY-MM-DD HH:MM:SS.899 ERROR util/utils.go:245 handle http response {"status": 401, "requestUrl": "https://vcsa-fqdn:443/rest//vcenter/tokenservice/token-exchange", "responseError": "json: unsupported type: func() (io.ReadCloser, error)", "error": "received HTTP Error"}
YYYY-MM-DD HH:MM:SS.899 ERROR jwt/tesclient.go:75 failed to exchange JWT {"error": "received HTTP Error"}
YYYY-MM-DD HH:MM:SS.899 ERROR jwt/jwtcache.go:78 JWT cache failed to refresh JWT {"error": "failed to exchange JWT due to error :received HTTP Error"}
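The log signatures quoted above can be checked for mechanically. The following is a minimal sketch, not an official diagnostic: has_trust_symptom is a hypothetical helper name, and the patterns are copied from the log excerpts in this article, so other NCP or nsx-operator versions may word these messages differently.

```shell
# Sketch only: has_trust_symptom is a hypothetical helper, not part of any VMware tooling.
# It reads log text on stdin and returns 0 if any of the signatures quoted
# in this article is present, non-zero otherwise.
has_trust_symptom() {
  grep -E -q 'Certificate not trusted|failed to exchange JWT|JWT cache failed to refresh JWT'
}

# Example usage (commented out so that sourcing this file has no side effects):
#   kubectl logs -n vmware-system-nsx -l component=nsx-ncp -c nsx-ncp --tail=200 \
#     | has_trust_symptom && echo "trust/JWT symptom found in nsx-ncp logs"
```

The container name nsx-ncp in the usage comment is taken from the kubelet event shown earlier; use -c nsx-operator to scan the operator container instead.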
In NSX Manager, editing the compute manager settings to trust the thumbprint of the vCenter Machine SSL certificate fails with the following error:
Failed to enable trust on Compute Manager due to error There already exists an OIDC end-point with Issuer https://vCenter_FQDN/openidconnect/vsphere.local.. Please check https://vCenter_FQDN/openidconnect/vsphere.local/.well-known/openid-configuration is reachable from NSX manager nodes. (Error code: 90011)
A general system error occurred. Error message: failed to create WCP Service Principal Identity: NSX Principal Identity creation failed: error sending HTTP request: Post "http://localhost:1080/external-cert/http1/NSXT_FQDN/443/api/v1/trust-management/token-principal-identities": context deadline exceeded (Client.Timeout exceeded while awaiting headers) error sending HTTP request: Post "http://localhost:1080/external-cert/http1/NSXT_FQDN/443/api/v1/trust-management/token-principal-identities": context deadline exceeded (Client.Timeout exceeded while awaiting headers) error sending HTTP request: Post "http://localhost:1080/external-cert/http1/NSXT_FQDN/443/api/v1/trust-management/token-principal-identities": context deadline exceeded (Client.Timeout exceeded while awaiting headers) error sending HTTP request: Post "http://localhost:1080/external-cert/http1/NSXT_FQDN/443/api/v1/trust-management/token-principal-identities": context deadline exceeded (Client.Timeout exceeded while awaiting headers).
Environment:
vSphere Kubernetes Service
vCenter Server 8.x
vCenter Server 9.x
NSX-T Manager 4.x
NSX-T Manager 9.x
Cause:
The thumbprint of the newly generated vCenter Machine SSL certificate does not match the thumbprint stored in the NSX compute manager. As a result, API calls from the NSX Container Plugin (NCP) to NSX Manager fail TLS validation, and the nsx-ncp pod enters CrashLoopBackOff.
# echo | openssl s_client -connect localhost:443 2>/dev/null | openssl x509 -noout -fingerprint -sha256
# curl -k -u admin -X GET 'https://localhost/api/v1/trust-management/oidc-uris'
Enter host password for user 'admin':
{
  "results" : [ {
    "oidc_uri" : "https://vCenter_FQDN/openidconnect/.well-known/openid-configuration",
    "thumbprint" : "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
    "oidc_type" : "vcenter",
    "scim_endpoints" : [ ],
    "claim_map" : [ ],
    "serviced_domains" : [ ],
    "restrict_scim_search" : false,
    "end_session_endpoint_uri" : "https://vCenter_FQDN/openidconnect/logout/vsphere.local",
    "issuer" : "https://vCenter_FQDN/openidconnect/vsphere.local",
    "jwks_uri" : "https://vCenter_FQDN/openidconnect/jwks/vsphere.local",
    "token_endpoint" : "https://vCenter_FQDN/openidconnect/token/vsphere.local",
    "claims_supported" : [ ],
    "override_roles" : [ ],
    "csp_config" : {
      "customer_org_id" : "",
      "additional_org_ids" : [ ]
    },
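The fingerprint printed by the openssl command and the "thumbprint" value returned by the oidc-uris API can be compared by eye, but the two are often formatted differently. The sketch below normalizes both before comparing. It assumes that openssl prints the fingerprint as colon-separated hex after a "Fingerprint=" label and that NSX stores the thumbprint as plain hex; normalize_fp and thumbprints_match are hypothetical helper names, not part of any product tooling.

```shell
# Sketch only: hypothetical helpers for comparing two certificate thumbprints.

# normalize_fp VALUE
# Drops everything up to and including the last '=' (for example the
# "sha256 Fingerprint=" label), removes colons and whitespace, and
# uppercases the remaining hex digits.
normalize_fp() {
  printf '%s' "$1" | sed -e 's/^.*=//' | tr -d ':[:space:]' | tr '[:lower:]' '[:upper:]'
}

# thumbprints_match A B
# Returns 0 when both values normalize to the same hex string.
thumbprints_match() {
  [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}

# Example usage (commented out; substitute the real values):
#   live=$(echo | openssl s_client -connect localhost:443 2>/dev/null \
#            | openssl x509 -noout -fingerprint -sha256)
#   stored='<thumbprint value from the oidc-uris output>'
#   thumbprints_match "$live" "$stored" && echo "match" || echo "MISMATCH"
```

A reported mismatch is consistent with the cause described in this article: the stored thumbprint no longer corresponds to the certificate that vCenter currently presents.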
Resolution:
To address the issue, follow the steps below:
Complete the steps outlined in the KB article: Failed to enable trust on Compute Manager in NSX
After completing the steps in the KB article, restart the NCP pods by scaling the nsx-ncp deployment in the vmware-system-nsx namespace down to 0 replicas and then back up to the desired number (for example, 1 or 2 replicas). Use the following commands:
kubectl get deployments.apps -n vmware-system-nsx
kubectl scale deployment nsx-ncp --replicas=0 -n vmware-system-nsx
kubectl scale deployment nsx-ncp --replicas=1 -n vmware-system-nsx
kubectl get deployments.apps -n vmware-system-nsx
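The restart sequence above can be wrapped in a small function so that the original replica count is restored rather than typed by hand. This is a minimal sketch under stated assumptions: restart_ncp is a hypothetical helper name, kubectl is assumed to be already pointed at the Supervisor cluster, and the 300-second rollout timeout is an arbitrary illustrative value.

```shell
# Sketch only: restart_ncp is a hypothetical helper, not a VMware-provided tool.
# It records the current replica count of the nsx-ncp deployment, scales it
# to 0, scales it back, and waits for the rollout to finish.
restart_ncp() {
  ns=vmware-system-nsx

  # Record the configured replica count so it can be restored afterwards.
  replicas=$(kubectl get deployment nsx-ncp -n "$ns" -o jsonpath='{.spec.replicas}') || return 1

  # If the deployment is already at 0 (or the value is unreadable), fall back to 1.
  [ "${replicas:-0}" -gt 0 ] 2>/dev/null || replicas=1

  kubectl scale deployment nsx-ncp --replicas=0 -n "$ns" || return 1
  kubectl scale deployment nsx-ncp --replicas="$replicas" -n "$ns" || return 1

  # Wait for the new pod(s) to become ready; the timeout value is an assumption.
  kubectl rollout status deployment/nsx-ncp -n "$ns" --timeout=300s
}

# Example usage (commented out so that sourcing this file changes nothing):
#   restart_ncp && kubectl get deployments.apps -n vmware-system-nsx
```

If the rollout does not complete within the timeout, the function returns a non-zero status, which matches the case where the issue persists after the restart.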
If the issue persists after restarting the pods, contact Broadcom Support for further assistance. Refer to Creating and managing Broadcom support cases for guidance on opening a support case.