NCP Plugin Down


Article ID: 399564


Products

VMware NSX

Issue/Introduction

1. The "NCP down" alarm in the NSX alarm dashboard indicates that the NSX Manager is unable to communicate with the Network Container Plugin (NCP).

"summary": "Manager Node has detected the NCP is down or unhealthy.",
"description": "Manager Node has detected the NCP is down or unhealthy.",
"recommended_action": "To find the clusters which are having issues, please use the NSX UI and navigate to the Alarms page. The Entity name value for this alarm instance identifies the cluster name. Or invoke the NSX API GET /api/v1/systemhealth/container-cluster/ncp/status to fetch all cluster statuses and determine the name of any clusters that report DOWN or UNKNOWN. Then on the NSX UI Inventory | Container | Clusters page find the cluster by name and click the Nodes tab which lists all Kubernetes and PAS cluster members.

For Kubernetes cluster:
1. Check NCP Pod liveness by finding the K8s master node from all the cluster members and log onto the master node. Then invoke the kubectl command `kubectl get pods --all-namespaces`. If there is an issue with the NCP Pod, please use the kubectl logs command to check the issue and fix the error.
2. Check the connection between NCP and the Kubernetes API server. The NSX CLI can be used inside the NCP Pod to check this connection status by invoking the following commands from the master VM: `kubectl exec -it <NCP-Pod-Name> -n nsx-system bash`, `nsxcli`, `get ncp-k8s-api-server status`. If there is an issue with the connection, please check both the network and NCP configurations.
3. Check the connection between NCP and NSX Manager. The NSX CLI can be used inside the NCP Pod to check this connection status by invoking the following commands from the master VM: `kubectl exec -it <NCP-Pod-Name> -n nsx-system bash`, `nsxcli`, `get ncp-nsx status`. If there is an issue with the connection, please check both the network and NCP configurations.

For PAS cluster:
1. Check the network connections between virtual machines and fix any network issues.
2. Check the status of both nodes and services and fix crashed nodes or services. Invoke the commands `bosh vms` and `bosh instances -p` to check the status of nodes and services."



2. NCP Pods are found to be in a CrashLoopBackOff state:

kubectl get pods 
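Unhealthy pods can be filtered out of the listing with a short awk pipeline. A minimal sketch against captured sample output (the pod names and counts below are hypothetical; on a live cluster, pipe `kubectl get pods --all-namespaces` in directly):

```shell
# Hypothetical captured output from `kubectl get pods --all-namespaces`.
sample_output='NAMESPACE     NAME              READY   STATUS             RESTARTS   AGE
nsx-system    nsx-ncp-abc123    0/1     CrashLoopBackOff   12         1h
kube-system   coredns-xyz789    1/1     Running            0          5d'

# Column 4 is STATUS in the --all-namespaces layout; print anything not Running.
printf '%s\n' "$sample_output" | awk 'NR>1 && $4 != "Running" {print $1 "/" $2 ": " $4}'
# → nsx-system/nsx-ncp-abc123: CrashLoopBackOff
```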


3. Container logs report the below error message, which indicates a clock skew problem:

kubectl logs <ncp-pod-name> -n nsx-system

[ncp GreenThread-1 I] nsx_ujo.ncp.election Initialized election profile election-lock-domain-c31362-########-4f6e-4b36-####-############

[ncp GreenThread-1 I] nsx_ujo.ncp.k8s.kubernetes HTTP session did not have a 'Content-type' header

[ncp GreenThread-1 I] nsx_ujo.ncp.k8s.kubernetes HTTP session did not have a 'Content-type' header

[ncp MainThread W] nsx_ujo.ncp.vc.session Failed to get JWT token: Failed SAML HoK request: Failed to get or renew SAML HoK from STS: SoapException:

faultcode: ns0:InvalidTimeRange

faultstring: The token authority rejected an issue request for TimePeriod [startTime=Sat May 31 05:38:09 GMT 2025, endTime=Sat May 31 05:48:09 GMT 2025] :: The requested token start time differs from the issue instant more than the acceptable deviation (clock tolerance) of 600000 ms. Requested token start time=Sat May 31 05:38:09 GMT 2025, issue instant time=Sat May 31 06:27:35 GMT 2025. This might be due to a clock skew problem.

faultxml: <?xml version='1.0' encoding='UTF-8'?><S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/"><S:Body><S:Fault xmlns:ns4="http://www.w3.org/2003/05/soap-envelope"><faultcode xmlns:ns0="http://docs.oasis-open.org/ws-sx/ws-trust/200512">ns0:InvalidTimeRange</faultcode><faultstring>The token authority rejected an issue request for TimePeriod [startTime=Sat May 31 05:38:09 GMT 2025, endTime=Sat May 31 05:48:09 GMT 2025] :: The requested token start time differs from the issue instant more than the acceptable deviation (clock tolerance) of 600000 ms. Requested token start time=Sat May 31 05:38:09 GMT 2025, issue instant time=Sat May 31 06:27:35 GMT 2025. This might be due to a clock skew problem.</faultstring></S:Fault></S:Body></S:Envelope>., will retry after 120 seconds

[ncp GreenThread-1 I] nsx_ujo.ncp.k8s.kubernetes HTTP session did not have a 'Content-type' header
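The rejection in the fault above can be verified arithmetically: the STS accepts a token request only when the requested start time deviates from the issue instant by at most 600000 ms (600 s). A minimal shell sketch using the two timestamps copied from the log (GNU `date` assumed):

```shell
# Timestamps from the SoapException above (UTC); tolerance from the fault text.
requested='2025-05-31 05:38:09 UTC'   # requested token start time
issued='2025-05-31 06:27:35 UTC'      # STS issue instant
tolerance_s=600                       # 600000 ms clock tolerance

skew_s=$(( $(date -u -d "$issued" +%s) - $(date -u -d "$requested" +%s) ))
echo "clock skew: ${skew_s}s (tolerance: ${tolerance_s}s)"
# → clock skew: 2966s (tolerance: 600s)
if [ "$skew_s" -gt "$tolerance_s" ]; then
    echo "skew exceeds STS tolerance: token requests will be rejected"
fi
```

At roughly 49 minutes of skew, every token request falls far outside the 10-minute tolerance, which is why NCP retries indefinitely and the pod cycles.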

Environment

VMware vSphere with Tanzu

VMware NSX

Cause

The Supervisor cluster clock was running behind the actual time.

An NTP server issue created a clock skew, causing the STS to reject NCP's token requests.

Resolution

Validate NTP connectivity and time synchronization across the NSX Manager, vCenter, and Supervisor cluster.

 

The following commands can be used to validate time synchronization on each node:

timedatectl show 
timedatectl status 
timedatectl timesync-status 
timedatectl show-timesync 
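Of these, the `NTPSynchronized` property is the quickest sanity check: it reports whether the node's clock is actually synchronized, not merely whether NTP is enabled. A minimal sketch that parses it from captured `timedatectl show` output (the sample values are hypothetical; on a live node, substitute `sample_show=$(timedatectl show)`):

```shell
# Hypothetical captured key=value output from `timedatectl show`.
sample_show='Timezone=UTC
NTP=yes
NTPSynchronized=no'

# Extract the NTPSynchronized value and report.
sync=$(printf '%s\n' "$sample_show" | awk -F= '$1 == "NTPSynchronized" {print $2}')
if [ "$sync" = "yes" ]; then
    echo "clock is NTP-synchronized"
else
    echo "clock is NOT synchronized: check NTP server reachability on this node"
fi
```

Note that `NTP=yes` with `NTPSynchronized=no`, as in the sample, is exactly the pattern seen when the configured NTP server is unreachable or misbehaving.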

Additional Information

  • Check connectivity from NCP to NSX Manager - 
    kubectl exec -it <ncp-pod-name> -n nsx-system -c nsx-ncp -- nsxcli -c get ncp-nsx status
  • Check connectivity from NCP to the K8s API server - 
    kubectl exec -it <ncp-pod-name> -n nsx-system -c nsx-ncp -- nsxcli -c get ncp-k8s-api-server status

    Output ref:

    root@4####6e2##################dc75 [ ~ ]# k exec -it nsx-ncp-pod -n vmware-system-nsx -c nsx-ncp -- nsxcli -c get ncp-nsx status
    Mon Jun 02 2025 UTC 08:17:11.661
    NSX Manager status:
    10.##.##.##:443: Healthy
    10.##.##.##:443: Healthy
    10.##.##.##:443: Healthy
    10.##.##.##:443: Healthy

    root@4####6e2##################dc75 [ ~ ]# k exec -it nsx-ncp-pod -n vmware-system-nsx -c nsx-ncp -- nsxcli -c get ncp-k8s-api-server status
    Mon Jun 02 2025 UTC 08:17:42.030
    Kubernetes ApiServer status: Healthy

    root@4####6e2##################dc75 [ ~ ]#