/api/v1/systemhealth/container-cluster/ncp/status:
{
"results": [
{
"cluster_id": "########-####-####-####-############",
"cluster_name": "pks-#######-####-####-####-############",
"type": "Kubernetes",
"status": "UNKNOWN",
"detail": "",
"_protection": "NOT_PROTECTED"
},
VMware NSX 4.x
Tanzu Kubernetes Grid Integrated Edition (TKGi)
This issue is a false positive caused by stale inventory entries remaining in the NSX database.
This condition occurs when a TKGi or WCP cluster is deleted or removed from the environment without performing the proper corresponding cleanup within NSX. Because the cluster objects still exist in the NSX database, the NSX Manager continues to actively poll the removed cluster. When the polling inevitably fails, the NSX Manager determines the cluster is unhealthy and triggers the "NCP plugin down" alarm.
To resolve this issue, you must clear the stale cluster objects from the NSX inventory using the policy cleanup script.
Run the nsx_policy_cleanup.py script against your vCenter.
IMPORTANT: The -r option performs the actual removal of NSX resources. Run the script without this option first. This performs a dry run, so you can review the output and confirm which resources will be removed before the actual deletion.
a. Dry Run:
python3 /usr/lib/vmware-wcp/nsx_policy_cleanup.py --cluster <cluster name> -u <nsx admin user> -p '<nsx mgr admin pass>' --mgr-ip=<nsx mgr ip> --no-warning --top-tier-router-id=<cluster name> --all-res
b. Actual Cleanup:
python3 /usr/lib/vmware-wcp/nsx_policy_cleanup.py --cluster <cluster name> -u <nsx admin user> -p '<nsx mgr admin pass>' --mgr-ip=<nsx mgr ip> --no-warning --top-tier-router-id=<cluster name> --all-res -r
4. Allow the script to execute. Note: The script removes the stale inventory resources early in its execution. Even if the --top-tier-router-id logic fails later in the script, the stale inventory entries will already have been deleted.
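The dry run and the actual cleanup differ only in the trailing -r option. The following is a minimal sketch of a wrapper that assembles the argument list and adds -r only when removal is explicitly requested. The function name build_cleanup_command and the sample values are hypothetical and not part of the script or any VMware tooling; only the script path and flags shown in the commands above are used.

```python
import shlex

# Script path as given in this article.
SCRIPT = "/usr/lib/vmware-wcp/nsx_policy_cleanup.py"


def build_cleanup_command(cluster, user, password, mgr_ip, remove=False):
    """Return the argument list for the cleanup script.

    Hypothetical helper: with remove=False the list matches the dry-run
    command; with remove=True the -r option is appended.
    """
    cmd = [
        "python3", SCRIPT,
        "--cluster", cluster,
        "-u", user,
        "-p", password,
        f"--mgr-ip={mgr_ip}",
        "--no-warning",
        f"--top-tier-router-id={cluster}",
        "--all-res",
    ]
    if remove:
        cmd.append("-r")  # performs the actual removal of NSX resources
    return cmd


# Placeholder values for illustration only.
dry_run = build_cleanup_command("my-cluster", "admin", "example-pass", "192.0.2.10")
print(shlex.join(dry_run))
```

Defaulting remove to False means a caller has to opt in to deletion, which mirrors the advice to review the dry-run output first.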
To confirm the solution was successful and the stale entries have been removed, query the NSX Manager API:
Execute the following API request: GET /api/v1/systemhealth/container-cluster/ncp/status
Review the JSON response and verify that it no longer displays the removed cluster with an "UNKNOWN" status.
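This review can also be scripted. The following is a minimal sketch, assuming the response has the shape shown at the top of this article: a "results" list whose entries carry "cluster_id", "cluster_name", and "status". The function name find_unknown_clusters and the sample payload are illustrative; retrieving the response from NSX Manager (authentication, TLS) is deliberately left out.

```python
import json


def find_unknown_clusters(response_text):
    """Return (cluster_name, cluster_id) pairs whose status is UNKNOWN."""
    payload = json.loads(response_text)
    return [
        (entry.get("cluster_name"), entry.get("cluster_id"))
        for entry in payload.get("results", [])
        if entry.get("status") == "UNKNOWN"
    ]


# Illustrative payload modelled on the response format in this article.
# IDs, names, and the non-UNKNOWN status value are placeholders.
sample = json.dumps({
    "results": [
        {"cluster_id": "id-1", "cluster_name": "pks-removed",
         "type": "Kubernetes", "status": "UNKNOWN"},
        {"cluster_id": "id-2", "cluster_name": "pks-active",
         "type": "Kubernetes", "status": "HEALTHY"},
    ]
})

stale = find_unknown_clusters(sample)
print(stale)
```

An empty list after the cleanup indicates that no cluster is reported with an "UNKNOWN" status.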