NSX-NCP pods in CrashLoopBackOff state after NSX admin password expired

Article ID: 407766

Products

VMware NSX

VMware vCenter Server 8.0

VMware Tanzu Kubernetes Grid

Issue/Introduction

  • The NSX admin password expired and has since been renewed.
  • Describing the crashing nsx-ncp pod shows events similar to:
     Events:
     Type     Reason     Age                   From     Message
     ----     ------     ----                  ----     -------
     Warning  Unhealthy  6m58s (x235 over 9h)  kubelet  Liveness probe failed: CLI server is not ready
     Warning  BackOff    112s (x509 over 9h)   kubelet  Back-off restarting failed container nsx-ncp in pod nsx-ncp-xxxxxxxxxxx-xxxx_vmware-system-nsx(xxx-xxxxx-xxxxx-xxxx-xxx)
 
  • wcpsvc.log on the vCenter Server, under /var/log/vmware/wcp/, shows entries similar to:
     YYYY-MM-DDTHH:MM:SSZ warning wcp [common/k8sdeploymentutil.go:53] [opID=xxxxxx] Deployment vmware-system-nsx/nsx-ncp is not available
 
  • The nsx-operator container logs show API requests from the Supervisor nodes to NSX Manager on port 443 timing out. To view them, run:
     kubectl logs -n vmware-system-nsx -l component=nsx-ncp -c nsx-operator --follow
 
     YYYY-MM-DDTHH:MM:SSZ ERROR   nsx/transport.go:98     request failed  {"error": "net/http: request canceled"}
     time="YYYY-MM-DDTHH:MM:SSZ" level=error msg="Get \"https://xx.xxx.xx.xxx:443/policy/api/v1/search/query?query=resource_type:Rule AND tags.scope:nsx-op\/cluster AND tags.tag:domain-cxxxx\:xxxxxxx-xxxxx-xxxx-xxx-xxxxxxxx&page_size=1000\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
     time="YYYY-MM-DDTHH:MM:SSZ" level=info msg="Message formatter created for en, en, UTC, 2, SHORT_DATE_TIME"
     time="YYYY-MM-DDTHH:MM:SSZ" level=error msg="Request Timed out: Get \"https://xx.xxx.xx.xxx:443/policy/api/v1/search/query?query=resource_type:Rule AND tags.scope:nsx-op\/cluster AND tags.tag:domain-cxxxx\:xxxxxxx-xxxxx-xxxx-xxx-xxxxxxxx&page_size=1000\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
     YYYY-MM-DDTHH:MM:SSZ INFO    common/store.go:176     initialized store       {"resourceType": "Rule", "count": 0}
     YYYY-MM-DDTHH:MM:SSZ ERROR   cmd/main.go:72  failed to initialize securitypolicy commonService       {"controller": "SecurityPolicy", "error": "com.vmware.vapi.std.errors.timed_out"}
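
When the nsx-operator log is long, it can help to count how many lines carry the timeout signatures shown above and which NSX Manager endpoint they point to. The following is a minimal Python sketch, not a supported tool: the function name summarize_timeouts and the marker list are illustrative assumptions, with the markers copied from the log excerpt in this article.

```python
import re
from collections import Counter

# Substrings copied from the nsx-operator log excerpt in this article.
TIMEOUT_MARKERS = (
    "Client.Timeout exceeded while awaiting headers",
    "com.vmware.vapi.std.errors.timed_out",
    "net/http: request canceled",
)

# Captures the host:port part of a logged URL such as https://host:443/...
ENDPOINT_RE = re.compile(r'https://([^/\s"\\]+)/')


def summarize_timeouts(lines):
    """Return (timeout_line_count, Counter of endpoints seen on those lines)."""
    hits = 0
    endpoints = Counter()
    for line in lines:
        if any(marker in line for marker in TIMEOUT_MARKERS):
            hits += 1
            endpoints.update(ENDPOINT_RE.findall(line))
    return hits, endpoints
```

The function accepts any iterable of lines, so it could be fed a saved log file or sys.stdin when the kubectl logs output is piped into a script. A high count against a single endpoint on port 443 matches the symptom described in this article.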

Environment

Tanzu Kubernetes Grid Service

vCenter 8.x

NSX 4.x

Cause

An indexing issue on NSX Manager makes the API service unavailable, so it does not return results for NCP queries.

Resolution

Perform a rolling reboot of the NSX Manager nodes, then confirm that the NSX UI works and no longer reports an indexing issue.

If this does not resolve the issue, collect an NSX support bundle (see "NSX: Collect Support Bundles") and open a support ticket with Broadcom Support.
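
In addition to checking the UI, the search endpoint that times out in the nsx-operator log can be probed directly after the rolling reboot. The following Python sketch is an illustration under assumptions, not a documented procedure: the helper names build_search_url and probe_search_api are hypothetical, the query is reduced to resource_type:Rule with a page size of 1, and HTTP basic authentication with an NSX account is assumed to be allowed in the environment.

```python
import base64
import ssl
import time
import urllib.error
import urllib.parse
import urllib.request


def build_search_url(manager, resource_type="Rule", page_size=1):
    """Build a URL for the policy search endpoint seen in the nsx-operator log."""
    query = urllib.parse.urlencode(
        {"query": f"resource_type:{resource_type}", "page_size": page_size}
    )
    return f"https://{manager}:443/policy/api/v1/search/query?{query}"


def probe_search_api(manager, user, password, timeout=10.0, verify_tls=True):
    """Send one search request and return (ok, detail)."""
    request = urllib.request.Request(build_search_url(manager))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: only for lab managers with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
            elapsed = time.monotonic() - started
            return True, f"HTTP {resp.status} in {elapsed:.1f}s"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}: {exc.reason}"
    except OSError as exc:
        # Covers URLError, connection failures, and timeouts.
        return False, f"request failed: {exc}"
```

A prompt successful response suggests the search service is answering again. A timeout matches the symptom described above, while an HTTP error status points to a different problem, such as rejected credentials.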