vsphere-csi-controller pods and nsx-ncp pods are in CrashLoopBackOff
# kubectl get pods -A | grep -v Run
NAMESPACE           NAME                                 READY   STATUS             RESTARTS           AGE
vmware-system-csi   vsphere-csi-controller-xxxxxx-xxxx   3/7     CrashLoopBackOff   2602 (3m12s ago)   44h
vmware-system-csi   vsphere-csi-controller-xxxxxx-xxxx   3/7     CrashLoopBackOff   2581 (32s ago)     44h
vmware-system-csi   vsphere-csi-controller-xxxxxx-xxxx   6/7     CrashLoopBackOff   653 (75s ago)      44h
vmware-system-nsx   nsx-ncp-xxxxxx-xxxx                  0/2     CrashLoopBackOff   4090 (2m2s ago)    11d
vmware-system-nsx   nsx-ncp-xxxxxx-xxxx                  0/2     CrashLoopBackOff   1560 (13s ago)     44h
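The listing above can also be filtered on the STATUS column instead of a plain text match. The following is a minimal sketch, not an official procedure; the function name unhealthy_pods is illustrative, and it assumes the default column order of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Print "namespace/name status" for every pod that is neither Running nor Completed.
# Reads `kubectl get pods -A --no-headers` style output on stdin.
# Column 4 is STATUS; the RESTARTS column may contain spaces, but it comes after it.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage on a live cluster:
#   kubectl get pods -A --no-headers | unhealthy_pods
```

Unlike grep -v Run, this also hides pods in the Completed state, so only pods that need attention are listed.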
The vsphere-csi-controller and nsx-ncp logs show that the connection to vCenter (VC) cannot be re-established because DNS lookups are timing out, as in the following excerpts:
csi-controller log:
2025-02-19T03:54:08.233512036Z stderr F {"level":"error","time":"2025-02-19T03:54:08.233469274Z","caller":"wcp/controller.go:286","msg":"failed to re-establish VC connection. Will retry again in 60 seconds. err: failed to connect to VirtualCenter host: \"vc-example.in.co\", Err: Post \"https://vc-example.in.co:443/sdk\": dial tcp: lookup vc-example.in.co on 127.0.0.53:53: read udp 127.0.0.1:37179->127.0.0.53:53: i/o timeout","TraceId":"xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcp.(*controller).Init.func2\n\t/build/mts/release/bora-23905383/cayman_vsphere_csi_driver/vsphere_csi_driver/src/pkg/csi/service/wcp/controller.go:286"}
2025-02-19T03:54:08.233534074Z stderr F {"level":"info","time":"2025-02-19T03:54:08.2333685Z","caller":"vsphere/virtualcenter.go:384","msg":"Reloading latest VC config from vSphere Config Secret for vcenter: \"vc-example.in.co\"","TraceId":"xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"}
2025-02-19T03:54:08.233782435Z stderr F {"level":"info","time":"2025-02-19T03:54:08.233738354Z","caller":"vsphere/utils.go:259","msg":"Defaulting timeout for vCenter Client to 5 minutes","TraceId":"xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"}
nsx-ncp log:
[ncp GreenThread-1 I] nsx_ujo.ncp.k8s.kubernetes HTTP session did not have a 'Content-type' header
[ncp GreenThread-1 I] nsx_ujo.ncp.k8s.kubernetes HTTP session did not have a 'Content-type' header
[ncp MainThread I] nsx_ujo.ncp.vc.session Refreshing token and re-instantiating TESSession
[ncp MainThread I] nsx_ujo.ncp.vc.session Retrieving VC Credentials for the first time
[ncp GreenThread-1 I] nsx_ujo.ncp.k8s.kubernetes HTTP session did not have a 'Content-type' header
[ncp MainThread W] nsx_ujo.ncp.vc.session Failed to get JWT token: Failed SAML HoK request: Failed to get or renew SAML HoK from STS due to failed DNS lookup for VC endpoint: [Errno -3] Lookup timed out., will retry after 120 seconds
[ncp GreenThread-1 I] nsx_ujo.ncp.k8s.kubernetes HTTP session did not have a 'Content-type' header
[ncp GreenThread-1 I] nsx_ujo.ncp.k8s.kubernetes HTTP session did not have a 'Content-type' header
[ncp GreenThread-1 I] nsx_ujo.ncp.election Seqno expired for master xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx
[ncp GreenThread-1 I] nsx_ujo.ncp.election Instance xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx elected master.
Session terminated, terminating shell...[ncp MainThread W] nsx_ujo.ncp.main Receive signal for handling 15
[ncp MainThread W] nsx_ujo.ncp.main Main process is exiting, terminate election process!
...terminated.
The Supervisor is unable to resolve the FQDN/IP of vCenter (VC) and NSX Manager.
The Supervisor is unable to reach the DNS server.
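To confirm the DNS symptoms, the resolver configuration and a test lookup can be checked from a Supervisor control plane node. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and it assumes that awk and getent are available on the node. The address 127.0.0.53 in the csi-controller log is the local stub listener of systemd-resolved; on such systems the upstream DNS servers are typically listed in /run/systemd/resolve/resolv.conf rather than in /etc/resolv.conf.

```shell
# List the nameservers configured in a resolv.conf-style file.
# Defaults to /etc/resolv.conf; pass another path as the first argument.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Resolve a name through the system resolver.
# Prints the resolved address on success, or a failure note on stderr.
check_dns() {
  if getent hosts "$1"; then
    return 0
  fi
  echo "DNS lookup failed for $1" >&2
  return 1
}

# Example (vc-example.in.co is the placeholder FQDN from the logs):
#   list_nameservers
#   list_nameservers /run/systemd/resolve/resolv.conf
#   check_dns vc-example.in.co
```

If check_dns fails for the vCenter FQDN while the listed nameservers look correct, the next thing to verify is whether those DNS servers are reachable from the Supervisor.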
VMware vSphere with Tanzu
Connectivity issues between Supervisor and VC or NSX
Work with your internal network team to ensure that all required ports are open and to check whether any firewall rules are blocking traffic.
The Supervisor must be able to communicate with the domain controllers.
Verify that connectivity from the Supervisor to vCenter and the NSX Edge VMs works as expected.
Once the networking issues are resolved, the original issue will resolve automatically and the pods will return to the Running state.
If required, engage VMware by Broadcom's networking team to check the connectivity issues.
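The connectivity checks described above can be sketched as a simple TCP probe run from a Supervisor control plane node. This is a hedged example rather than a supported tool: the function names are illustrative, and it assumes that bash and the coreutils timeout command are present on the node. Port 443 for vCenter is taken from the URL in the csi-controller log; the ports required for NSX and the domain controllers are not stated here and should be confirmed with the network team.

```shell
# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
check_tcp() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print a one-line OK/FAIL report for host:port.
report_tcp() {
  if check_tcp "$1" "$2"; then
    echo "OK   $1:$2"
  else
    echo "FAIL $1:$2"
  fi
}

# Example (vc-example.in.co is the placeholder FQDN from the logs):
#   report_tcp vc-example.in.co 443
```

A FAIL result for a hostname can mean either a DNS failure or a blocked port; repeating the probe against the IP address helps separate the two cases.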