After a prolonged vCenter disconnection, reconnecting vCenter to SSP causes the control plane and worker nodes to restart. Once the SSP cluster recovers, multiple pods in the nsxi-platform namespace may remain in Pending status because their PersistentVolumeClaims (PVCs) cannot bind. The CSI provisioner cannot enumerate shared datastores because a stale VM reference is left in the vCenter inventory.
Pending pods (kubectl get pods -A | grep Pending):
nsxi-platform   contextcorrelator-50e1fc9cd9aac642-exec-1    0/1   Pending   0   5d22h
nsxi-platform   overflowcorrelator-7ead0c9ce40c4adb-exec-1   0/1   Pending   0   3d22h
nsxi-platform   overflowcorrelator-7ead0c9ce40c4adb-exec-2   0/1   Pending   0   3d22h
nsxi-platform   overflowcorrelator-7ead0c9ce40c4adb-exec-3   0/1   Pending   0   3d22h
nsxi-platform   rawflowcorrelator-9017709ce4089d9e-exec-1    0/1   Pending   0   3d22h
nsxi-platform   rawflowcorrelator-9017709ce4089d9e-exec-2    0/1   Pending   0   3d22h
nsxi-platform   rawflowcorrelator-9017709ce4089d9e-exec-3    0/1   Pending   0   3d22h
PVC event log:
Waiting for a volume to be created either by the external provisioner
'csi.vsphere.vmware.com' or manually by the system administrator.
If volume creation is delayed, please verify that the provisioner is
running and correctly registered.
CSI controller log (kubectl logs <csi-controller-pod> -n vmware-system-csi):
The object 'vim.VirtualMachine:vm-223490' has already been deleted
or has not been completely created
PVC and PV status:
This is a known CNS/CSI driver race condition triggered by the following sequence of events:
Restart the CSI driver controller deployment and all CSI node driver pods. This forces the provisioner to re-enumerate the vCenter inventory, clearing the stale VM reference and unblocking PVC provisioning.
kubectl rollout restart deployment <csi-controller-name> -n vmware-system-csi
kubectl rollout status deployment <csi-controller-name> -n vmware-system-csi
kubectl delete pods <csi-node-pod-1> <csi-node-pod-2> ... -n vmware-system-csi
kubectl get pods -A | grep Pending
kubectl get pvc -A | grep Pending
Once PVCs bind successfully, the Spark executor pods (contextcorrelator, overflowcorrelator, rawflowcorrelator) will be scheduled and transition to Running automatically.
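The two verification commands above can be wrapped in a small polling helper, so the check repeats until no PVC is left in Pending. This is a minimal sketch only. The function names count_pending and wait_for_bound are hypothetical, and the 600-second timeout and 15-second interval are arbitrary defaults, not values from this article.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads `kubectl get ... -A` output on stdin and prints
# how many rows have a column that is exactly "Pending".
count_pending() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "Pending") { n++; break } }
       END { print n + 0 }'
}

# Hypothetical helper: polls until no PVC is Pending.
# Returns 0 when all PVCs have left Pending, 1 on timeout, 2 if kubectl fails.
# Arguments (assumed defaults): timeout in seconds, poll interval in seconds.
wait_for_bound() {
  local timeout="${1:-600}" interval="${2:-15}" waited=0 out pending
  while :; do
    out=$(kubectl get pvc -A --no-headers) || return 2
    pending=$(printf '%s\n' "$out" | count_pending)
    [ "$pending" -eq 0 ] && return 0
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

After the CSI controller and node pods have been restarted, running wait_for_bound 900 30 would poll every 30 seconds for up to 15 minutes. The same count_pending filter can be applied to the output of kubectl get pods -A to count the executor pods that are still Pending.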
Note: This issue presents similarly to KB 388841 (Rawflowcorrelator pod stuck in Pending state), but the root causes and resolutions are distinct.
The issue in KB 388841 is caused by a failed PVC creation during Spark driver startup due to slow network conditions, and is resolved by restarting the Spark driver pod.
The issue in this article (KB 433510) is caused by a CSI provisioner failure due to a stale VM reference following a vCenter disconnection event. Restarting the Spark driver will not resolve it.
Confirm the cause by checking CSI controller logs for the vim.VirtualMachine error before proceeding with either resolution.
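The log check can be sketched as a small filter that pulls the stale VM identifiers out of the CSI controller log. This is an illustration under assumptions: the function name stale_vm_refs is hypothetical, and the pattern assumes the error appears on a single log line in the form shown in the CSI controller log excerpt in this article.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads CSI controller log text on stdin and prints the
# unique VM managed object IDs (for example vm-223490) that appear in
# "has already been deleted" errors. Prints nothing if no such error exists.
stale_vm_refs() {
  sed -n "s/.*'vim\.VirtualMachine:\(vm-[0-9][0-9]*\)' has already been deleted.*/\1/p" \
    | sort -u
}
```

For example, kubectl logs <csi-controller-pod> -n vmware-system-csi --all-containers | stale_vm_refs prints one ID per line. Empty output would suggest the stale VM reference described here is not present in that log, in which case KB 388841 may be the better match.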