The Spark pods were stuck in the Pending state because the Persistent Volume Claims (PVCs) were not getting bound successfully through the vSphere CSI driver.
The errors were observed in the CSI controller pod logs.
To check the logs, log in to the SSPI CLI using the sysadmin account (if the SSPI version is greater than 5.0) and root credentials.
Commands used for troubleshooting:
k get pods -A | grep spark
Output:
Spark pods were not in the Running state.
k get pvc -A
Observation:
The PVCs were not Bound; they remained in the Pending state.
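On a cluster with many claims, the unbound PVCs can be hard to spot in the full listing. The following is a minimal sketch of a filter for them, not part of the original procedure. The function name unbound_pvcs is hypothetical, and the sketch assumes the default kubectl column layout (namespace, name, status) with the header row suppressed.

```shell
# Hypothetical helper: list PVCs whose STATUS column is anything other than Bound.
# Reads `kubectl get pvc -A --no-headers` output on stdin and assumes the
# default column order: column 1 = namespace, column 2 = name, column 3 = status.
unbound_pvcs() {
  # Skip lines with fewer than three fields (for example, blank lines).
  awk 'NF >= 3 && $3 != "Bound" { print $1 "/" $2 " " $3 }'
}

# Usage on a live cluster (kubectl must be configured for the cluster):
#   kubectl get pvc -A --no-headers | unbound_pvcs
```

An empty result means every PVC in the listing is Bound.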
k get pods -A | grep vsphere-csi-controller
k -n vmware-system-csi logs <pod-name from above command> -c vsphere-csi-controller
Errors observed in logs:
failed to add task to listview ServerFaultCode: The session is not authenticated
failed to monitor task for volume pvc-xxxxx
rpc error: code = Internal desc = failed to monitor task
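To confirm that the logs contain this specific fault rather than a different provisioning error, the log text can be searched for the session message. This is a sketch only; the function name count_auth_errors is hypothetical, and the search string is taken from the error shown above.

```shell
# Hypothetical helper: count log lines that contain the session-authentication fault.
# Reads CSI controller log text on stdin and prints the number of matching lines.
count_auth_errors() {
  # grep -c prints 0 and returns a non-zero status when nothing matches;
  # "|| true" keeps the function's exit status at 0 in that case.
  grep -c 'The session is not authenticated' || true
}

# Usage on a live cluster (substitute the real controller pod name):
#   kubectl -n vmware-system-csi logs <pod-name> -c vsphere-csi-controller | count_auth_errors
```

A count greater than zero suggests the controller's vCenter session is no longer valid.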
SSP all Environments
The vSphere CSI controller lost authentication with vCenter due to an expired or invalid vCenter session.
During PVC provisioning, task monitoring failed with:
ServerFaultCode: The session is not authenticated
Because the CSI controller could not monitor the provisioning task, PVC creation failed and dependent pods remained pending.
Restart the vSphere CSI controller pods to re-establish authenticated sessions with vCenter.
k rollout restart deployment vsphere-csi-controller -n vmware-system-csi
Or restart the CSI controller pods manually:
k get pods -A | grep csi-controller
k delete pod -n vmware-system-csi <csi-controller-pod-name from the above command output>
After restart:
Verify that the PVC status is Bound:
k get pvc -A
Verify that the Spark pods are in the Running state:
kubectl get pods -A | grep spark
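The pod check can also be expressed as a pass/fail test, which is convenient when the verification is repeated while the pods start. This is a sketch under assumptions: the function name spark_pods_not_running is hypothetical, and it expects the default kubectl column layout (namespace, name, ready, status) with the header row suppressed. Pods in any status other than Running, including Completed, are reported.

```shell
# Hypothetical helper: print Spark pods that are not Running.
# Reads `kubectl get pods -A --no-headers` output on stdin and assumes the
# default column order, where column 4 is the pod STATUS.
# Exit status is 0 only when every matching pod is Running.
spark_pods_not_running() {
  # Match "spark" anywhere on the line, as the grep in the step above does.
  awk '$0 ~ /spark/ && $4 != "Running" { print $1 "/" $2 " " $4; bad = 1 }
       END { exit bad + 0 }'
}

# Usage on a live cluster:
#   kubectl get pods -A --no-headers | spark_pods_not_running && echo "all Spark pods Running"
```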
Check CSI logs to ensure authentication errors are no longer present:
kubectl logs -n vmware-system-csi <csi-controller-pod> -c vsphere-csi-controller
If the issue persists, contact Broadcom Support for further troubleshooting.