Spark Pods Stuck in Pending State Due to PVC Provisioning Failure in vSphere CSI

Article ID: 440231

Updated On:

Products

VMware vDefend Firewall
VMware vDefend Firewall with Advanced Threat Prevention

Issue/Introduction

The Spark pods were stuck in the Pending state because the Persistent Volume Claims (PVCs) were not getting bound successfully through the vSphere CSI driver.

The CSI controller pod logs showed session authentication errors, listed under "Errors observed in logs" below.

To check the logs:

  • Log in to the SSPI CLI with the sysadmin account if the SSPI version is later than 5.0

  • Log in to the SSPI CLI with the root account if the SSPI version is 5.0

Commands used for troubleshooting:

k get pods -A | grep spark

Output:

Spark pods were not in the Running state.

k get pvc -A

Observation:

The PVCs were not Bound and remained in the Pending state.

k get pods -A | grep vsphere-csi-controller
k -n vmware-system-csi logs <pod-name from the above command> -c vsphere-csi-controller

Errors observed in logs:

failed to add task to listview
ServerFaultCode: The session is not authenticated
failed to monitor task for volume pvc-xxxxx
rpc error: code = Internal desc = failed to monitor task
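
To spot these signatures quickly in a long log, the controller output can be filtered for the messages above. The following is a minimal sketch: the function name `csi_auth_errors` is hypothetical and not part of any VMware tooling, it matches only the strings listed in this article, and it assumes `kubectl` is available (the `k` used in this article appears to be an alias for it).

```shell
# Hypothetical helper: reads log text on stdin and prints only the lines
# that match the error signatures listed in this article.
# "|| true" keeps the exit status 0 when no line matches.
csi_auth_errors() {
  grep -E 'The session is not authenticated|failed to monitor task|failed to add task to listview' || true
}

# Example usage (replace <pod-name> with the CSI controller pod name):
#   kubectl -n vmware-system-csi logs <pod-name> -c vsphere-csi-controller | csi_auth_errors
```

Empty output means none of the listed messages appear in the log that was read; it does not rule out other failure causes.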

 

Environment

All SSP environments

Cause

The vSphere CSI controller lost authentication with vCenter due to an expired or invalid vCenter session.

During PVC provisioning:

  1. CSI initiated the volume creation task in vCenter.
  2. CSI attempted to monitor the task using vSphere ListView APIs.
  3. The existing vCenter session/token used by CSI was no longer valid.
  4. Task monitoring failed with:

    ServerFaultCode: The session is not authenticated

Possible triggers include:

  • vCenter service restart or failover
  • SSO/session timeout expiration
  • Network interruption between CSI and vCenter
  • Stale CSI authentication sessions
  • Session recovery limitations in older CSI/govmomi versions

Because the CSI controller could not monitor the provisioning task, PVC creation failed and dependent pods remained pending.

Resolution

Restart the vSphere CSI controller pods to re-establish authenticated sessions with vCenter.

k rollout restart deployment vsphere-csi-controller -n vmware-system-csi

Or restart the CSI controller pods manually:

k get pods -A | grep csi-controller

k delete pod -n vmware-system-csi <csi-controller-pod-name from the above command output>

After restart:

  • CSI establishes a fresh authenticated session with vCenter
  • PVC provisioning retries succeed
  • PVs get bound successfully
  • Application pods move to Running state
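
The restart can also be combined with a wait for the new controller pods to become ready before validation begins. The sketch below is one way to do that, not a documented procedure: the function name `restart_csi_controller` is hypothetical, the 300-second timeout is an illustrative assumption, and the deployment name and namespace are the ones used in this article.

```shell
# Hypothetical helper: restart the vSphere CSI controller deployment and
# block until the rollout completes or the timeout expires.
# The 300s timeout is an assumption; adjust it for the environment.
restart_csi_controller() {
  kubectl -n vmware-system-csi rollout restart deployment vsphere-csi-controller &&
  kubectl -n vmware-system-csi rollout status deployment vsphere-csi-controller --timeout=300s
}

# Example usage:
#   restart_csi_controller && echo "CSI controller rollout finished"
```

A non-zero exit status means either the restart request failed or the rollout did not finish within the timeout.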

Validation

Verify that every PVC is in the Bound state:

k get pvc -A

Verify that the Spark pods are in the Running state:

kubectl get pods -A | grep spark

Check CSI logs to ensure authentication errors are no longer present:

kubectl logs -n vmware-system-csi <csi-controller-pod> -c vsphere-csi-controller
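
On clusters with many PVCs, the PVC check above is easier to read when only the unbound claims are shown. The following is a minimal sketch with a hypothetical function name, `unbound_pvcs`; it assumes the default `kubectl get pvc -A --no-headers` column order, in which the third column is STATUS.

```shell
# Hypothetical helper: reads `kubectl get pvc -A --no-headers` output on stdin
# and prints "<namespace>/<name> <status>" for every PVC that is not Bound.
# Assumes the default column order: NAMESPACE, NAME, STATUS, ...
unbound_pvcs() {
  awk '$3 != "Bound" { print $1 "/" $2, $3 }'
}

# Example usage:
#   kubectl get pvc -A --no-headers | unbound_pvcs
```

No output means every PVC in the listing reported the Bound status.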

Additional Information

If the issue persists, contact Broadcom Support for further troubleshooting.