vsphere-csi-node pods are stuck in a CrashLoopBackOff state with constant restarts

Article ID: 405476

Products

VMware Telco Cloud Automation

Issue/Introduction

  • After the cluster upgrade, the vsphere-csi-node pods enter a CrashLoopBackOff state.
  • The cluster was upgraded from 1.26.8 to 1.26.14.
  • The node-driver-registrar container reports the following error:
    "ExecSync cmd from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 1s exceeded: context deadline exceeded"

Environment

VMware Telco Cloud Automation 3.2

Cause

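The node-driver-registrar container is restarted repeatedly because its livenessProbe fails. The probe's exec command is cut off after the 1-second timeout shown in the error above, and the configured livenessProbe parameters (initialDelaySeconds and timeoutSeconds) are too short for the health check to complete before the container is fully ready.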
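
To confirm which probe timings are currently in effect, the DaemonSet can be queried directly. The following is a minimal sketch, not a definitive procedure: the namespace vmware-system-csi is an assumption (it may differ in your cluster), and the function name show_registrar_probe is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the CSI driver runs in this namespace; override NAMESPACE if yours differs.
NAMESPACE="${NAMESPACE:-vmware-system-csi}"

# Print the livenessProbe currently configured on the node-driver-registrar
# container of the vsphere-csi-node DaemonSet (hypothetical helper name).
show_registrar_probe() {
  kubectl -n "$NAMESPACE" get daemonset vsphere-csi-node \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="node-driver-registrar")].livenessProbe}'
}
```

Running show_registrar_probe prints the probe definition as JSON. If timeoutSeconds is absent from the output, Kubernetes applies its default of 1 second, which matches the "timeout 1s exceeded" message in the error.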

Resolution

Edit the livenessProbe of the node-driver-registrar container in the vsphere-csi-node DaemonSet and increase initialDelaySeconds and timeoutSeconds, for example to 10 seconds each:

livenessProbe:
  exec:
    command:
      - /csi-node-driver-registrar
      - --kubelet-registration-path=/var/lib/kubelet/plugins/csi.vsphere.vmware.com/csi.sock
      - --mode=kubelet-registration-probe
  failureThreshold: 3
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 10
name: node-driver-registrar
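
One possible way to apply these values without editing the manifest by hand is a strategic merge patch, which matches the container by name and changes only the two timing fields. This is a sketch under stated assumptions rather than the official procedure: the namespace vmware-system-csi is assumed, and the helper names build_probe_patch and apply_probe_patch are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the CSI driver runs in this namespace; override NAMESPACE if yours differs.
NAMESPACE="${NAMESPACE:-vmware-system-csi}"

# Build a strategic merge patch that sets only initialDelaySeconds ($1)
# and timeoutSeconds ($2) on the node-driver-registrar livenessProbe.
build_probe_patch() {
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"node-driver-registrar","livenessProbe":{"initialDelaySeconds":%d,"timeoutSeconds":%d}}]}}}}' "$1" "$2"
}

# Apply the patch to the vsphere-csi-node DaemonSet, then wait for the rollout.
apply_probe_patch() {
  kubectl -n "$NAMESPACE" patch daemonset vsphere-csi-node \
    --type strategic -p "$(build_probe_patch "$1" "$2")" &&
  kubectl -n "$NAMESPACE" rollout status daemonset/vsphere-csi-node
}
```

For the values shown above, call apply_probe_patch 10 10. The other probe fields (exec command, failureThreshold, periodSeconds, successThreshold) are left as they are, because a strategic merge patch merges into the existing container entry instead of replacing it.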

Additional Information

https://kubernetes.io/docs/concepts/configuration/liveness-readiness-startup-probes/