Pods get stuck in ContainerCreating status with the following events:
Warning FailedMount 1s kubelet MountVolume.SetUp failed for volume "pvc-<>" : rpc error: code = FailedPrecondition desc = volume ID: "<volume-id>" does not appear staged to "/var/lib/kubelet/plugins/kubernetes.io/csi/csi.vsphere.vmware.com/<>/globalmount"
The issue was observed with PVs managed by a csi.vsphere.vmware.com StorageClass with reclaimPolicy set to Retain and volumeBindingMode set to Immediate:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retain-sc
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Retain
volumeBindingMode: Immediate
Something may have gone wrong in the CSI components while attaching the volume to the nodes where the pods are scheduled.
If a VolumeAttachment object exists but the corresponding volume mount does not exist on the node, kubelet fails to mount the volume into the pods, and the CSI driver does not attempt to mount the volume on the node because the existing VolumeAttachment object makes the volume appear to be already attached.
Check the PV, PVC, and VolumeAttachment objects for the affected volume:

# kubectl get pv,pvc,volumeattachment -n <namespace>

Example output from the affected environment:

$ kubectl get pv,pvc,volumeattachment
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
persistentvolume/pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            Retain           Bound    default/vsphere-csi-pvc   retain-sc               2d3h

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/vsphere-csi-pvc   Bound    pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            retain-sc      2d

NAME                                                                                                   ATTACHER                 PV                                         NODE          ATTACHED   AGE
volumeattachment.storage.k8s.io/csi-735edc96e352528f89c7344b45a422fdb4b568c6d4a43368320b84351bae4e1a   csi.vsphere.vmware.com   pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   workload-<>   true       3m53s
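If many VolumeAttachments exist, a query over the standard VolumeAttachment fields can help narrow down the one referencing a specific PV and node. This is an optional sketch, not a required step:

# kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached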
Check that the vSphere CSI pods are healthy:

# kubectl get po -n vmware-system-csi
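If any of them look unhealthy, their logs may contain attach or mount errors. Assuming a default vSphere CSI installation (pod, deployment, and container names can differ per version and installation), the logs can be inspected with something like:

# kubectl logs -n vmware-system-csi <vsphere-csi-node-pod> -c vsphere-csi-node
# kubectl logs -n vmware-system-csi deploy/vsphere-csi-controller -c csi-attacher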
On the node where the pod is scheduled, check whether the volume is mounted and whether the pod's CSI volume directory is populated:

# df -h | grep <pv-name>
# ls -lrt /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/

The <pod-uid> can be obtained by describing the pod (kubectl describe pod) and checking its UID. If /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/ is empty, that may also indicate an issue. In the affected environment, the volume is not mounted and the directory is empty:

# df -h | grep pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70
#
# ls -lrt /var/lib/kubelet/pods/b1e6c69f-745d-46ab-8b2c-50d55982032f/volumes/kubernetes.io~csi/
total 0
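The FailedMount event above also states that the volume does not appear staged to the globalmount path. As an additional check (a sketch; the per-volume directory name under the driver's plugin path is a hash generated by kubelet), the staging area and active CSI mounts on the node can be inspected with:

# ls -l /var/lib/kubelet/plugins/kubernetes.io/csi/csi.vsphere.vmware.com/
# mount | grep csi.vsphere.vmware.com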
Check the kubelet logs on the node for mount errors related to the affected pod:

# journalctl -u kubelet | grep <pod-name>
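For example, to focus on mount-related messages for the affected volume (using the PV name from the example above; adjust the filters as needed):

# journalctl -u kubelet --no-pager | grep -E 'MountVolume|FailedMount' | grep pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70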
To list all pods that use the affected PVC:

# kubectl get po -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | grep <pvc-name> | awk '{print $1 " " $2}'
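Alternatively, in recent Kubernetes versions, describing the PVC lists the consuming pods in its "Used By" field:

# kubectl describe pvc <pvc-name> -n <namespace>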
Possible resolution steps include:

1. Scale down the workloads that use the affected PVC:

# kubectl scale deploy <deployment-name> -n <namespace> --replicas=0
# kubectl scale sts <statefulset-name> -n <namespace> --replicas=0

2. Confirm that the PV and PVC are still Bound and check whether the stale VolumeAttachment remains:

# kubectl get pv,pvc,volumeattachment -n <namespace>
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
persistentvolume/pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            Retain           Bound    default/vsphere-csi-pvc   retain-sc               2d3h

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/vsphere-csi-pvc   Bound    pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            retain-sc      2d
3. Delete the stale VolumeAttachment if it is still present:

# kubectl delete volumeattachment <volumeattachment-name>
4. Scale the workloads back up:

# kubectl scale deploy <deployment-name> -n <namespace> --replicas=<original-number-of-replicas>
# kubectl scale sts <statefulset-name> -n <namespace> --replicas=<original-number-of-replicas>
5. Confirm that the pods reach Running and that a new VolumeAttachment is created:

# kubectl get po,pv,pvc,volumeattachment
NAME                                      READY   STATUS    RESTARTS   AGE
pod/pvc-test-deployment-6865f74d8-kg65b   1/1     Running   0          11m

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
persistentvolume/pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            Retain           Bound    default/vsphere-csi-pvc   retain-sc               2d4h

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/vsphere-csi-pvc   Bound    pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            retain-sc      2d

NAME                                                                                                   ATTACHER                 PV                                         NODE          ATTACHED   AGE
volumeattachment.storage.k8s.io/csi-735edc96e352528f89c7344b45a422fdb4b568c6d4a43368320b84351bae4e1a   csi.vsphere.vmware.com   pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   workload-<>   true       11m