Pods stuck in ContainerCreating status with Volume "does not appear staged" error

Article ID: 378751


Products

Tanzu Kubernetes Runtime, Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid 1.x, VMware Tanzu Kubernetes Grid Management, VMware Tanzu Kubernetes Grid Plus, VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

Pods get stuck in ContainerCreating status with the following events:

Warning  FailedMount  1s    kubelet            MountVolume.SetUp failed for volume "pvc-<>" : rpc error: code = FailedPrecondition desc = volume ID: "<volume-id>" does not appear staged to "/var/lib/kubelet/plugins/kubernetes.io/csi/csi.vsphere.vmware.com/<>/globalmount"
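The events above can be seen by describing the affected Pod, for example (the Pod name and namespace are placeholders):

# kubectl describe pod <pod-name> -n <namespace>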

Environment

The issue was observed with PVs managed by a csi.vsphere.vmware.com StorageClass with reclaimPolicy set to Retain and volumeBindingMode set to Immediate:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retain-sc
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Retain
volumeBindingMode: Immediate
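To confirm the reclaimPolicy and volumeBindingMode of the StorageClass in use, a command along these lines can be used (retain-sc is the example name from above):

# kubectl get storageclass retain-sc -o jsonpath='{.reclaimPolicy}{"\t"}{.volumeBindingMode}{"\n"}'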

Cause

An issue may have occurred in the CSI components while attaching the volume to the node(s) where the Pods are scheduled.

If a VolumeAttachment object exists but the corresponding volume mount is missing on the node, kubelet fails to mount the volume into the Pods. Because the VolumeAttachment object still exists, CSI assumes the volume is already attached and does not attempt to mount it on the node again.
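As a quick way to see which volumes the control plane believes are attached to a given node (so they can be compared against the mounts actually present on that node), something like the following can be used; <node-name> is a placeholder:

# kubectl get volumeattachment | grep <node-name>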

Resolution

General troubleshooting

  • Check PV, PVC and VolumeAttachment objects. The VolumeAttachment object is created automatically by CSI once a Pod mounting the PVC is created. When all Pods mounting the PVC are deleted, the VolumeAttachment object is also deleted automatically.

    # kubectl get pv,pvc,volumeattachment -n <namespace>

    For example:

    $ kubectl get pv,pvc,volumeattachment
    NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
    persistentvolume/pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            Retain           Bound    default/vsphere-csi-pvc   retain-sc               2d3h

    NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    persistentvolumeclaim/vsphere-csi-pvc   Bound    pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            retain-sc      2d

    NAME                                                                                                   ATTACHER                 PV                                         NODE                                             ATTACHED   AGE
    volumeattachment.storage.k8s.io/csi-735edc96e352528f89c7344b45a422fdb4b568c6d4a43368320b84351bae4e1a   csi.vsphere.vmware.com   pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   workload-<>   true       3m53s


  • Check that the CSI pods are up and running:

    # kubectl get po -n vmware-system-csi

  • Log into the nodes where the pods are scheduled and check volume mounts there:

    # df -h | grep <pv-name>

    Check that the PV is correctly mounted under "/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/".
    The <pod-uid> can be obtained by describing the Pod (kubectl describe pod) and checking its UID (see the example after this list).

    If the VolumeAttachment object exists but there's no volume mount in the node, that may indicate an issue.
    Additionally, if "/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/" is empty, that may also indicate an issue.

    For example, grepping the above PV's name doesn't return anything despite the existence of an associated VolumeAttachment object, and the directory is empty:

    # df -h | grep pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70

    # ls -lrt /var/lib/kubelet/pods/b1e6c69f-745d-46ab-8b2c-50d55982032f/volumes/kubernetes.io~csi/
    total 0

  • Check the kubelet logs on the node. There may be errors about mounting the volume into the Pod, similar to the ones in the Pod's events.

    # journalctl -u kubelet | grep <pod-name>

  • Check whether other Pods are mounting the PVC:

    # kubectl get po -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}' | grep <pvc-name> | awk '{print $1 " " $2}'
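As referenced above, the Pod UID used in the /var/lib/kubelet path can also be retrieved directly with jsonpath; <pod-name> and <namespace> are placeholders:

# kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.metadata.uid}{"\n"}'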


Resolution steps

Possible resolution steps include:

  • Scale down to 0 replicas all Deployments/StatefulSets mounting the problematic PVC, as found in the previous step (consider recording the original replica counts first; see the example after these steps):

    # kubectl scale deploy <deployment-name> -n <namespace> --replicas=0
    # kubectl scale sts <statefulset-name> -n <namespace> --replicas=0

  • Check the VolumeAttachment objects again. The one associated with the PVC should be deleted automatically within a few seconds, while the PV/PVC objects should remain in Bound status.

    # kubectl get pv,pvc,volumeattachment -n <namespace>


    For example:

    # kubectl get pv,pvc,volumeattachment
    NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
    persistentvolume/pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            Retain           Bound    default/vsphere-csi-pvc   retain-sc               2d3h

    NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    persistentvolumeclaim/vsphere-csi-pvc   Bound    pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            retain-sc      2d


  • If the VolumeAttachment object doesn't get deleted automatically after a couple of minutes, delete it manually:

    # kubectl delete volumeattachment <volumeattachment-name>

    Make sure it doesn't get recreated. If it does, most likely some Pod is still mounting the PVC; find it and scale down the associated Deployment/StatefulSet.

  • Scale the Deployments/StatefulSets back up to their original replica counts:

    # kubectl scale deploy <deployment-name> -n <namespace> --replicas=<original-number-of-replicas>
    # kubectl scale sts <statefulset-name> -n <namespace> --replicas=<original-number-of-replicas>

    The Pods should now go into Running status and a new VolumeAttachment object should be created.


    For example:

    # kubectl get po,pv,pvc,volumeattachment
    NAME                                      READY   STATUS    RESTARTS   AGE
    pod/pvc-test-deployment-6865f74d8-kg65b   1/1     Running   0          11m

    NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                     STORAGECLASS   REASON   AGE
    persistentvolume/pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            Retain           Bound    default/vsphere-csi-pvc   retain-sc               2d4h

    NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    persistentvolumeclaim/vsphere-csi-pvc   Bound    pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   1Gi        RWO            retain-sc      2d

    NAME                                                                                                   ATTACHER                 PV                                         NODE                                             ATTACHED   AGE
    volumeattachment.storage.k8s.io/csi-735edc96e352528f89c7344b45a422fdb4b568c6d4a43368320b84351bae4e1a   csi.vsphere.vmware.com   pvc-f900d7c2-a318-4bbd-80c8-3798ecab1b70   workload-<>   true       11m
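
As mentioned in the first step, before scaling a workload down it can be useful to record its current replica count so it can be restored afterwards. A possible way to do this (the Deployment/StatefulSet and namespace names are placeholders):

# kubectl get deploy <deployment-name> -n <namespace> -o jsonpath='{.spec.replicas}{"\n"}'
# kubectl get sts <statefulset-name> -n <namespace> -o jsonpath='{.spec.replicas}{"\n"}'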