A pod is stuck in ContainerCreating state with an error message stating that MountVolume.SetUp failed for its associated persistentvolumeclaim (pvc).
While connected to the cluster context where the pod is trying to run, the following symptoms are present:
-Performing a describe on the ContainerCreating pod returns an error message similar to the one below, where the pvc name and volume ID will vary by environment:
Warning FailedMount ##s (x# over ##s) kubelet MountVolume.SetUp failed for volume "pvc-i-am-an-example-pvc" : rpc error: code = FailedPrecondition desc = volume ID: "this-is-an-example-volume-id-string" does not appear staged to "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-i-am-an-example-pvc/globalmount"
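This event can be viewed by performing a standard describe on the pod, substituting the pod name and namespace from your environment:
kubectl describe pod <pod name> -n <pod namespace>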
-Describing the volumeattachment associated with the above noted pvc shows Attached as true
-"kubectl get volumeattachment -A | grep <pvc>" can be used to locate the volumeattachment
-The vsphere-csi-controller pods are healthy and in Running state on both the Supervisor cluster and the affected cluster
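The controller pod status can be checked in the CSI namespace on each cluster; on recent vSphere with Tanzu releases this is typically vmware-system-csi, though older workload cluster releases may place the CSI driver in kube-system:
kubectl get pods -n vmware-system-csi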
-The vsphere-csi-node pod on the same worker node as the pod in ContainerCreating state shows the same "does not appear staged" error message for the same pvc (see the log check below)
-"kubectl get pods <pod name> -n <pod namespace> -o wide" can be used to find the worker node that this ContainerCreating pod is running on
While directly connected (SSH) to the worker node where the pod is trying to run, the following symptoms are present:
-Performing an ls on the globalmount path from the above "does not appear staged" error message returns an "Input/output error"
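Using the example path from the error message above (substitute the actual pvc name from your environment), the failure looks similar to:
ls /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-i-am-an-example-pvc/globalmount
ls: cannot access '/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-i-am-an-example-pvc/globalmount': Input/output error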
-On a node with a healthy filesystem, the noted globalmount path contains directories for containers of pods on the node.
vSphere with Tanzu 7.0
vSphere with Tanzu 8.0
This can occur on a vSphere Kubernetes cluster regardless of whether it is managed by Tanzu Mission Control (TMC).
This is indicative of a filesystem issue on the worker node that the pod in ContainerCreating state is attempting to start on.
The system is unable to properly stage the globalmount directory for the pod's container(s) due to this filesystem issue.
This globalmount directory staging action is necessary for volume attachment and mount setup when starting up a pod.
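On a node with a healthy filesystem, staged volumes are visible as mounts under the kubelet plugins directory and can be spot-checked from the SSH session with, for example:
mount | grep globalmount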
The pod in ContainerCreating state will need to be moved to another worker node with a healthy filesystem.
While connected to the cluster context where the pod is trying to run:

1. Locate the worker node that the ContainerCreating pod is currently scheduled on:
kubectl get pod <pod name> -n <pod namespace> -o wide

2. Confirm the node is present and note its current status:
kubectl get nodes

3. Cordon the affected worker node so that no new pods can be scheduled onto it:
kubectl cordon <node name that the ContainerCreating pod is running on>

4. Verify that the cordoned node now reports SchedulingDisabled:
kubectl get nodes

5. Delete the stuck pod so that it can be rescheduled onto a worker node with a healthy filesystem. A pod managed by a controller such as a Deployment, ReplicaSet, or StatefulSet will be recreated automatically:
kubectl delete pod <pod name> -n <pod namespace>

6. Confirm that the pod was recreated on a different worker node:
kubectl get pod <pod name> -n <pod namespace> -o wide

7. Confirm that the pod starts successfully and no longer reports the FailedMount warning in its events:
kubectl describe pod <pod name> -n <pod namespace>

8. Locate the volumeattachment associated with the pod's pvc:
kubectl get volumeattachment | grep <pvc associated with the pod>

9. Verify that the volumeattachment now references the new worker node and shows Attached as true:
kubectl describe volumeattachment <volumeattachment name>
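Once the pod is confirmed healthy on the new node, the original worker node remains cordoned. If the underlying filesystem issue on that node is remediated (for example, by redeploying the node), it can be returned to scheduling with:
kubectl uncordon <node name>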