Pod Stuck in ContainerCreating state on TKGm

Article ID: 436683

Updated On:

Products

VMware Tanzu Kubernetes Grid Management

Issue/Introduction

  • A Pod remains stuck in the ContainerCreating state. The Kubernetes scheduler successfully assigns the Pod to a healthy Node, but the vSphere CSI driver fails to attach the Persistent Volume (PV) to the Node's virtual machine.
  • Running kubectl describe pod <pod-name> shows the following event:

    Warning FailedAttachVolume AttachVolume.Attach failed for volume "pvc-xxxx" : rpc error: code = Internal desc = failed to attach disk: "<volume-id>" with node: "<node-id>" err ServerFaultCode: CNS: Failed to retrieve datastore for vol <volume-id>. (vim.fault.NotFound)
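
To see whether other Pods are hitting the same failure, the events can be filtered by reason instead of describing each Pod in turn. The following is a minimal sketch, assuming kubectl is pointed at the affected workload cluster; the function name list_attach_failures is hypothetical and not part of kubectl.

```shell
# Hypothetical helper: list FailedAttachVolume events.
# With no argument it searches all namespaces; with one argument it
# searches only that namespace.
list_attach_failures() {
  ns="${1:-}"
  if [ -n "$ns" ]; then
    kubectl get events -n "$ns" --field-selector reason=FailedAttachVolume
  else
    kubectl get events --all-namespaces --field-selector reason=FailedAttachVolume
  fi
}

# Usage:
#   list_attach_failures            # all namespaces
#   list_attach_failures my-app     # a single namespace
```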

Environment

TKGm 2.x

Cause

The vim.fault.NotFound error indicates that vCenter cannot locate the virtual disk (VMDK file) backing the Persistent Volume.
This occurs when a Kubernetes Node (Virtual Machine) is manually deleted directly from the vCenter inventory (e.g., using the "Delete from Disk" action), and the attached CSI volume is destroyed alongside the VM's OS disk.
Because the deletion happened outside of Kubernetes, the control plane is unaware that the disk is gone. Kubernetes becomes stuck in a loop attempting to attach a "ghost" disk that no longer exists on the backend datastore.
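
One way to confirm this cause is to read the volume ID that Kubernetes still holds for the PV and compare it with the <volume-id> in the event. The sketch below assumes the PV was provisioned by the vSphere CSI driver, so the ID is stored in spec.csi.volumeHandle; the helper names are hypothetical.

```shell
# Hypothetical helper: print the name of the PV bound to a PVC.
#   $1 = namespace, $2 = PVC name
pvc_to_pv() {
  kubectl get pvc "$2" -n "$1" -o jsonpath='{.spec.volumeName}'
}

# Hypothetical helper: print the CSI volume handle (the CNS volume ID)
# recorded on a PV.
#   $1 = PV name
pv_volume_handle() {
  kubectl get pv "$1" -o jsonpath='{.spec.csi.volumeHandle}'
}

# Usage:
#   pv=$(pvc_to_pv my-app data-pvc)
#   pv_volume_handle "$pv"
```

If the printed ID matches the one in the event and that volume no longer appears in the vSphere Client (typically under Monitor > Cloud Native Storage > Container Volumes), the backing disk has most likely been deleted.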

Resolution

  • Remove the orphaned Kubernetes objects so that the CSI driver provisions a new disk.

    Note: Back up the PVC, PV, and Pod manifests before deleting anything.

  • Follow the steps below to remove the volume attachment, PVC, PV, and Pod:

    kubectl get volumeattachment | grep <pv-name>
    kubectl delete volumeattachment <volume-attachment-name>
    kubectl delete pvc <pvc-name>
    kubectl delete pv <pv-name>
    kubectl delete pod <pod-name>

  • Recreate the PVC and the Pod from the backed-up manifests.
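
The backup and lookup steps above can be sketched as two small shell functions. This is an illustration under stated assumptions, not a supported tool: the function names, argument order, and default backup directory are hypothetical. The lookup matches the PV name exactly rather than with grep, so a PV whose name is a prefix of another is not picked up by mistake.

```shell
# Hypothetical helper: save the Pod, PVC, and PV manifests as YAML.
#   $1 = namespace, $2 = Pod, $3 = PVC, $4 = PV, $5 = backup dir (optional)
backup_manifests() {
  ns="$1"; pod="$2"; pvc="$3"; pv="$4"; dir="${5:-./backup}"
  mkdir -p "$dir" || return 1
  kubectl get pod "$pod" -n "$ns" -o yaml > "$dir/pod-$pod.yaml" || return 1
  kubectl get pvc "$pvc" -n "$ns" -o yaml > "$dir/pvc-$pvc.yaml" || return 1
  kubectl get pv "$pv" -o yaml > "$dir/pv-$pv.yaml" || return 1
}

# Hypothetical helper: print the names of VolumeAttachments that
# reference the given PV.
#   $1 = PV name
attachments_for_pv() {
  kubectl get volumeattachment \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.source.persistentVolumeName}{"\n"}{end}' |
    awk -v pv="$1" '$2 == pv { print $1 }'
}

# Usage:
#   backup_manifests my-app web-0 data-pvc pvc-xxxx
#   attachments_for_pv pvc-xxxx
```

Two practical points when working through the deletion: kubectl delete pvc may appear to hang while the Pod still references the claim, and it completes once the Pod is deleted. Manifests exported with -o yaml also carry cluster-generated fields; in particular, a PVC backup still contains spec.volumeName pointing at the old PV, which likely needs to be removed before the PVC is reapplied so that a new volume is provisioned.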
 

Additional Information

Never delete Kubernetes worker nodes directly from vCenter using "Delete from Disk" without first gracefully draining the node:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
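
After the drain completes, it can be worth checking that no VolumeAttachment still references the node before the VM is removed. The sketch below is one possible check, assuming kubectl access to the workload cluster; the function names are hypothetical.

```shell
# Hypothetical helper: print the names of VolumeAttachments whose
# spec.nodeName equals the given node.
#   $1 = node name
attachments_on_node() {
  kubectl get volumeattachment \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\n"}{end}' |
    awk -v node="$1" '$2 == node { print $1 }'
}

# Hypothetical helper: return non-zero if any volume is still attached
# to the node, listing the attachments on stderr.
#   $1 = node name
safe_to_delete_node() {
  remaining=$(attachments_on_node "$1")
  if [ -n "$remaining" ]; then
    echo "volumes still attached to $1:" >&2
    echo "$remaining" >&2
    return 1
  fi
  echo "no VolumeAttachments reference $1"
}

# Usage:
#   kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
#   safe_to_delete_node <node-name>
```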