Application Pods Stuck in Container Creating State After Upgrading Workload Cluster from v1.24.10 to v1.25.13

Article ID: 377545

Products

Tanzu Kubernetes Grid VMware Tanzu Kubernetes Grid 1.x

Issue/Introduction

 

  • Pods are stuck in the "ContainerCreating" or "CrashLoopBackOff" state.
  • Running kubectl describe pod shows volume attachment issues.
  • Persistent Volume Claims (PVCs) and Persistent Volumes (PVs) are not intact.
  • The environment may have stale entries under volume attachments.

Environment

  • Tanzu Kubernetes Grid (TKG) 2.x

Resolution

  1. Identify Volume Attachment Issues:
    • Use kubectl describe pod to check if the volume is referring to old virtual machines (VMs) that were created before the upgrade.
    • Verify this by examining the logs of the vsphere-csi-controller pod:
      kubectl describe pod <pod-name> -n <namespace>
      kubectl logs <pod-name> -n <namespace>
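The placeholders above do not say where the CSI controller runs, and the namespace can differ between releases. As a minimal sketch, the helper below reads the table printed by kubectl get pods -A and prints the namespace that holds the vsphere-csi-controller pod. The function name csi_namespace is hypothetical. The container name vsphere-csi-controller in the commented usage is an assumption based on the upstream vSphere CSI driver manifests, so check it against your cluster.

```shell
# Print the namespace(s) that contain a vsphere-csi-controller pod.
# Input (stdin): the table printed by `kubectl get pods -A`.
csi_namespace() {
  # Column 1 is NAMESPACE, column 2 is NAME; skip the header row.
  awk 'NR > 1 && $2 ~ /^vsphere-csi-controller/ { print $1 }' | sort -u
}

# Usage against a live cluster (not executed here):
#   NS=$(kubectl get pods -A | csi_namespace)
#   kubectl logs deployment/vsphere-csi-controller -n "$NS" \
#     -c vsphere-csi-controller --tail=200
```

Look in the controller log for attach or detach errors that mention node names that no longer exist after the upgrade.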
  2. Restart Relevant Deployments:
    • Restart the vsphere-csi-controller deployment and the vsphere-csi-node daemonset. This recreates all pods that belong to these two workloads.
      kubectl rollout restart deploy <deployname> -n <namespace>
      kubectl rollout restart ds <dsname> -n <namespace>
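A restart only helps once the new pods are actually running, so it can be useful to wait for both rollouts to finish before moving on. The sketch below wraps the two restart commands and follows them with kubectl rollout status. The function name restart_csi and the five-minute timeout are illustrative assumptions. The resource names are the ones given in this step.

```shell
# Restart the vSphere CSI controller and node workloads in namespace $1,
# then block until both rollouts report success (or the timeout expires).
restart_csi() {
  ns="$1"
  kubectl rollout restart deployment/vsphere-csi-controller -n "$ns" &&
  kubectl rollout restart daemonset/vsphere-csi-node -n "$ns" &&
  kubectl rollout status deployment/vsphere-csi-controller -n "$ns" --timeout=5m &&
  kubectl rollout status daemonset/vsphere-csi-node -n "$ns" --timeout=5m
}

# Usage against a live cluster (not executed here):
#   restart_csi <namespace>
```

The function stops at the first command that fails, so a non-zero exit status means that one of the rollouts did not complete.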
  3. Verify PVC Status:
    • Check if the PVCs are in the "Bound" state. If not, use kubectl describe pvc to identify any errors or events recorded.
      kubectl describe pvc <pvcname> -n <namespace>
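When a cluster has many claims, it may be quicker to list only the ones that are not Bound and then describe those. The following is a minimal sketch that filters the table printed by kubectl get pvc -A. The function name unbound_pvcs is hypothetical.

```shell
# Print "namespace/name" for every PVC whose STATUS is not "Bound".
# Input (stdin): the table printed by `kubectl get pvc -A`.
unbound_pvcs() {
  # Column 1 is NAMESPACE, column 2 is NAME, column 3 is STATUS.
  awk 'NR > 1 && $3 != "Bound" { print $1 "/" $2 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pvc -A | unbound_pvcs
```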
  4. Check Volume Attachments:
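    • Use kubectl describe volumeattachments to verify the status. For each attachment, note the volume and confirm that the node it is attached to is a new node created by the upgrade, not an old one. The command below lists the attachments that are not in the attached state: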
      kubectl get volumeattachments -A | grep -i false
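An attachment that still points at a node removed by the upgrade is a likely stale entry. As a rough sketch, the helper below compares the node recorded in each VolumeAttachment with the list of nodes currently in the cluster and prints the attachments whose node no longer exists. The function name stale_attachments and the temporary file path are hypothetical. The field paths in the commented usage are taken from the VolumeAttachment API (spec.nodeName, spec.source.persistentVolumeName, status.attached).

```shell
# Print the names of VolumeAttachments whose node is not a current node.
# $1: file with the current node names, one per line.
# Input (stdin): rows of "NAME NODE PV ATTACHED" without a header.
stale_attachments() {
  while read -r name node pv attached; do
    # -x matches the whole line, -F treats the node name as a fixed string.
    grep -qxF -- "$node" "$1" || printf '%s\n' "$name"
  done
}

# Usage against a live cluster (not executed here):
#   kubectl get nodes -o name | sed 's|^node/||' > /tmp/current-nodes.txt
#   kubectl get volumeattachments --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PV:.spec.source.persistentVolumeName,ATTACHED:.status.attached \
#     | stale_attachments /tmp/current-nodes.txt
```

Review the printed names before deleting anything; the sketch only reports candidates.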
  5. Identify and Remove Stale volumeattachment Entries:
    • Restarting the deployments in step 2 should remove stale volume attachment entries. If stale entries persist, follow the steps below to clean them.
    • Delete the stale entry:
      kubectl delete volumeattachment csi-xxxxxxxxxxxxxxxxxx
    • If the entry is not deleted, remove its finalizer:
      kubectl edit volumeattachment csi-xxxxxxxxxxxxxxxxxx
      Remove the finalizers entry under metadata, then save and exit.
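The interactive edit can also be done non-interactively, which helps when several stale entries need the same treatment. The sketch below is an alternative to kubectl edit, not the procedure from this article: it uses a merge patch to set metadata.finalizers to null. The function name clear_va_finalizers is hypothetical. Apply it only to entries already confirmed as stale, because removing a finalizer skips whatever cleanup the finalizer was waiting for.

```shell
# Remove all finalizers from the VolumeAttachment named $1 so that a
# pending delete can complete.
clear_va_finalizers() {
  kubectl patch volumeattachment "$1" --type=merge \
    -p '{"metadata":{"finalizers":null}}'
}

# Usage against a live cluster (not executed here):
#   clear_va_finalizers csi-xxxxxxxxxxxxxxxxxx
```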
  6. Address Persistent Volume Issues:
    • Check for pods in the "ContainerCreating" or "CrashLoopBackOff" state due to PVC issues.
    • Use kubectl describe pod and kubectl describe pvc, and review the logs of the vsphere-csi-controller, to understand the issue.
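To find the affected pods across all namespaces in one pass, the table printed by kubectl get pods -A can be filtered on its STATUS column. This is a minimal sketch and the function name stuck_pods is hypothetical.

```shell
# Print "namespace/name" for pods in ContainerCreating or CrashLoopBackOff.
# Input (stdin): the table printed by `kubectl get pods -A`.
stuck_pods() {
  # Column 1 is NAMESPACE, column 2 is NAME, column 4 is STATUS.
  awk 'NR > 1 && ($4 == "ContainerCreating" || $4 == "CrashLoopBackOff") { print $1 "/" $2 }'
}

# Usage against a live cluster (not executed here):
#   kubectl get pods -A | stuck_pods
```

Each reported pod can then be inspected with kubectl describe pod to see whether its events point to a volume problem.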
  7. Verify Disk Presence:
    • If the issue persists, confirm that the disk exists by searching for its disk ID under Container Volumes in the vSphere Client.
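The disk ID is not shown on the PVC itself. For volumes provisioned by the vSphere CSI driver, the ID is expected to be the volumeHandle recorded on the bound PV; treat that mapping as an assumption and confirm it in your environment. The sketch below resolves a PVC to its PV and prints the handle. The function name pvc_volume_handle is hypothetical.

```shell
# Print the CSI volume handle of the PV bound to PVC $1 in namespace $2.
pvc_volume_handle() {
  pv=$(kubectl get pvc "$1" -n "$2" -o jsonpath='{.spec.volumeName}') || return 1
  if [ -z "$pv" ]; then
    echo "PVC $1 in namespace $2 is not bound to a PV" >&2
    return 1
  fi
  kubectl get pv "$pv" -o jsonpath='{.spec.csi.volumeHandle}'
}

# Usage against a live cluster (not executed here):
#   pvc_volume_handle <pvcname> <namespace>
```

The printed value can be used as the search term when looking for the disk under Container Volumes.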
  8. Recreate PVC if Necessary:
    • The error “the object or item could not be found” indicates that the VMDK file, or its descriptor file, may be missing. This usually occurs if the virtual machine was deleted without fully unmounting the disk.
    • Confirm that the issue is not related to the KB article https://knowledge.broadcom.com/external/article/320790
    • If the VMDK file cannot be restored, recreate the PVC and clean up the remaining environment.

 

Additional Information