Velero backup stuck in pending state due to PVC annotations preventing restore when the CSI driver is different on the destination cluster

Article ID: 389554


Products

Tanzu Mission Control

Issue/Introduction

When restoring a cluster from a backup in TMC using Velero, the restore hangs with resources stuck in a Pending state. Symptoms include the following:

  1. The PVC is stuck in a Pending state
    • # kubectl get pvc -n <namespace>
      NAME         STATUS    VOLUME                                   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      <pvc_name>   Pending   pvc-########-####-####-####-##########   20Gi       RWO            default        36m
  2. When inspecting the PVC YAML, the storage-provisioner annotations reference csi.vsphere.vmware.com rather than the CSI driver expected on the destination cluster (a command sketch for checking the drivers available on the destination cluster follows this list)
    • # kubectl get pvc -n <namespace> <pvc_name> -o yaml
      metadata:
        annotations:
          backup.velero.io/must-include-additional-items: "true"
          meta.helm.sh/release-name: ##############
          meta.helm.sh/release-namespace: <namespace>
          volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
          volume.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
          volumehealth.storage.kubernetes.io/health: accessible
          volumehealth.storage.kubernetes.io/health-timestamp: Wed Dec 18 01:18:17 UTC 2024
  3. Pods are stuck in Pending with an error similar to:
    • 0/13 nodes are available: pod has unbound immediate PersistentVolumeClaims, preemption: 0/13 nodes are available: 13 Preemption is not helpful for scheduling.
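
To confirm which CSI drivers and storage classes are actually available on the destination cluster (and therefore which provisioner the restored PVC should be using), commands along these lines can be run; output will vary by environment:

  # kubectl get csidrivers
  # kubectl get storageclass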

Environment

Environments using TMC Velero restore where the destination cluster uses a CSI driver other than the vSphere CSI driver (csi.vsphere.vmware.com).

Cause

When using fs-backup (file system backup), the following annotations from the source cluster are restored to the new cluster, preventing Kubernetes from properly scheduling and binding the PVC:

volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
volume.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
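
The mismatch can be seen by comparing the provisioner recorded in these annotations with the provisioner of the storage class the claim requests; in this sketch the namespace, PVC, and storage class names are placeholders:

  # kubectl get pvc -n <namespace> <pvc_name> -o jsonpath='{.metadata.annotations.volume\.kubernetes\.io/storage-provisioner}'
  # kubectl get storageclass <storageclass_name> -o jsonpath='{.provisioner}'

If the two values differ, no provisioner on the destination cluster considers itself responsible for the claim, and it remains Pending.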


Resolution

This fix will be released with Velero 1.16.

Until the fix is available, the workaround is to do the following (a command sketch of these steps follows the list):

  1. Copy the current PVC manifest, keep the original as a backup, and edit the copy down to only the essential fields, removing the storage-provisioner annotations and other cluster-generated details
    • Sample:
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        annotations:
          meta.helm.sh/release-name: <pvc_name>
          meta.helm.sh/release-namespace: <namespace>
        labels:
          app.kubernetes.io/component: <name>
          app.kubernetes.io/instance: <pvc_name>
          app.kubernetes.io/managed-by: Helm
          app.kubernetes.io/name: helm
          helm.sh/chart: helm-5.4.1
          jaas_controller: <pvc_name>
        name: <pvc_name>
        namespace: <namespace>
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  2. Delete the current PVC
  3. Create the new PVC from the edited manifest, with the annotations and extra details removed
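
A minimal command sketch of these steps, assuming <namespace> and <pvc_name> as above and pvc-backup.yaml / pvc-clean.yaml as placeholder file names:

  # kubectl get pvc -n <namespace> <pvc_name> -o yaml > pvc-backup.yaml
  # cp pvc-backup.yaml pvc-clean.yaml
  # kubectl delete pvc -n <namespace> <pvc_name>
  # kubectl apply -f pvc-clean.yaml

Edit pvc-clean.yaml down to the fields shown in the sample above before applying it.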

After doing this, the restore should continue without issue.


If you have any issues with these steps or have any questions, please open a case with support.