Velero backup stuck in pending state due to PVC annotations preventing restore when the CSI driver is different on the destination cluster

Article ID: 389554


Products

Tanzu Mission Control

Issue/Introduction

When restoring a cluster from a backup in TMC using Velero, the restore hangs with resources stuck in a Pending state. Symptoms include the following:

  1. The PVC is stuck in a Pending state
    • # kubectl get pvc -n <namespace>
      NAME         STATUS    VOLUME                                   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      <pvc_name>   Pending   pvc-########-####-####-####-##########   20Gi       RWO            default        36m
  2. When inspecting the PVC YAML, the storage-provisioner annotations reference csi.vsphere.vmware.com rather than the CSI driver expected on the destination cluster (a command sketch for checking the drivers available on the destination cluster follows this list)
    • # kubectl get pvc -n <namespace> <pvc_name> -o yaml
      metadata:
        annotations:
          backup.velero.io/must-include-additional-items: "true"
          meta.helm.sh/release-name: ##############
          meta.helm.sh/release-namespace: <namespace>
          volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
          volume.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
          volumehealth.storage.kubernetes.io/health: accessible
          volumehealth.storage.kubernetes.io/health-timestamp: Wed Dec 18 01:18:17 UTC 2024
  3. Pods are stuck in Pending with an error similar to:
    • 0/13 nodes are available: pod has unbound immediate PersistentVolumeClaims, preemption: 0/13 nodes are available: 13 Preemption is not helpful for scheduling.
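
To confirm which CSI drivers and storage classes are actually available on the destination cluster (and therefore which provisioner the restored PVC should be using), commands along these lines can be run; output will vary by environment:

  # kubectl get csidrivers
  # kubectl get storageclass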

Environment

Environments using TMC Velero restore where the destination cluster uses a CSI driver other than the vSphere CSI driver (csi.vsphere.vmware.com).

Cause

When using fs-backup (file system backup), the following annotations from the source cluster are restored to the new cluster, preventing Kubernetes from properly scheduling and binding the PVC:

volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
volume.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
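
The mismatch can be seen by comparing the provisioner recorded in these annotations with the provisioner of the storage class the claim requests; in this sketch the namespace, PVC, and storage class names are placeholders:

  # kubectl get pvc -n <namespace> <pvc_name> -o jsonpath='{.metadata.annotations.volume\.kubernetes\.io/storage-provisioner}'
  # kubectl get storageclass <storageclass_name> -o jsonpath='{.provisioner}'

If the two values differ, no provisioner on the destination cluster considers itself responsible for the claim, and it remains Pending.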


Resolution

This fix will be released with Velero 1.16.

Until the fix is available, the workaround is to do the following (a command sketch of these steps follows the list):

  1. Copy the current PVC manifest, keep the original as a backup, and edit the copy down to only the essential fields, removing the storage-provisioner annotations and other cluster-generated details
    • Sample:
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        annotations:
          meta.helm.sh/release-name: <pvc_name>
          meta.helm.sh/release-namespace: <namespace>
        labels:
          app.kubernetes.io/component: <name>
          app.kubernetes.io/instance: <pvc_name>
          app.kubernetes.io/managed-by: Helm
          app.kubernetes.io/name: helm
          helm.sh/chart: helm-5.4.1
          jaas_controller: <pvc_name>
        name: <pvc_name>
        namespace: <namespace>
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  2. Delete the current PVC
  3. Create the new PVC from the edited manifest, with the annotations and extra details removed
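
A minimal command sketch of these steps, assuming <namespace> and <pvc_name> as above and pvc-backup.yaml / pvc-clean.yaml as placeholder file names:

  # kubectl get pvc -n <namespace> <pvc_name> -o yaml > pvc-backup.yaml
  # cp pvc-backup.yaml pvc-clean.yaml
  # kubectl delete pvc -n <namespace> <pvc_name>
  # kubectl apply -f pvc-clean.yaml

Edit pvc-clean.yaml down to the fields shown in the sample above before applying it.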

After doing this, the restore should continue without issue.


If you have any issues with these steps or have any questions, please open a case with support.