Backup operations that protect Persistent Volume Claims (PVCs) in a Guest cluster fail intermittently.
A delay occurs when Commvault attempts to provision a temporary worker pod and Persistent Volume Claim to read data from a snapshot volume.
kubectl get pvc -n <namespace>
NAME                                    STATUS    VOLUME                                       CAPACITY   ACCESS MODES   STORAGECLASS    AGE
######## - #### - #### - ############   Bound     pvc- ######## - #### - #### - ############   8Gi        RWO            storage-class   10d
######## - #### - #### - ############   Pending   pvc- ######## - #### - #### - ############   10Gi       RWO            storage-class   3m52s
kubectl get pods -n <namespace>
NAME                                    READY   STATUS    RESTARTS   AGE     IP       NODE
######## - #### - #### - ############   0/1     Pending   0          3m52s   <none>   ######## - #### - #### - ############
######## - #### - #### - ############   1/1     Running   0          45h     <none>   ######## - #### - #### - ############
######## - #### - #### - ############   1/1     Running   0          45h     <none>   ######## - #### - #### - ############
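The Pending claim in the output above can also be picked out programmatically. The following is a minimal Python sketch, assuming the claim list has been exported with kubectl get pvc -n <namespace> -o json. The function name pending_pvcs and the sample data are illustrative only and are not part of any VMware or Commvault tooling.

```python
import json


def pending_pvcs(pvc_list_json):
    """Return (name, storage class) for every PVC whose phase is Pending.

    pvc_list_json is the text produced by:
        kubectl get pvc -n <namespace> -o json
    """
    items = json.loads(pvc_list_json).get("items", [])
    pending = []
    for item in items:
        if item.get("status", {}).get("phase") == "Pending":
            name = item["metadata"]["name"]
            storage_class = item.get("spec", {}).get("storageClassName", "")
            pending.append((name, storage_class))
    return pending


# Hypothetical sample data (names invented for illustration).
sample = json.dumps({
    "items": [
        {"metadata": {"name": "app-data"},
         "spec": {"storageClassName": "storage-class"},
         "status": {"phase": "Bound"}},
        {"metadata": {"name": "worker-restore"},
         "spec": {"storageClassName": "storage-class"},
         "status": {"phase": "Pending"}},
    ]
})

for name, storage_class in pending_pvcs(sample):
    print(f"{name} is Pending (StorageClass: {storage_class})")
```

A claim that stays Pending for several minutes during a backup job is a candidate for the temporary PVC described in this article.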
/var/log/pods/vmware-system-csi_vsphere-csi-controller-########-########/csi-provisioner/0.log
YYYY-MM-DDTHH:MM:SS.672945273Z stderr F E1228 HH:MM:SS controller.go:957] error syncing claim "######## - #### - #### - ############ ": failed to provision volume with StorageClass "<Storage-class-name>": rpc error: code = Internal desc = failed to create volume on namespace: <namespace> in supervisor cluster. Error: persistentVolumeClaim ######## - #### - #### - ############ in namespace <namespace> not in phase Bound within 240 seconds. reason: failed to provision volume with StorageClass "Storage-class-name": rpc error: code = Internal desc = failed to create volume. Error: failed to get the compatible datastore for create volume from snapshot ######## - #### - #### - ############ with error: <nil>
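To find every occurrence of this error in a saved copy of the csi-provisioner log, a short script can help. The following is a minimal Python sketch, assuming the message wording matches the excerpt above. The function name bind_timeouts and the sample line are illustrative only.

```python
import re

# Matches the "not in phase Bound within N seconds" message shown in the log excerpt.
BIND_TIMEOUT = re.compile(
    r"persistentVolumeClaim (?P<claim>.+?) in namespace (?P<namespace>\S+) "
    r"not in phase Bound within (?P<seconds>\d+) seconds"
)


def bind_timeouts(log_text):
    """Return (claim, namespace, seconds) for each bind timeout found in log_text."""
    return [
        (m.group("claim"), m.group("namespace"), int(m.group("seconds")))
        for m in BIND_TIMEOUT.finditer(log_text)
    ]


# Hypothetical log line modeled on the excerpt (claim and namespace names invented).
sample_log = (
    'E1228 controller.go:957] error syncing claim "abc-123": failed to provision '
    "volume. Error: persistentVolumeClaim abc-123 in namespace demo not in phase "
    "Bound within 240 seconds. reason: failed to provision volume"
)

for claim, namespace, seconds in bind_timeouts(sample_log):
    print(f"{namespace}/{claim} did not bind within {seconds}s")
```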
Commvault's backup log:
1627535 18d7f4 MM/YY HH:MM:SS 74286 CK8sInfo::OpenVmdk() - Failed to create worker [<worker-name>] for app [### PersistentVolumeClaim ######## - #### - #### - ############ ].
1627535 18d7f4 MM/YY HH:MM:SS 74286 CK8sInfo::SetLastVMErrorFromQiError() - Setting Last VM Error: [329] Error: [0xEDDD0149:{K8sApp::CreateTARWorker(3514)/Int.329.0x149-Error creating worker pod. [Success] in namespace [netbox] details:[Events: 1) Pod:<pod-name>: FailedScheduling: running PreBind plugin "VolumeBinding": binding volumes: pod does not exist any more: pod "<pod-name>" not found. ]}]
1627535 18d7f4 MM/YY HH:MM:SS 74286 VSBkpWorker::BackupVMFileCollection() - Failed to open file collection object.
The backup fails because snapshot-based temporary PVCs that use a WaitForFirstConsumer (late-binding) StorageClass do not bind within the expected time window (for example, 240 seconds). This results in volume binding or pod scheduling errors.
To check the volume binding mode of the storage classes in the Guest cluster, run the following command:
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE
test-sc csi.vsphere.vmware.com Retain Immediate
vsan-default-storage-policy-latebinding csi.vsphere.vmware.com Delete WaitForFirstConsumer
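The same check can be scripted when many storage classes exist. The following is a minimal Python sketch, assuming the plain-text output of kubectl get sc has been captured as a string. The function name late_binding_classes is illustrative only; it locates the binding mode by value rather than by column position, so extra columns do not affect it.

```python
BINDING_MODES = {"Immediate", "WaitForFirstConsumer"}


def late_binding_classes(sc_output):
    """Return the names of storage classes that use WaitForFirstConsumer.

    sc_output is the text printed by: kubectl get sc
    """
    names = []
    for line in sc_output.strip().splitlines()[1:]:  # skip the header row
        tokens = line.split()
        if not tokens:
            continue
        # Find the binding mode token wherever it sits in the row.
        mode = next((t for t in tokens if t in BINDING_MODES), None)
        if mode == "WaitForFirstConsumer":
            names.append(tokens[0])
    return names


# Output copied from the example in this article.
sample_output = """\
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE
test-sc csi.vsphere.vmware.com Retain Immediate
vsan-default-storage-policy-latebinding csi.vsphere.vmware.com Delete WaitForFirstConsumer
"""

for name in late_binding_classes(sample_output):
    print(f"{name} uses late binding (WaitForFirstConsumer)")
```

Any storage class reported here and used by the protected PVCs is subject to the late-binding behavior described above.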
Contact the backup vendor (Commvault) for further assistance in resolving this issue.