Failed to expand volume because the disk that backs it has snapshots

Article ID: 313099

Updated On:

Products

VMware vSphere ESXi
VMware vSphere with Tanzu

Issue/Introduction

This article explains how to manually delete orphaned snapshots that are attached to a First Class Disk (FCD).

 

Symptoms:

- Volume expansion on a guest cluster fails.
7s (x601 over 41h)  Warning  VolumeResizeFailed  PersistentVolumeClaim/data-postgresql-0  resize volume "pvc-3b0cf013-2ec9-4177-924c-f71ce25f91a6" by resizer "csi.vsphere.vmware.com" failed: rpc error: code = Internal desc = failed to expand volume b83ce058-e082-4cdb-bfc3-6219bda0fdc4-3b0cf013-2ec9-4177-924c-f71ce25f91a6 in namespace prodns of supervisor cluster. Error: supervisor persistentVolumeClaim b83ce058-e082-4cdb-bfc3-6219bda0fdc4-3b0cf013-2ec9-4177-924c-f71ce25f91a6 in namespace prodns not in "FileSystemResizePending" condition within 240 seconds

 

- Logs from csi-attacher using "kubectl logs csi-controller-<pod> -n vmware-system-csi -c csi-attacher" : 

{"level":"error","time":"yyyy-mm-ddThh:mm:ssZ","caller":"wcpguest/controller.go:1237","msg":"failed to update supervisor PVC \"<pvc-id>\" in \"<ns>\" namespace. Error: admission webhook \"validation.csi.vsphere.vmware.com\" denied the request: Expanding volume with snapshots is not allowed","TraceId":"<trace_id>","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/wcpguest.

 

- The disk that backs a volume has snapshots that have been taken by a snapshot-based backup solution.
$ govc disk.snapshot.ls -k -dc="Datacenter-1" -ds="Tanzu-vmfssan" -l "008d3b91-1756-49cd-9db3-fbf4699887fd"
9fc48a39-3be2-4183-b645-ebbf4efb2467 kanister.fcd.description:6c254147-66e6-11ee-af2f-e6b8f1567dd3 MM DD HH:MM:SS
4b81583d-ccea-44e2-a577-4305e82ad02a kanister.fcd.description:e7634189-5fd3-11ee-af2f-e6b8f1567dd3 MM DD HH:MM:SS

 

Environment

VMware vSphere 7.0 with Tanzu

Cause

If a virtual disk that backs a volume has snapshots, it cannot be resized.

Resolution

Get the Volume ID of the problematic volume, as described in the first point of the Resolution section of KB 305322. The later steps refer to this as step 1.
Note: The steps below are also valid for TKGi/TKGm. To get the Volume ID, output the volume in JSON format and read the volumeHandle field.
$ kubectl get pv pvc-f865d13c-c0c4-46fc-828b-aeebc4a649fe -o json | jq .spec.csi.volumeHandle
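
A quick check on the shape of the returned value can save a failed govc call later. The sketch below is only an illustration: it assumes that an FCD ID is a single UUID (as in the govc examples in this article) and that a value made of two UUIDs joined by a hyphen is a supervisor PVC name like the one in the event message under Symptoms, which still has to be mapped to the FCD ID. The function name `classify_volume_handle` is hypothetical.

```shell
# Hypothetical helper: classify a volumeHandle string by its shape.
# Assumption: an FCD ID is a plain UUID (8-4-4-4-12 hex digits).
classify_volume_handle() (
  handle=$1
  # jq prints JSON strings with surrounding quotes unless -r is used; strip them.
  handle=${handle#\"}
  handle=${handle%\"}
  uuid='[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'
  if printf '%s\n' "$handle" | grep -Eq "^${uuid}\$"; then
    echo "fcd-id"
  elif printf '%s\n' "$handle" | grep -Eq "^${uuid}-${uuid}\$"; then
    # Two UUIDs joined by a hyphen, as in the supervisor PVC name under Symptoms.
    echo "supervisor-pvc-name"
  else
    echo "unknown"
  fi
)
```

Only a value reported as `fcd-id` has the shape that the govc commands in this article expect as `<volume-id>`.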

For Linux:

    • Download & extract the govc binary on a Linux server which can access the VC.

$ wget https://github.com/vmware/govmomi/releases/download/v0.32.0/govc_Linux_x86_64.tar.gz
$ tar -zxf govc_Linux_x86_64.tar.gz

Note: you can get the latest govc release from the following page:
https://github.com/vmware/govmomi/releases

    • Move the govc binary to a directory on the PATH.

$ sudo mv govc /usr/local/bin/

    • Validate the govc tool installation.

$ which govc
$ govc version

    • Define Env variables to connect to VC.

$ export GOVC_URL=<vCenter_FQDN>
$ export GOVC_USERNAME=<administrator_username>
$ export GOVC_PASSWORD=<administrator_password>
$ export GOVC_INSECURE=true
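
Before running any govc command, it can help to confirm that the variables above are actually set in the current shell. The following is a minimal sketch with a hypothetical function name, `check_govc_env`; it only checks the three connection variables used in this article and does not contact vCenter.

```shell
# Hypothetical pre-flight check: report which GOVC_* variables are unset or empty.
# Returns 0 when all three are present, 1 otherwise.
check_govc_env() {
  govc_missing=""
  for govc_name in GOVC_URL GOVC_USERNAME GOVC_PASSWORD; do
    # Indirect variable lookup that works in any POSIX shell.
    eval "govc_value=\${$govc_name:-}"
    if [ -z "$govc_value" ]; then
      govc_missing="$govc_missing $govc_name"
    fi
  done
  if [ -n "$govc_missing" ]; then
    echo "Missing:$govc_missing" >&2
    return 1
  fi
  return 0
}
```

For example, `check_govc_env && govc version` runs the version check only when all three variables are present.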

    • List all snapshots of the volume whose ID you obtained in step 1.

$ govc disk.snapshot.ls -k -dc=<datacenter-name> -ds=<datastore-name> -l <volume-id>
Example:
$ govc disk.snapshot.ls -k -dc="Datacenter-1" -ds="Tanzu-vmfssan" -l "008d3b91-1756-49cd-9db3-fbf4699887fd"
9fc48a39-3be2-4183-b645-ebbf4efb2467 kanister.fcd.description:6c254147-66e6-11ee-af2f-e6b8f1567dd3 MM DD HH:MM:SS
4b81583d-ccea-44e2-a577-4305e82ad02a kanister.fcd.description:e7634189-5fd3-11ee-af2f-e6b8f1567dd3 MM DD HH:MM:SS
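
When a disk carries snapshots from more than one source, the listing can be narrowed to the entries created by a particular backup tool. The sketch below assumes the whitespace-separated column layout shown in the example output (snapshot ID first, description second); the function name `snapshot_ids_by_prefix` is hypothetical.

```shell
# Hypothetical helper: read `govc disk.snapshot.ls -l` output on stdin and
# print the snapshot ID (first column) of every entry whose description
# (second column) starts with the given prefix.
# Assumption: columns are whitespace separated, as in the example output.
snapshot_ids_by_prefix() {
  awk -v prefix="$1" 'NF >= 2 && index($2, prefix) == 1 { print $1 }'
}
```

Piping the listing command above into `snapshot_ids_by_prefix "kanister.fcd.description:"` would print only the IDs of the snapshots whose description starts with that string.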

    • Delete the snapshots. They can be deleted one after another, in any order.

$ govc disk.snapshot.rm -dc=<datacenter-name> -ds=<datastore-name> <volume-id> <snapshot-id>
Example:
$ govc disk.snapshot.rm -dc="Datacenter-1" -ds="Tanzu-vmfssan" "008d3b91-1756-49cd-9db3-fbf4699887fd" "9fc48a39-3be2-4183-b645-ebbf4efb2467"
[DD-MM-YY HH:MM:SS] Deleting 9fc48a39-3be2-4183-b645-ebbf4efb2467...OK 

 

For Windows:

    • Download the govc release archive for Windows from https://github.com/vmware/govmomi/releases and extract govc.exe.
    • Launch PowerShell and navigate to the extracted folder.
    • List all snapshots of the volume whose ID you obtained in step 1.

$ .\govc.exe disk.snapshot.ls -k -u user:password@host -dc=<datacenter-name> -ds=<datastore-name> -l <volume-id>

-k : Skip verification of the server certificate

-u : vCenter or ESXi URL, including the username and password used for the connection

-dc : Datacenter name from the vCenter inventory

-ds : Name of the datastore in use

-l : Long listing format

    • Example:

$ .\govc.exe disk.snapshot.ls -k -u <username>:*********@vCenter-FQDN -dc='example-datacenter' -ds='example-datastore-name-1' -l 'volume-id-from-step-1'

3c5394ca-4592-4384-ae7f-a162fce93fb8  kanister.fcd.description:83cc501d-8692-11ef-bbe9-8657f69a7312  MM DD HH:MM:SS
b32a6db4-5d51-4cc4-bf91-1fa2d4981310  kanister.fcd.description:2b1dd0ce-0033-11ef-aa09-0a5c44916c4c  MM DD HH:MM:SS
    • Delete the snapshots. They can be deleted one after another, in any order.

$ .\govc.exe disk.snapshot.rm -k -u <username>:*********@vCenter-FQDN -dc='example-datacenter' -ds='example-datastore-name-1' 'volume-id-from-step-1' 3c5394ca-4592-4384-ae7f-a162fce93fb8

[DD-MM-YY HH:MM:SS] Deleting 3c5394ca-4592-4384-ae7f-a162fce93fb8...OK

Additional Information

- Deleting multiple snapshots can be time consuming depending on the number of snapshots, especially when you have multiple disks with snapshots. One way to do a bulk snapshot delete is to use this one-liner:

$ govc disk.snapshot.ls -dc="<datacenter-name>" -ds="<datastore-name>" "<cns-volume-uuid>" | awk '{print $1}' | while read snapShot ; do govc disk.snapshot.rm -dc "<datacenter-name>" -ds "<datastore-name>" "<cns-volume-uuid>" $snapShot ; done

For example:
$ govc disk.snapshot.ls -dc="Datacenter-1" -ds="Tanzu_vmfssan" "2653de7a-bf94-49d8-9d88-0351f3ccb56b" | awk '{print $1}' | while read snapShot ; do govc disk.snapshot.rm -dc "Datacenter-1" -ds "Tanzu_vmfssan" "2653de7a-bf94-49d8-9d88-0351f3ccb56b" $snapShot ; done
[18-10-23 09:28:12] Deleting 9fc48a39-3be2-4183-b645-ebbf4efb2467...OK
[18-10-23 09:45:33] Deleting 8f8df1ab-f89e-4407-8b16-0025c18b7cb8...OK

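
Because the deletion cannot be undone, a dry run of the bulk delete is worth considering first. The following is a sketch of the same loop as the one-liner above, wrapped in a hypothetical function `bulk_snapshot_rm` with a hypothetical `DRY_RUN` switch that defaults to printing the commands instead of running them. It assumes the GOVC_* connection variables are already exported.

```shell
# Hypothetical wrapper around the bulk-delete loop.
# Reads snapshot IDs on stdin (one per line) and removes each one from the disk.
# DRY_RUN defaults to 1: commands are printed, not executed. Set DRY_RUN=0 to delete.
bulk_snapshot_rm() {
  rm_dc=$1
  rm_ds=$2
  rm_volume_id=$3
  while IFS= read -r rm_snapshot_id; do
    [ -n "$rm_snapshot_id" ] || continue   # skip blank lines
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "govc disk.snapshot.rm -dc $rm_dc -ds $rm_ds $rm_volume_id $rm_snapshot_id"
    else
      # Redirect stdin so govc cannot consume the remaining IDs of the loop.
      govc disk.snapshot.rm -dc "$rm_dc" -ds "$rm_ds" "$rm_volume_id" "$rm_snapshot_id" </dev/null
    fi
  done
}
```

A possible usage is to pipe the `awk '{print $1}'` output of the listing into `bulk_snapshot_rm "<datacenter-name>" "<datastore-name>" "<cns-volume-uuid>"`, review the printed commands, and then repeat with `DRY_RUN=0`.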

Impact/Risks:

Deleting the snapshots manually from the vSphere side while they have not been deleted on the backup solution side may cause discrepancies in the backup solution's database. Before deleting, confirm that the snapshots are orphaned and have already been deleted on the backup solution side. If help is needed to confirm this, engage the backup solution vendor.