Failed to expand volume because the disk that backs it has snapshots

Article ID: 313099

Products

VMware vSphere ESXi
VMware vSphere with Tanzu

Issue/Introduction

This article describes how to manually delete orphaned snapshots attached to a First Class Disk (FCD) so that the volume it backs can be expanded.

Symptoms:

- Volume expansion on a guest cluster fails.
7s (x601 over 41h)  Warning  VolumeResizeFailed  PersistentVolumeClaim/data-postgresql-0  resize volume "pvc-3b0cf013-2ec9-4177-924c-f71ce25f91a6" by resizer "csi.vsphere.vmware.com" failed: rpc error: code = Internal desc = failed to expand volume b83ce058-e082-4cdb-bfc3-6219bda0fdc4-3b0cf013-2ec9-4177-924c-f71ce25f91a6 in namespace prodns of supervisor cluster. Error: supervisor persistentVolumeClaim b83ce058-e082-4cdb-bfc3-6219bda0fdc4-3b0cf013-2ec9-4177-924c-f71ce25f91a6 in namespace prodns not in "FileSystemResizePending" condition within 240 seconds

- The disk that backs a volume has snapshots that have been taken by a snapshot-based backup solution.
$ govc disk.snapshot.ls -k -dc="Datacenter-1" -ds="Tanzu-vmfssan" -l "008d3b91-1756-49cd-9db3-fbf4699887fd"
9fc48a39-3be2-4183-b645-ebbf4efb2467 kanister.fcd.description:6c254147-66e6-11ee-af2f-e6b8f1567dd3 Oct 9 20:59:26
4b81583d-ccea-44e2-a577-4305e82ad02a kanister.fcd.description:e7634189-5fd3-11ee-af2f-e6b8f1567dd3 Sep 30 20:58:39
06513535-0aa3-433f-a595-4af2ebe85104 kanister.fcd.description:94ba43e2-5e41-11ee-af2f-e6b8f1567dd3 Sep 28 21:00:35
ffba3e15-74c0-4334-b32c-b9f9e6c82d04 kanister.fcd.description:c193d08e-5a53-11ee-af2f-e6b8f1567dd3 Sep 23 21:00:41
dcd33aec-183b-4c9d-8f25-e66511f7e442 kanister.fcd.description:96ea7578-598a-11ee-af2f-e6b8f1567dd3 Sep 22 20:59:24


Environment

VMware vSphere 7.0 with Tanzu

Cause

If a virtual disk that backs a volume has snapshots, it cannot be resized.

Resolution

1- Get the Volume ID of the problematic volume by following the first step in the Resolution section of KB 305322.
Note: The steps below also apply to TKGi/TKGm. To get the Volume ID, retrieve the persistent volume in JSON format and read its volumeHandle field.
$ kubectl get pv pvc-f865d13c-c0c4-46fc-828b-aeebc4a649fe -o json | jq .spec.csi.volumeHandle
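
The command above prints the handle as a quoted JSON string. As a minimal sketch, assuming the same field path, kubectl's built-in jsonpath output prints the bare value without quotes and does not require jq. The function name `volume_handle` is hypothetical and is not part of the original procedure.

```shell
# Hypothetical helper: print the CSI volume handle of a PersistentVolume,
# unquoted, using kubectl's jsonpath output instead of jq.
volume_handle() {
  kubectl get pv "$1" -o jsonpath='{.spec.csi.volumeHandle}'
}
```

For example, `VOLUME_ID="$(volume_handle pvc-f865d13c-c0c4-46fc-828b-aeebc4a649fe)"` stores the value in a shell variable for use in the later steps.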

2- Download and extract the govc binary on a Linux server that can reach vCenter Server (VC).
$ wget https://github.com/vmware/govmomi/releases/download/v0.32.0/govc_Linux_x86_64.tar.gz
$ tar -zxf govc_Linux_x86_64.tar.gz

Note: The latest govc release is available on the following page:
https://github.com/vmware/govmomi/releases

3- Move the govc binary to a directory in the PATH.
$ sudo mv govc /usr/local/bin/

4- Validate the govc tool installation.
$ which govc
$ govc version

5- Define the environment variables used to connect to vCenter Server.
$ export GOVC_URL=<vCenter_FQDN>
$ export GOVC_USERNAME=<vCenter_administrator_username>
$ export GOVC_PASSWORD=<administrator_password>
$ export GOVC_INSECURE=true
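
Before running any govc command, it can help to confirm that the connection variables are actually set in the current shell. The following is a minimal sketch, not part of the original procedure. The function name `check_govc_env` is hypothetical, and it checks only the three variables defined in step 5.

```shell
# Hypothetical helper: report each govc connection variable from step 5
# that is unset or empty. Returns non-zero if any of them is missing.
# The body runs in a subshell so the helper variable does not leak.
check_govc_env() (
  missing=0
  [ -n "${GOVC_URL:-}" ]      || { echo "missing: GOVC_URL" >&2; missing=1; }
  [ -n "${GOVC_USERNAME:-}" ] || { echo "missing: GOVC_USERNAME" >&2; missing=1; }
  [ -n "${GOVC_PASSWORD:-}" ] || { echo "missing: GOVC_PASSWORD" >&2; missing=1; }
  [ "$missing" -eq 0 ]
)
```

Once the check passes, running `govc about` should print the vCenter Server version details, which confirms that the connection and credentials work.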

6- List all snapshots of the disk, using the Volume ID obtained in step 1.
$ govc disk.snapshot.ls -k -dc=<datacenter-name> -ds=<datastore-name> -l <volume-id>
Example:
$ govc disk.snapshot.ls -k -dc="Datacenter-1" -ds="Tanzu-vmfssan" -l "008d3b91-1756-49cd-9db3-fbf4699887fd"
9fc48a39-3be2-4183-b645-ebbf4efb2467 kanister.fcd.description:6c254147-66e6-11ee-af2f-e6b8f1567dd3 Oct 9 20:59:26
4b81583d-ccea-44e2-a577-4305e82ad02a kanister.fcd.description:e7634189-5fd3-11ee-af2f-e6b8f1567dd3 Sep 30 20:58:39
06513535-0aa3-433f-a595-4af2ebe85104 kanister.fcd.description:94ba43e2-5e41-11ee-af2f-e6b8f1567dd3 Sep 28 21:00:35
ffba3e15-74c0-4334-b32c-b9f9e6c82d04 kanister.fcd.description:c193d08e-5a53-11ee-af2f-e6b8f1567dd3 Sep 23 21:00:41
dcd33aec-183b-4c9d-8f25-e66511f7e442 kanister.fcd.description:96ea7578-598a-11ee-af2f-e6b8f1567dd3 Sep 22 20:59:24
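
When a disk has many snapshots, it can be useful to extract only the snapshot IDs from the listing, for example to count them or to review them before deletion. The following is a minimal sketch, assuming the listing format shown above, where the snapshot ID is the first column. The function name `snapshot_ids` is hypothetical.

```shell
# Hypothetical helper: read `govc disk.snapshot.ls` output on stdin and print
# only first-column values that are well-formed UUIDs, one per line.
# grep exits non-zero when no line matches, so an empty listing returns 1.
snapshot_ids() {
  awk '{ print $1 }' |
    grep -Ex '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}
```

For example, `govc disk.snapshot.ls -dc=<datacenter-name> -ds=<datastore-name> <volume-id> | snapshot_ids | wc -l` prints the number of snapshots on the disk.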

7- Delete the snapshots. They can be deleted one after another in any order.
$ govc disk.snapshot.rm -dc <datacenter-name> -ds <datastore-name> <volume-id> <snapshot-id>
Example:
$ govc disk.snapshot.rm -dc="Datacenter-1" -ds="Tanzu-vmfssan" "008d3b91-1756-49cd-9db3-fbf4699887fd" "9fc48a39-3be2-4183-b645-ebbf4efb2467"
[18-10-23 09:28:12] Deleting 9fc48a39-3be2-4183-b645-ebbf4efb2467...OK

Additional Information

- Deleting snapshots one at a time can be time-consuming when there are many of them, especially when multiple disks have snapshots. One way to delete all snapshots of a disk in bulk is the following one-liner:

$ govc disk.snapshot.ls -dc="<datacenter-name>" -ds="<datastore-name>" "<cns-volume-uuid>" | awk '{print $1}' | while read -r snapShot ; do govc disk.snapshot.rm -dc "<datacenter-name>" -ds "<datastore-name>" "<cns-volume-uuid>" "$snapShot" ; done

For example:
$ govc disk.snapshot.ls -dc="Datacenter-1" -ds="Tanzu_vmfssan" "2653de7a-bf94-49d8-9d88-0351f3ccb56b" | awk '{print $1}' | while read -r snapShot ; do govc disk.snapshot.rm -dc "Datacenter-1" -ds "Tanzu_vmfssan" "2653de7a-bf94-49d8-9d88-0351f3ccb56b" "$snapShot" ; done
[18-10-23 09:28:12] Deleting 9fc48a39-3be2-4183-b645-ebbf4efb2467...OK
[18-10-23 09:45:33] Deleting 8f8df1ab-f89e-4407-8b16-0025c18b7cb8...OK
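
Because the one-liner deletes every snapshot it finds, a more cautious variant may be preferable. The following is a minimal sketch under stated assumptions, not part of the original procedure. The function name `delete_disk_snapshots`, the `DRY_RUN` variable, and the optional description-prefix filter are all hypothetical. It assumes the listing prints the snapshot ID followed by its description, as in the examples above.

```shell
# Hypothetical helper: delete the snapshots of one disk, optionally only those
# whose description starts with a given prefix (for example "kanister.").
# Usage: delete_disk_snapshots <datacenter> <datastore> <volume-id> [description-prefix]
# Set DRY_RUN=1 to print what would be deleted instead of deleting it.
# The body runs in a subshell so the helper variables do not leak.
delete_disk_snapshots() (
  dc=$1 ds=$2 vol=$3 prefix=${4:-}
  govc disk.snapshot.ls -dc="$dc" -ds="$ds" "$vol" |
    while read -r snap_id description; do
      [ -n "$snap_id" ] || continue          # skip blank lines
      case $description in
        "$prefix"*) ;;                       # matches the prefix, or no prefix was given
        *) continue ;;                       # description does not match: keep the snapshot
      esac
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "would run: govc disk.snapshot.rm -dc=$dc -ds=$ds $vol $snap_id"
      else
        # </dev/null keeps the command from consuming the listing the loop is reading
        govc disk.snapshot.rm -dc="$dc" -ds="$ds" "$vol" "$snap_id" </dev/null
      fi
    done
)
```

Running it first with `DRY_RUN=1` gives a list that can be checked against the backup solution before anything is removed.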


Impact/Risks:

Deleting snapshots manually from the vSphere side before they have been deleted on the backup solution side may cause discrepancies in the backup solution's database. Before removing any snapshots, confirm that they are orphaned and have already been deleted on the backup solution side. If you need help confirming this, engage the backup solution vendor.