PVC Resize Fails with error message “volume with existing snapshots can’t be expanded” in the vSphere CSI Plugin.

Article ID: 405301

Products

VMware vCenter Server, Tanzu Kubernetes Runtime, VMware Tanzu Kubernetes Grid, VMware vSphere Kubernetes Service

Issue/Introduction

  • Expanding a Kubernetes PVC on TKGm/VKS with the vSphere CSI driver fails, producing events and CSI log messages similar to:
Warning ExternalExpanding persistentvolumeclaim/peer##-###-##-## Ignoring the PVC: didn’t find a plugin capable of expanding the volume; waiting for an external controller to process this PVC.
Warning VolumeResizeFailed persistentvolumeclaim/… resize volume "<volume-id>" by resizer "csi.vsphere.vmware.com" failed: rpc error: code = FailedPrecondition desc = volume: <volume-id> with existing snapshots […] can’t be expanded. Please delete snapshots before expanding the volume
  • The snapshot is visible in vCenter Server > Datastore > VM > Snapshots and with the govc command govc disk.ls -L=true "<volume-id>", but it is not listed by kubectl get volumesnapshot or anywhere else on the Kubernetes side.
  • The PVC expansion remains stuck in the Resizing state indefinitely.
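
The <volume-id> in the messages and the govc command above corresponds to the CSI volume handle of the PV bound to the PVC. A minimal sketch for looking it up, assuming bash and a kubeconfig for the affected cluster; the function name pvc_volume_id and the namespace/PVC names in the usage line are hypothetical:

```shell
#!/usr/bin/env bash
# Resolve the volume ID (the PV's CSI volumeHandle) behind a PVC.
# Usage: pvc_volume_id NAMESPACE PVC_NAME
pvc_volume_id() {
  local namespace="$1" pvc="$2" pv
  # A bound PVC records the name of its PV in spec.volumeName.
  pv="$(kubectl get pvc "$pvc" -n "$namespace" -o jsonpath='{.spec.volumeName}')" || return 1
  if [ -z "$pv" ]; then
    echo "PVC $namespace/$pvc is not bound to a PV" >&2
    return 1
  fi
  # For CSI-provisioned PVs, the driver's volume ID is stored in spec.csi.volumeHandle.
  kubectl get pv "$pv" -o jsonpath='{.spec.csi.volumeHandle}'
}
```

For example, pvc_volume_id my-namespace my-pvc prints the ID to use wherever this article shows <volume-id>.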

Environment

Tanzu Kubernetes Multicloud

vSphere with Tanzu 

vSphere CSI Plugin

OpenShift

Rancher

Cause

When a vSphere PV has one or more snapshots at the vCenter or datastore level, the vSphere CSI controller refuses to expand the volume. These snapshots may have been created earlier by Velero or another backup mechanism. The CSI driver’s precondition check blocks expansion until all underlying snapshots are removed.
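
To tell this precondition failure apart from other resize errors, the event or log text can be searched for the message quoted in the Issue section. A minimal sketch, assuming bash; the function name blocked_by_snapshots and the names in the usage comment are hypothetical:

```shell
#!/usr/bin/env bash
# Read event or log text on stdin; exit 0 when it contains the snapshot
# precondition failure quoted in this article, non-zero otherwise.
blocked_by_snapshots() {
  grep -q 'FailedPrecondition.*with existing snapshots'
}

# Example with hypothetical names (not run automatically):
#   kubectl describe pvc my-pvc -n my-namespace | blocked_by_snapshots \
#     && echo "expansion is blocked by existing snapshots"
```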

Resolution

Note: Validate that a healthy backup of the PVC exists before performing the steps below, because incorrect use of this procedure may lead to data loss or corruption. If you are unsure about any step, open a case with VMware/Broadcom Support.

To resolve the issue, delete the PV snapshot through CnsVolumeManager in the vSAN Managed Object Browser (MOB).

Step 1: Identify the Affected Volume and Snapshot IDs

  1. SSH into your VCSA as root.
  2. Connect to the embedded PostgreSQL vCenter database:
    /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres
  3. Run a query to list snapshot entries for your volume:
    SELECT snapshot_id, volume_id, description FROM cns.vpx_storage_snapshot_info WHERE volume_id = '<volume-id>';
  4. Note the snapshot_id values returned.
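
The query can also be run non-interactively so that the snapshot IDs are printed one per line. A minimal sketch, assuming bash on the VCSA; the function name snapshot_query is hypothetical, and the character check on the volume ID is an assumption added only to keep quotes out of the SQL text:

```shell
#!/usr/bin/env bash
# psql path taken from step 2 above; override PSQL if it differs.
PSQL="${PSQL:-/opt/vmware/vpostgres/current/bin/psql}"

# Print the Step 1 query for one volume ID.
# Assumption: volume IDs contain only letters, digits, ':' and '-'.
snapshot_query() {
  local volume_id="$1"
  case "$volume_id" in
    ''|*[!A-Za-z0-9:-]*)
      echo "unexpected volume ID: $volume_id" >&2
      return 1
      ;;
  esac
  printf "SELECT snapshot_id FROM cns.vpx_storage_snapshot_info WHERE volume_id = '%s';\n" "$volume_id"
}

# Example (not run automatically); -A and -t print bare values without headers:
#   snapshot_query "$VOLUME_ID" | "$PSQL" -d VCDB -U postgres -At
```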

 

Step 2: Start the vSAN MOB

  1. In the VCSA SSH session, launch the Ruby vSphere Console (RVC):
    rvc
  2. When prompted, enter the vCenter SSO user and host:
    Host to connect to (user@host): <sso-admin-user>@localhost
  3. At the rvc > prompt, start the vSAN MOB:
    vsan.debug.mob --start localhost
  4. You will see:
    vSAN managed object browser is started; please access: https://<vcenter-fqdn>/vsan/mob

 

Step 3: Delete the Snapshot via CnsVolumeManager

  1. In your browser, navigate to:
    https://<vcenter-fqdn>/vsan/mob/?moid=cns-volume-manager
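  2. Click CnsDeleteSnapshots, the method that takes the snapshotDeleteSpecs parameter. Do not use CnsDeleteVolume, which deletes the volume itself.
  3. In the pop-up dialog, enter the Volume ID and each Snapshot ID obtained in Step 1 in the following form: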
    <!-- snapshotDeleteSpecs array start -->
    <snapshotDeleteSpecs>
      <volumeId>
        <id><VOLUME_ID></id>
      </volumeId>
      <snapshotId>
        <id><SNAPSHOT_ID></id>
      </snapshotId>
    </snapshotDeleteSpecs>
    <!-- array end -->
  4. Click Invoke.
  5. Confirm in the vCenter UI (Datastore & Snapshots view) that the snapshot entry is removed.
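
When a volume has several snapshots, typing the XML by hand is error-prone. A minimal sketch that prints one snapshotDeleteSpecs element per snapshot ID, assuming bash and assuming the MOB dialog accepts repeated elements for the array parameter; the function name snapshot_delete_specs is hypothetical:

```shell
#!/usr/bin/env bash
# Print one <snapshotDeleteSpecs> element per snapshot ID of a single volume.
# Usage: snapshot_delete_specs VOLUME_ID SNAPSHOT_ID [SNAPSHOT_ID...]
snapshot_delete_specs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: snapshot_delete_specs VOLUME_ID SNAPSHOT_ID [SNAPSHOT_ID...]" >&2
    return 1
  fi
  local volume_id="$1" snapshot_id
  shift
  for snapshot_id in "$@"; do
    printf '<snapshotDeleteSpecs>\n'
    printf '  <volumeId>\n    <id>%s</id>\n  </volumeId>\n' "$volume_id"
    printf '  <snapshotId>\n    <id>%s</id>\n  </snapshotId>\n' "$snapshot_id"
    printf '</snapshotDeleteSpecs>\n'
  done
}
```

The output is meant to be pasted into the snapshotDeleteSpecs field of the dialog; review it against the IDs from Step 1 before clicking Invoke.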

 

Step 4: Retry the PVC Expansion from the Kubernetes Cluster
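
A PVC is expanded by raising spec.resources.requests.storage. A minimal sketch that builds the patch body, assuming bash; the function name pvc_resize_patch, the size 20Gi, and the PVC and namespace names in the usage comment are hypothetical:

```shell
#!/usr/bin/env bash
# Build the merge-patch body that requests a new PVC size (for example 20Gi).
pvc_resize_patch() {
  local size="$1"
  # Reject empty input and anything that could break out of the JSON string.
  case "$size" in
    ''|*[!0-9A-Za-z.]*)
      echo "unexpected size: $size" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}\n' "$size"
}

# Example with hypothetical names (not run automatically):
#   kubectl patch pvc my-pvc -n my-namespace --type merge -p "$(pvc_resize_patch 20Gi)"
#   kubectl get pvc my-pvc -n my-namespace -o jsonpath='{.status.capacity.storage}'
```

The second command in the comment reads back the capacity reported in the PVC status, which should match the requested size once the resize completes.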

Additional Information

Impact/Risks:
If the persistent volume of a Kubernetes node is deleted directly in the vSphere Web Client rather than through CnsVolumeManager, data integrity issues may result, because CNS will not be aware of the change applied to the First Class Disk (FCD) object.

For more information on collecting CSI logs, refer to the official Broadcom KB: https://knowledge.broadcom.com/external/article/379178/basic-troubleshooting-and-retrieving-log.html