Persistent VMDK files cause datastore exhaustion in Kubernetes environments using VMware CSI

Article ID: 434710

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

In environments that run third-party Kubernetes distributions (such as RKE2 or OpenShift) with the VMware Cloud Foundation (VCF) CSI driver, datastores can fill up with VMDK files that persist after their Kubernetes objects are gone. This is expected behavior when the VolumeSnapshotContent deletionPolicy or the PersistentVolume reclaim policy is set to Retain: the underlying storage artifacts remain on the datastore after the corresponding Kubernetes objects are deleted. Symptoms include high datastore usage and a mismatch between the number of files in the fcd folder and the number of active PVs in the cluster.

Environment

vSphere 8.x

RKE2, OpenShift, or other Kubernetes distributions with VMware CSI driver.

Cause

The VolumeSnapshotContent deletionPolicy or the PersistentVolume persistentVolumeReclaimPolicy is set to Retain (often by third-party backup software), which instructs the vSphere CSI driver to preserve the VMDK/FCD (First Class Disk) even after the Kubernetes object is removed.
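
To see which PersistentVolumes are affected by this policy, the output of kubectl get pv -o json can be filtered for the Retain reclaim policy. The following is a minimal Python sketch, not part of the official procedure. The function names retained_pvs and load_pv_list are hypothetical, and the sketch assumes the PVs were provisioned by the vSphere CSI driver under the driver name csi.vsphere.vmware.com.

```python
import json

# Driver name assumed for vSphere CSI volumes; adjust if your cluster differs.
VSPHERE_CSI_DRIVER = "csi.vsphere.vmware.com"


def retained_pvs(pv_list, driver=VSPHERE_CSI_DRIVER):
    """Return (name, volume_handle, phase) for PVs of `driver` with reclaim policy Retain."""
    found = []
    for item in pv_list.get("items", []):
        spec = item.get("spec", {})
        csi = spec.get("csi") or {}
        if csi.get("driver") != driver:
            continue  # not a volume from the driver we are interested in
        if spec.get("persistentVolumeReclaimPolicy") == "Retain":
            phase = item.get("status", {}).get("phase")
            found.append((item["metadata"]["name"], csi.get("volumeHandle"), phase))
    return found


def load_pv_list(path):
    """Load a file created with: kubectl get pv -o json > <path>"""
    with open(path) as fh:
        return json.load(fh)
```

A PV that appears in this list with phase Released has lost its claim but still holds its backing disk, so it is worth reviewing before its Kubernetes object is deleted.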

Resolution

Note: For an individual volume, you can check the volume handle and name in the vSphere Client. The steps below are intended for bulk operations.

  1. Identify Active Kubernetes Volumes: Export the list of current volume handles from the Kubernetes cluster to a text file:

    kubectl describe pv > pv-list.txt

  2. Query vCenter Database (VCDB) for Registered Volumes: Log in to the vCenter Server Appliance (VCSA) via SSH and query the Cloud Native Storage (CNS) table to list all registered volumes on the affected datastore:
    (Note: Replace <DATASTORE_ID> with the actual datastore URL, e.g., ds:///vmfs/volumes/uuid/)

    • Connect to the vCenter PostgreSQL database:
      /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres
    • Query the CNS volume table for the volumes on the datastore:
      SELECT volume_id, volume_name, disk_path FROM cns.volume_info WHERE datastore = '<DATASTORE_ID>';

       

  3. Cross-Reference Data:
    • In use: If a volume_id or volume_name (PV name) from the SQL output appears in pv-list.txt, the disk is in use and must be kept.
    • Unlinked: If a volume is present in the VCDB/datastore but not in pv-list.txt, it is no longer associated with a Kubernetes object and is a candidate for manual removal.

    Remove each unlinked volume using one of the following methods:

    • Using govc:

      1. Download the govc binary from the official GitHub releases page.

      2. Run govc disk.rm <disk_id>, where <disk_id> is the volume_id returned by the vCenter Postgres query.

    • Using MOB:

      1. Navigate to https://<vcenter-fqdn>/mob/?moid=vCenterVStorageObjectManager.

      2. Click DeleteVStorageObject_Task.

      3. Enter the volume_id and the datastore Managed Object Reference (MoRef).

      4. Click Invoke Method.

  4. Verify Deletion:

    • Once the task completes, confirm that the VMDK has been permanently removed from the fcd directory on the datastore.
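
For large environments, the cross-reference in step 3 can be scripted rather than done by eye. The sketch below is illustrative only and rests on several assumptions: pv-list.txt contains the kubectl describe pv output from step 1, the SQL result from step 2 was saved in psql's default aligned (pipe-separated) format, and all function names are hypothetical. It only prints the govc disk.rm commands for review and does not delete anything.

```python
import re

# Lines of interest in `kubectl describe pv` output.
HANDLE_RE = re.compile(r"^[ \t]*VolumeHandle:[ \t]*(\S+)", re.MULTILINE)
NAME_RE = re.compile(r"^Name:[ \t]*(\S+)", re.MULTILINE)


def active_volumes(describe_text):
    """Return (volume handles, PV names) found in `kubectl describe pv` output."""
    handles = set(HANDLE_RE.findall(describe_text))
    names = set(NAME_RE.findall(describe_text))
    return handles, names


def parse_psql_rows(psql_text):
    """Parse aligned psql output into (volume_id, volume_name, disk_path) tuples."""
    rows = []
    for line in psql_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # The ---+--- separator, the "(N rows)" footer and blank lines contain
        # no "|" and therefore do not split into three cells; the header row
        # is recognised by its first column name.
        if len(cells) != 3 or cells[0] in ("", "volume_id"):
            continue
        rows.append(tuple(cells))
    return rows


def unlinked_volumes(describe_text, psql_text):
    """Rows whose volume_id and volume_name are both absent from the PV list."""
    handles, names = active_volumes(describe_text)
    return [
        row for row in parse_psql_rows(psql_text)
        if row[0] not in handles and row[1] not in names
    ]


def govc_commands(describe_text, psql_text):
    """Build (but do not run) one `govc disk.rm` command per unlinked volume."""
    return [
        f"govc disk.rm {volume_id}"
        for volume_id, _name, _path in unlinked_volumes(describe_text, psql_text)
    ]
```

Passing the contents of both files to govc_commands yields a list of commands that can be reviewed before any of them is run. Because deletion is permanent, it is prudent to check each candidate against backup software records first.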

Additional Information

https://kubernetes.io/docs/concepts/storage/volume-snapshots/

"Deletion is triggered by deleting the VolumeSnapshot object, and the DeletionPolicy will be followed. If the DeletionPolicy is Delete, then the underlying storage snapshot will be deleted along with the VolumeSnapshotContent object. If the DeletionPolicy is Retain, then both the underlying snapshot and VolumeSnapshotContent remain."
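
The same policy check can be applied to snapshots. The following minimal Python sketch lists VolumeSnapshotContent objects whose deletionPolicy is Retain, assuming the input is the parsed JSON from kubectl get volumesnapshotcontent -o json. The function name retained_snapshot_contents is hypothetical.

```python
def retained_snapshot_contents(vsc_list):
    """Return (name, snapshot_handle, snapshot_ref_name) for contents with deletionPolicy Retain."""
    found = []
    for item in vsc_list.get("items", []):
        spec = item.get("spec", {})
        if spec.get("deletionPolicy") != "Retain":
            continue  # Delete policy: the storage snapshot is removed with the object
        ref = spec.get("volumeSnapshotRef") or {}
        # status may be missing if the snapshot was never successfully created
        handle = (item.get("status") or {}).get("snapshotHandle")
        found.append((item["metadata"]["name"], handle, ref.get("name")))
    return found
```

Each entry returned here points at a storage snapshot that will stay on the datastore after its VolumeSnapshot is deleted.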