Unable to create new Persistent Volume Claims (PVCs) in a Vanilla CSI (Container Storage Interface) environment due to reaching the vSAN File Share limit.


Article ID: 418399


Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • The following error is observed when attempting to create a new PVC in a Vanilla CSI (Container Storage Interface) environment:

    "Failed to create vSAN file share : The total number of shares reached limit 100."

  • The vCenter CNS (Cloud Native Storage) database contains excess File Volumes that no longer correspond to active ReadWriteMany PVs/PVCs in the connected Kubernetes cluster(s).

  • Stale CNS records for volumes and file shares exist in the CNS database. This can be verified by running the command below from the vCenter Server Appliance command line interface.

    /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select * from cns.volume_info;"
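The pvc- names in that query output can be pulled out for later comparison with a quick grep. A minimal sketch is shown below; the file name and the sample line written into it are placeholders standing in for a real capture of the psql output, not actual CNS data.

```shell
# Sketch: extract the pvc-... names from a saved copy of the query output.
# cns-volume-info.txt is assumed to hold the psql output above; the line
# written here is only a placeholder standing in for a real capture.
printf 'file:11112222-3333 | pvc-aaaaaaaa-1111\n' > cns-volume-info.txt
grep -oE 'pvc-[0-9a-fA-F-]+' cns-volume-info.txt
```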

Environment

  • VMware Cloud Native Storage

Cause

  • The issue is caused by stale records in the CNS database. These records describe vSAN File Volumes that were previously provisioned for ReadWriteMany PVCs in Kubernetes but were not reconciled and cleaned up in the CNS database after the corresponding PVs/PVCs were deleted from the Kubernetes cluster. Because of this mismatch, the CNS database reports more active volumes than Kubernetes is actually using, so the vSAN file share limit is reached prematurely.

Note:

For a vSAN OSA environment, the limit will always be 100.

For a vSAN ESA environment, the limit was increased from 100 to 250 in vSAN 8.0 U3, and then to 500 in VCF 9.0.

Resolution

The corresponding stale CNS resources must be manually identified and corrected in the CNS database to reflect the current state of the environment.

  1. Identify Stale Volumes in CNS

Compare the list of active PVs in your Kubernetes cluster(s) with the list of File Volumes in the vCenter CNS database.

    • Example: Active PVs in Kubernetes (Count = 3)

Command: kubectl get pv -A

      • pvc-aaaaaaaa-####-####-####-############
      • pvc-bbbbbbbb-####-####-####-############
      • pvc-cccccccc-####-####-####-############

    • Example: CNS Database File PVs in vCenter Server database (Count = 6)

Command: /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select * from cns.volume_info;"

      • file:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx pvc-aaaaaaaa-####-####-####-############
      • file:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx pvc-bbbbbbbb-####-####-####-############
      • file:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx pvc-cccccccc-####-####-####-############
      • file:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx pvc-dddddddd-####-####-####-############ (Excess)
      • file:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx pvc-eeeeeeee-####-####-####-############ (Excess)
      • file:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx pvc-ffffffff-####-####-####-############ (Excess)

The excess (stale) volumes, marked (Excess) above, exist in the CNS database but have no corresponding PV in the Kubernetes cluster. These are the targets for deletion.
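The comparison between the two lists can be automated. The sketch below assumes the PV names from kubectl and the pvc- names from the CNS query have each been saved to a file; the names written into the files here are placeholders, not real volume IDs.

```shell
# Sketch: diff the two lists to find CNS entries with no matching PV.
# k8s-pvs.txt and cns-pvs.txt stand in for real captures of the kubectl
# and psql output above; the names below are placeholders.
printf 'pvc-aaaa\npvc-bbbb\npvc-cccc\n'                     > k8s-pvs.txt
printf 'pvc-aaaa\npvc-bbbb\npvc-cccc\npvc-dddd\npvc-eeee\n' > cns-pvs.txt
sort -u k8s-pvs.txt > k8s-sorted.txt
sort -u cns-pvs.txt > cns-sorted.txt
# comm -13 prints lines found only in the second file: the stale volumes.
comm -13 k8s-sorted.txt cns-sorted.txt
```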

  2. Check for Associated Kubernetes Resources

For each identified stale PV, check if any associated VolumeAttachment or Pod still exists.

    • Command Example (using one of the stale PVC names):

kubectl get volumeattachment -o wide | grep <stale-pv-name>

# Example: kubectl get volumeattachment -o wide | grep pvc-dddddddd-####-####-####-############

    • Analyze the result:

      • If NO VolumeAttachment and NO Pod are found: Proceed directly to the deletion step (see the WARNING below).

      • If a VolumeAttachment and/or a Pod is found:

        1. Locate the Pod namespace and name: kubectl get pod -n <pod namespace> -o wide
        2. Inspect the Pod: kubectl describe pod -n <pod namespace> <pod name>
        3. Consult with the application owner. If there is no PV in the Kubernetes cluster for this resource, the Pod/VolumeAttachment is also likely stale. You may need to delete the Pod/VolumeAttachment before proceeding to delete the CNS volume, as a safety measure.
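The VolumeAttachment check above can be run over the whole list of stale PVs at once. This is a sketch only: va.txt stands in for a saved copy of `kubectl get volumeattachment -o wide` output, stale-pvs.txt for the stale names found in step 1, and the lines written into both files are placeholders.

```shell
# Sketch: flag stale PVs that still have a VolumeAttachment entry.
# va.txt stands in for `kubectl get volumeattachment -o wide` output and
# stale-pvs.txt for the stale names found earlier; both are placeholders.
printf 'csi-123  csi.vsphere.vmware.com  pvc-dddd  node-1  true\n' > va.txt
printf 'pvc-dddd\npvc-eeee\n' > stale-pvs.txt
while read -r pv; do
  if grep -q "$pv" va.txt; then
    echo "$pv: VolumeAttachment still present - investigate first"
  else
    echo "$pv: no VolumeAttachment - candidate for CNS cleanup"
  fi
done < stale-pvs.txt
```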
  3. Delete Stale Volumes via vSAN Managed Object Browser (MOB)

WARNING: Deleting a volume is an irreversible process. When setting the deleteDisk parameter to true, the underlying data is permanently destroyed. Ensure you have a current and valid backup of any data residing on the volume before proceeding with deletion.

Use the vSAN MOB to delete the stale volumes from the CNS database.

    1. Open your web browser and navigate to
      https://<vCenter-IP-address>/vsan/mob/?moid=cns-volume-manager.
      Replace <vCenter-IP-address> with your vCenter server's IP address or hostname. 

    2. Select cns-volume-manager.

    3. Click on the CnsDeleteVolume method.

    4. Enter Volume Details: A pop-up window will appear.

    5. Enter the Volume ID of the volume you want to delete. 

    6. Set the deleteDisk parameter to true. 

    7. Click Invoke Method to delete the volume.

Additional Information