Creating a new PVC reports ProvisioningFailed with the error "no shared datastores found for nodeVm"

Article ID: 421752

Updated On:

Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • A Vanilla Kubernetes environment running csi-provisioner:v3.1.0. In this setup, the vSAN datastore is mounted on every ESXi host in the cluster and is accessible by every Kubernetes node.
  • The worker VMs are distributed across multiple Availability Zones (AZs), where each AZ may be connected to different datastores. 
  • PVC Status: Remains in Pending state.
  • Events: Describing the PVC shows a ProvisioningFailed warning with an RPC internal error.

Name:          <example-label>-pd-<example-label>-0
Namespace:     <example-namespace>
StorageClass:  <example-storage-class>
Status:        Pending
Volume:
Labels:        app=<example-label>
               release=<example-label>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
               volume.kubernetes.io/selected-node: <example-worker-node>
               volume.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       <example-label>-0
Events:
  Type     Reason                Age                   From                                                                                                 Message
  ----     ------                ----                  ----                                                                                                 -------
  Normal   WaitForFirstConsumer  1m                   persistentvolume-controller                                                                          waiting for first consumer to be created before binding
  Warning  ProvisioningFailed    1m                   csi.vsphere.vmware.com_vsphere-csi-controller-86d4f68d95-w6l2m_####-5fe0-4696-####-a1ba37aff452  failed to provision volume with StorageClass "<example-storage-class>": rpc error: code = Internal desc = failed to get shared datastores in kubernetes cluster. Error: no shared datastores found for nodeVm: VirtualMachine:vm-### [VirtualCenterHost: <example-vCenter-name>, UUID: 420cb357-####-2d6f-####-2f0e108749d9, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-####, VirtualCenterHost: <example-vCenter-name>]]
  Normal   ExternalProvisioning  62s (x4 over 1m)    persistentvolume-controller                                                                          waiting for a volume to be created, either by external provisioner "csi.vsphere.vmware.com" or manually created by system administrator

  • csi-syncer pod logs:

{"level":"info","time":"[timestamp]","caller":"syncer/fullsync.go:41","msg":"FullSync: start"}
{"level":"warn","time":"[timestamp]","caller":"syncer/fullsync.go:433","msg":"could not find any volume which is present in both k8s and in CNS"}
{"level":"info","time":"[timestamp]","caller":"syncer/fullsync.go:276","msg":"FullSync: fullSyncDeleteVolumes could not find any volume which is not present in k8s and needs to be checked for volume deletion."}
{"level":"info","time":"[timestamp]","caller":"syncer/fullsync.go:160","msg":"FullSync: end"}

  • csi-controller pod logs:

[timestamp]       1 event.go:285] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"example-namespace", Name:"example-pvc-name", UID:"d8dd60ee-####-422f-####-dd4276f2912e", APIVersion:"v1", ResourceVersion:"884038317", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "example-namespace/example-pvc-name"
[timestamp]       1 connection.go:186] GRPC response: {}
[timestamp]       1 connection.go:187] GRPC error: rpc error: code = Internal desc = failed to get shared datastores in kubernetes cluster. Error: no shared datastores found for nodeVm: VirtualMachine:vm-#### [VirtualCenterHost: example-vCenter-FQDN, UUID: 42220481-####-5b62-####-17e69eb2a91d, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-####, VirtualCenterHost: example-vCenter-FQDN]]

Cause

In a multi-AZ environment where Pod and PVC placement is not restricted to a specific zone, the CSI driver must identify a datastore with shared visibility across all cluster nodes. In vSphere CSI 3.1.0 and later, topology-aware requests specifically require that a datastore be accessible to every host within a given topology segment (e.g., a tagged Cluster or Datacenter).

Provisioning fails when the vSphere CSI Plug-in lacks a Topology-Aware configuration. If the driver is not configured with the appropriate topology-categories, or if the underlying vSphere objects (Datacenters, Clusters, or Hosts) lack corresponding vSphere tags, the provisioner cannot determine which datastores are "shared" within the required boundary. Essentially, the driver cannot validate zonal accessibility because the mapping between Kubernetes nodes and the vSphere infrastructure topology has not been established.

Resolution

To resolve this issue, configure the vSphere CSI Plug-in for topology awareness as described below. These steps ensure that the CSI provisioner can identify datastores accessible to the nodes where workloads are scheduled.

The steps below provide a high-level overview. For detailed instructions, refer to the official documentation: Deploying vSphere Container Storage Plug-in with Topology.

  1. vCenter Infrastructure Tagging: 

    1. Confirm that tag categories exist in vCenter for k8s-region (associated with Datacenters) and k8s-zone (associated with Clusters/Hosts).
    2. Ensure the corresponding tags are assigned to the appropriate vSphere objects to define the storage boundaries.
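
    The categories and tags can be created in the vSphere Client or with a CLI tool such as govc. The sketch below is illustrative only: the category names match the topology-categories configured later, while the tag names (region-1, zone-a) and the inventory paths are placeholders for this article.

    # Illustrative govc sketch (placeholder tag names and inventory paths)
    govc tags.category.create k8s-region
    govc tags.category.create k8s-zone
    govc tags.create -c k8s-region region-1
    govc tags.create -c k8s-zone zone-a
    # Attach the region tag to the Datacenter and the zone tag to a Cluster
    govc tags.attach -c k8s-region region-1 /ExampleDatacenter
    govc tags.attach -c k8s-zone zone-a /ExampleDatacenter/host/ExampleCluster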

  2. vSphere CSI Secret:

    1. Verify that the vSphere configuration file (csi-vsphere.conf) includes the topology-categories parameter under the [Labels] section. Without this parameter, the CSI driver cannot map Kubernetes node labels to vSphere tags.

      [Labels]
      topology-categories = "k8s-region, k8s-zone"
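
    For context, a minimal csi-vsphere.conf sketch is shown below; the vCenter address, credentials, datacenter, and cluster-id values are placeholders. After editing the file, recreate the configuration secret (typically vsphere-config-secret in the vmware-system-csi namespace for vanilla deployments) so the driver picks up the change.

      # Minimal csi-vsphere.conf sketch; all values are placeholders
      [Global]
      cluster-id = "example-cluster-id"

      [VirtualCenter "example-vCenter-FQDN"]
      user = "example-user@vsphere.local"
      password = "example-password"
      datacenters = "example-datacenter"

      [Labels]
      topology-categories = "k8s-region, k8s-zone"

    kubectl delete secret vsphere-config-secret -n vmware-system-csi
    kubectl create secret generic vsphere-config-secret --from-file=csi-vsphere.conf -n vmware-system-csi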

  3. Deployment Manifests:

    Ensure the vsphere-csi-controller deployment is updated so that the csi-provisioner sidecar includes the necessary topology flags. Use the following command to check the arguments:

    kubectl get deployment vsphere-csi-controller -n vmware-system-csi -o yaml

    Required Arguments:

    --feature-gates=Topology=true
    --strict-topology
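
    If either flag is missing, add it to the csi-provisioner container arguments in the deployment. The excerpt below is only a sketch of the relevant section; other arguments are omitted and vary by driver version.

      containers:
        - name: csi-provisioner
          args:
            - "--feature-gates=Topology=true"
            - "--strict-topology"
            # ... other provisioner arguments unchanged ...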

  4. StorageClass Binding Mode:

    The volumeBindingMode in the StorageClass should be set to WaitForFirstConsumer. This setting is vital because it prevents the CSI driver from attempting to provision a volume until the Kubernetes scheduler has picked a specific node, ensuring the datastore selected is shared with that node's zone.
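
    A StorageClass sketch with the recommended binding mode is shown below; the name and the storagepolicyname parameter are placeholders for this article.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: example-storage-class
    provisioner: csi.vsphere.vmware.com
    parameters:
      storagepolicyname: "example-storage-policy"   # placeholder vSphere storage policy
    volumeBindingMode: WaitForFirstConsumer          # bind only after the pod is scheduled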

  5. Confirm Node Labeling:

    Run the following command to verify that the CSI driver has successfully labeled the Kubernetes nodes with the topology data from vCenter:

    kubectl get nodes --show-labels | grep topology.csi.vmware.com
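
    With the topology-categories above, each node is expected to carry labels of the form topology.csi.vmware.com/<category>. The line below is illustrative output only; the node name and tag values are placeholders.

    <example-worker-node>   Ready   ...,topology.csi.vmware.com/k8s-region=region-1,topology.csi.vmware.com/k8s-zone=zone-a,...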