In a VKS cluster, pods using persistent volumes are stuck in Init state.
While connected to the VKS cluster context, the following symptoms are observed:
kubectl get pods -n <namespace> -o wide
kubectl describe pod <pod name> -n <pod namespace>
FailedAttachVolume: AttachVolume.Attach failed for volume <pv name> : rpc error: code = Internal desc = Watch on virtualmachine timed out
Unable to attach or mount volumes
Timed out waiting for the condition
kubectl get pv,pvc -n <namespace>
kubectl get volumeattachments
Restarting the CSI controller pods does not resolve the issue.
Storage DRS is enabled on the datastores or datastore cluster used by the VKS cluster.
When comparing the output of GOVC and VCDB for the affected volume(s), there is a discrepancy in datastore placement for the associated disk.
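When many volumes are in use, the volumeattachments check above can be narrowed to the attachments that are not reported as attached. The following is a minimal sketch, not a supported tool: the function names list_volumeattachments and filter_unattached are hypothetical, and it assumes awk is available on the workstation running kubectl.

```shell
# Hypothetical helper: print VolumeAttachments as a four-column table.
# The field paths are standard VolumeAttachment API fields.
list_volumeattachments() {
  kubectl get volumeattachments \
    -o custom-columns='NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached'
}

# Hypothetical helper: keep the header row plus every row whose last
# column (ATTACHED) is anything other than "true".
filter_unattached() {
  awk 'NR == 1 || $NF != "true"'
}
```

Run it as: list_volumeattachments | filter_unattached. Any row that remains corresponds to a volume that has not attached and is a candidate for the verification steps later in this article.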
Environment
vSphere Supervisor
VKS Cluster
Storage DRS enabled (unsupported in VKS environments)
Cause
Storage DRS is not supported in VKS and vSphere Supervisor environments.
When Storage DRS automatically moves a volume to another datastore, CNS loses its connection to the volume and can no longer manage it.
As a result, the datastore that VCDB records for the volume can differ from the datastore where the volume actually resides.
Resolution
Disable Storage DRS on the datastore cluster for the datastores used by VKS.
This prevents the issue from recurring, but any datastore discrepancy that Storage DRS has already caused may still need to be fixed with the workaround below.
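Storage DRS can be disabled in the vSphere Client. For administrators who already use govc, the sketch below shows one possible command-line equivalent. It assumes the installed govc version provides the datastore.cluster.change command with a -drs-enabled flag (confirm with govc datastore.cluster.change -h before relying on it). The function name disable_storage_drs and the GOVC_BIN override are hypothetical and exist only for this example.

```shell
# Hypothetical helper: disable Storage DRS on each named datastore cluster.
# GOVC_BIN is an illustrative override; it defaults to the govc binary.
# Assumes the GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD variables are exported.
disable_storage_drs() {
  for sdrs_cluster in "$@"; do
    "${GOVC_BIN:-govc}" datastore.cluster.change -drs-enabled=false "$sdrs_cluster"
  done
}
```

Run it as: disable_storage_drs <datastore cluster name>. Disabling Storage DRS does not move any disks back, so the verification and reconcile steps that follow still apply.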
Issue Verification
kubectl describe pod -n <namespace> <stuck pod name>
kubectl get pv,pvc -n <namespace> | grep <human-readable name for the volume>
kubectl describe pv <pvc-ID> | grep -i volumehandle
kubectl get pvc -A | grep -i <volumeHandle from previous step>
Note: Persistent volumes (pv) are named "pvc-<ID>".
/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select * from cns.volume_info where volume_name = '<pvc-name from Supervisor cluster>';"
Document the volume ID, datastore and vmdk from the above output.
export GOVC_URL=<vCenter FQDN>
export GOVC_USERNAME=<admin User>
export GOVC_PASSWORD=<admin Password>
export GOVC_INSECURE=true
govc disk.ls -k -dc=<datacenter> -ds=<datastore from VCDB> -l <volume ID from VCDB>
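If the disk is not listed on the datastore recorded in VCDB, the same govc command can be repeated against the other datastores in the datastore cluster to see where the disk currently resides. The following is a minimal sketch under stated assumptions: the function name check_disk_on_datastores and the GOVC_BIN override are hypothetical, the datastore names must be supplied by hand, and the output of each govc call still has to be read and interpreted manually.

```shell
# Hypothetical helper: run the disk.ls check from this article against
# several candidate datastores, varying only the -ds value.
# Usage: check_disk_on_datastores <datacenter> <volume ID> <datastore>...
check_disk_on_datastores() {
  disk_dc="$1"
  disk_vol="$2"
  shift 2
  for disk_ds in "$@"; do
    echo "== ${disk_ds} =="
    # GOVC_BIN is an illustrative override; it defaults to the govc binary.
    "${GOVC_BIN:-govc}" disk.ls -k -dc="$disk_dc" -ds="$disk_ds" -l "$disk_vol" \
      || echo "(govc returned an error for ${disk_ds})"
  done
}
```

Run it as: check_disk_on_datastores <datacenter> <volume ID from VCDB> <datastore from VCDB> <other datastore>. A disk that appears under a datastore other than the one recorded in VCDB matches the discrepancy described in this article.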
Reconcile the Datastore
Follow the KB article "Reconciling Discrepancies in the Managed Virtual Disk Catalog" to reconcile the affected datastores.
Depending on the size of the environment, the reconciliation can take multiple hours to take effect.
Additional Information
vSphere DRS, as distinct from Storage DRS, is required in Fully Automated mode for VKS and vSphere Supervisor environments.