A pod in the guest cluster is stuck in ContainerCreating state.
When describing the pod, we see a volume attachment failure event because the CNS fails to retrieve the datastore that is backing the volume.
Warning FailedAttachVolume 18s attachdetach-controller AttachVolume.Attach failed for volume "pvc-########-####-####-2222-############" : rpc error: code = Internal desc = observed Error: "ServerFaultCode: CNS: Failed to retrieve datastore for vol ########-####-####-0000-############. (vim.fault.NotFound) {\n faultCause = (vmodl.MethodFault) null, \n faultMessage = <unset>\n msg = \"The vStorageObject (vim.vslm.ID) {\n dynamicType = null,\n dynamicProperty = null,\n id = ########-####-####-0000-############\n} was not found\"\n}" is set on the volume "########-####-####-####-############-########-####-####-2222-############" on virtualmachine "tkgs-cluster-1-worker-nodepool-##-####-########-####"
The volume metadata is missing from the Pandora database on the vCenter.
root@vcsa1 [ ~ ]# /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -c "SELECT * from vpx_storage_object_info where id='########-####-####-0000-############'"
 id | name | capacity | datastore_url | create_time | v_clock | backing_object_id | disk_path | used_capacity
----+------+----------+---------------+-------------+---------+-------------------+-----------+---------------
(0 rows)
The volume is present on the backend datastore but missing from the Pandora database.
Due to the para-virtualized architecture of the CSI in the guest clusters, each volume is represented by two names: one is generated by the pvCSI in the guest cluster and the other is generated by the CSI in the supervisor cluster.
The latter is the value that is stored in the database. Therefore, do not filter by the volume name mentioned in the pod events.
Instead, use the volume ID. If the pod events do not mention that ID, check the resolution section for details on how to obtain it.
Resolved in vCenter Server 8.0 Update 3e and later.
Note: The Pandora database is no longer used in vSphere 8.0.
Identify the volume ID, if it isn't mentioned in the pod events.
The guest clusters utilize a para-virtualized CSI.
This means the PV on the guest clusters refers to a PVC on the supervisor cluster.
This PVC, in turn, is bound to a PV whose VolumeHandle is the volume ID.
Describe the problematic volume in the guest cluster and get the VolumeHandle.
$ kubectl describe pv pvc-########-####-####-2222-############ | grep -i VolumeHandle
Example output:
VolumeHandle: ########-####-####-####-############-########-####-####-2222-############
Go to the supervisor cluster and get the PV that is bound to this PVC.
# kubectl get pvc -A | grep -i ########-####-####-####-############-########-####-####-2222-############
Example output:
tanzu-1   ########-####-####-####-############-########-####-####-2222-############   Bound   pvc-########-####-####-3333-############   1Gi   RWO   tanzu   50d
Get the VolumeHandle of this PV; this is the volume ID.
root@42320f0e4760472d1a96bbbd0bdaa921 [ ~ ]# kubectl describe pv pvc-########-####-####-3333-############ | grep -i VolumeHandle
Example output:
VolumeHandle: ########-####-####-0000-############
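The lookup chain above (guest PV → supervisor PVC → supervisor PV → volume ID) can be scripted. Below is a minimal sketch of the parsing step; the `volume_handle` helper name is hypothetical, and the sample text stands in for live `kubectl describe pv` output since no cluster is assumed to be reachable here.

```shell
# Hypothetical helper: extract the VolumeHandle value from
# `kubectl describe pv <name>` output piped on stdin.
volume_handle() {
  awk -F': *' '/VolumeHandle/ {gsub(/ /, "", $2); print $2}'
}

# Sample text standing in for real `kubectl describe pv` output.
sample_describe='Driver:            csi.vsphere.vmware.com
VolumeHandle:      aaaaaaaa-bbbb-cccc-0000-dddddddddddd'

printf '%s\n' "$sample_describe" | volume_handle
```

With a live cluster, `kubectl describe pv "$PV" | volume_handle` on the guest cluster yields the supervisor PVC name, and the same command against the bound supervisor PV yields the volume ID.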
Note: You can also identify the ID from the pvCSI controller logs or the CNS logs (vsanvcmgmtd.log) if the logs have not rolled over.
Given the datastore name that is backing the volume, get the managed object ID (MOID) of that datastore by running this command on the vCenter.
# dcli com vmware vcenter datastore list | grep -i Tanzu
Example output:
|datastore-2009|Tanzu |VMFS|5272240128 |241323474944|
In this example, the MOID is datastore-2009.
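The MOID can also be pulled out programmatically. A sketch, assuming the pipe-delimited dcli table format shown above (the sample row mirrors the example output, with "Tanzu" as the example datastore name):

```shell
# Parse the MOID (column 2) from the dcli table row whose name column
# (column 3) matches the datastore name. The row is a sample standing in
# for live `dcli com vmware vcenter datastore list` output.
dcli_row='|datastore-2009|Tanzu |VMFS|5272240128 |241323474944|'
moid=$(printf '%s\n' "$dcli_row" | awk -F'|' '$3 ~ /Tanzu/ {print $2}')
echo "$moid"
```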
Go to https://<vcenterIp>/mob and log in with the SSO administrator account.
Go to content > VStorageObjectManager > VCenterUpdateVStorageObjectMetadataEx_Task
Insert the volume ID, the datastore MOID, and the metadata KeyValue as seen below.
Click Invoke Method.
Make sure that the DB has been updated successfully.
# /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -c "SELECT * from vpx_storage_object_info where id='########-####-####-0000-############'"
Example output:
                  id                  |                   name                   | capacity |                      datastore_url                      |       create_time       | v_clock | backing_object_id |       disk_path        | used_capacity
--------------------------------------+------------------------------------------+----------+---------------------------------------------------------+-------------------------+---------+-------------------+------------------------+---------------
 ########-####-####-0000-############ | pvc-########-####-####-3333-############ |     1024 | ds:///vmfs/volumes/########-########-1111-############/ | 2022-09-13 21:16:54.459 |      91 |                   | [Tanzu] fcd/#####.vmdk |            -1
(1 row)
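This check can be scripted so a missing row is caught immediately. A sketch that parses the trailing row-count marker of psql's default output; with a live vCenter, `psql -tAc "SELECT count(*) ..."` would return the number directly, so the `row_count` helper below is only an illustration against sample text.

```shell
# Extract N from psql's trailing "(N row)" / "(N rows)" marker.
row_count() {
  sed -n 's/^(\([0-9][0-9]*\) rows\{0,1\})$/\1/p'
}

# "(1 row)" stands in for the tail of real psql output.
printf '(1 row)\n' | row_count
```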
Wait 2-3 minutes and check the pod status. If the pod still does not start, delete and recreate it.
If the issue persists, there may be discrepancies in the disk catalog; refer to Reconciling Discrepancies in the Managed Virtual Disk Catalog.
########-########-1111-############/fcd/#####.vmdk
Note: Get the datastore name. It will be needed as part of the solution.
[root@esxi:~] localcli storage filesystem list | grep -i ########-########-1111-############
/vmfs/volumes/########-########-1111-############  Tanzu  ########-########-1111-############  true  VMFS-6  241323474944  5274337280
In this example, the datastore's name is Tanzu.
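Extracting the name from the `localcli` output can be sketched as below, assuming the column order shown in the example above (mount point first, then the friendly name). The UUID and sample line are placeholders, not real ESXi output.

```shell
# Column 2 of `localcli storage filesystem list` holds the datastore's
# friendly name in the example above; the sample line stands in for
# real ESXi output.
fs_line='/vmfs/volumes/12345678-9abcdef0-1111-222233334444  Tanzu  12345678-9abcdef0-1111-222233334444  true  VMFS-6  241323474944  5274337280'
ds_name=$(printf '%s\n' "$fs_line" | awk '{print $2}')
echo "$ds_name"
```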