Persistent volume fails to attach with error: "The input volume <Volume handle ID> is not registered as a CNS volume"

Products

VMware vCenter Server 7.0
VMware vCenter Server 8.0

Issue/Introduction

Issue: The CSI VolumeAttachment reports the following error when attaching a persistent volume to a Kubernetes node:

Status:
  Attach Error:
    Message:  rpc error: code = Internal desc = failed to attach disk: "********-****-****-****-*******" with node: "********-****-****-****-*******" err failed to attach cns volume: "********-****-****-****-*******" to node vm: "VirtualMachine:vm-**** [VirtualCenterHost: *********, UUID:********-****-****-****-*******, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-****, VirtualCenterHost: *************]]". fault: "(*types.LocalizedMethodFault)(0xc00000000)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (types.CnsFault) {\n  BaseMethodFault: (types.BaseMethodFault) <nil>,\n  Reason: (string) (len=88) \"The input volume********-****-****-****-******* is not registered as a CNS volume.\"\n },\n LocalizedMessage: (string) (len=35) \"fault.CnsNotRegisteredFault.summary\"\n})\n". opId: "xxxx"
    Time:     YYYY:MM:DD
  Attached:   false
  Detach Error:
    Message:  rpc error: code = Internal desc = volumeID "********-****-****-****-*******" not found in QueryVolume

The PV is also not listed in the vCenter UI under "Container Volumes".
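To find every affected attachment in a cluster at once, the attach errors can be filtered for the fault text shown above. This is a minimal sketch, assuming kubectl access to the cluster that owns the VolumeAttachment objects; the function name `list_unregistered_attachments` and the `KUBECTL` override are hypothetical, not part of any VMware tooling.

```sh
#!/bin/sh
# Hypothetical helper: print "<volumeattachment><TAB><persistent volume>" for every
# VolumeAttachment whose attach error contains the CNS "not registered" fault.
# KUBECTL may be overridden (e.g. to add --kubeconfig); it defaults to kubectl.
list_unregistered_attachments() {
    ${KUBECTL:-kubectl} get volumeattachments -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.source.persistentVolumeName}{"\t"}{.status.attachError.message}{"\n"}{end}' |
        grep -F 'is not registered as a CNS volume' |
        cut -f1,2
}
```

For example, `KUBECTL='kubectl --kubeconfig guest.kubeconfig' list_unregistered_attachments`, where the kubeconfig file name is a placeholder. The PV names it prints are the volumes to look for under "Container Volumes".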

Environment

VMware Tanzu Kubernetes Grid
vSphere with Tanzu 7.x
vSphere with Tanzu 8.x

Cause

This can be caused by discrepancies between the managed virtual disk (FCD) catalog on the datastore and the CNS database in vCenter.

Resolution

NOTES:

Before rebuilding the catalog, review the KB article "Reconciling Discrepancies in the Managed Virtual Disk Catalog" to determine which API and spec to use for reconciliation, based on the vCenter and ESXi host versions.
Throughout this procedure, please make sure that no other CNS/FCD operations are running on the datastore.
If the hosts are running vSAN, first run "esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListupdates" on ALL hosts in the vSAN cluster. This prevents vCenter from removing hosts from the cluster when they stop communicating (because hostd is stopped). After completing the procedure in this article, re-enable vCenter member updates by running "esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListupdates".
If the datastore is mounted on hosts managed by more than one vCenter (i.e., shared among vCenters), SSH to one of the ESXi hosts and run the following command:

find /vmfs/volumes/<datastore>/catalog/journal -name "*.xaction"

If the output lists any files, do not rebuild the catalog.
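When several shared datastores must be checked, the find command above can be wrapped so that its result is an exit status. A minimal sketch; the function name `check_catalog_journal` is hypothetical, and the catalog path is passed in rather than hard-coded:

```sh
#!/bin/sh
# Hypothetical helper around the documented find command.
# Exit status: 0 = no *.xaction files (the rebuild may proceed)
#              1 = pending *.xaction files found (do not rebuild)
#              2 = no journal folder under the given catalog path
check_catalog_journal() (
    journal="$1/journal"
    if [ ! -d "$journal" ]; then
        echo "journal folder not found: $journal" >&2
        exit 2
    fi
    pending=$(find "$journal" -name "*.xaction")
    if [ -n "$pending" ]; then
        echo "pending transactions in $journal, do not rebuild:" >&2
        printf '%s\n' "$pending" >&2
        exit 1
    fi
    echo "no pending transactions in $journal"
)
```

Run it as `check_catalog_journal /vmfs/volumes/<datastore>/catalog` once per shared datastore and stop at the first non-zero status.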

Steps to Rebuild Catalog:

1. Identify all affected datastores.
2. Identify all ESXi hosts connected to the datastores whose FCD catalog is being rebuilt.
3. Pick one of these hosts, SSH to it, and run the following commands:

a. Stop "hostd"

/etc/init.d/hostd stop

b. For each identified datastore, move the contents of the catalog folder, except the journal folder, to a backup folder:

mkdir /tmp/<datastore>-catalog-bkp   # or any other temporary directory, e.g. on another datastore
cd /vmfs/volumes/<datastore>/catalog
mv $(ls | grep -v journal) /tmp/<datastore>-catalog-bkp/
ls /tmp/<datastore>-catalog-bkp   # make sure all files were moved

c. Start "hostd"

/etc/init.d/hostd start
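Step b can also be written as a loop that never hands the journal folder to mv. This avoids two weaknesses of `mv $(ls | grep -v journal) ...`: entry names containing spaces are split, and any other entry whose name merely contains "journal" is skipped. A minimal sketch; `backup_catalog` is a hypothetical name, both paths are placeholders, and it should run only while hostd is stopped (step a):

```sh
#!/bin/sh
# Hypothetical helper for step b: move every entry of a catalog folder except
# "journal" into a backup folder and report how many entries were moved.
backup_catalog() (
    catalog_dir=$1
    backup_dir=$2
    if [ ! -d "$catalog_dir" ]; then
        echo "catalog folder not found: $catalog_dir" >&2
        exit 1
    fi
    mkdir -p "$backup_dir" || exit 1
    moved=0
    for entry in "$catalog_dir"/*; do
        [ -e "$entry" ] || continue        # empty folder: the glob did not match
        if [ "${entry##*/}" = journal ]; then
            continue                       # the journal stays in place
        fi
        mv "$entry" "$backup_dir/" || exit 1
        moved=$((moved + 1))
    done
    echo "moved $moved entries to $backup_dir"
)
```

For example, `backup_catalog /vmfs/volumes/<datastore>/catalog /tmp/<datastore>-catalog-bkp`, then list the backup folder as in step b before starting hostd.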

4. SSH to each of the other hosts on which the datastore is mounted and run one of the following commands:

Restart "hostd"

/etc/init.d/hostd restart

Alternatively, run the following to avoid restarting "hostd":

/usr/lib/vmware/hostd/bin/notifyDatastore.py -t PreUnmount -d <dsName>
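When the datastore is mounted on many hosts, step 4 can be driven from one workstation over SSH. A minimal sketch, assuming root SSH access to each host; `refresh_hosts` and the `SSH_CMD` override are hypothetical. The datastore name is single-quoted for the remote shell, so names that themselves contain a single quote are not handled:

```sh
#!/bin/sh
# Hypothetical helper for step 4: on each listed host, either restart hostd or
# send the PreUnmount notification for one datastore.
# SSH_CMD defaults to ssh; set SSH_CMD=echo for a dry run.
refresh_hosts() (
    if [ "$#" -lt 3 ]; then
        echo "usage: refresh_hosts <restart|notify> <dsName> <host>..." >&2
        exit 2
    fi
    mode=$1
    ds=$2
    shift 2
    case "$mode" in
        restart) remote_cmd="/etc/init.d/hostd restart" ;;
        notify)  remote_cmd="/usr/lib/vmware/hostd/bin/notifyDatastore.py -t PreUnmount -d '$ds'" ;;
        *) echo "unknown mode: $mode" >&2; exit 2 ;;
    esac
    rc=0
    for host in "$@"; do
        # Continue with the remaining hosts even if one of them fails.
        ${SSH_CMD:-ssh} "root@$host" "$remote_cmd" || rc=1
    done
    exit "$rc"
)
```

A dry run such as `SSH_CMD=echo refresh_hosts notify <dsName> <host1> <host2>` prints the exact commands before anything is sent.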

5.1 For vCenter Server version 9.0 and above

a. The vCenter MOB must be enabled before it can be accessed.
Go to VC MOB > content > vStorageObjectManager > ReconcileDatastoreInventoryEx_Task.

The MOB URL will be similar to:
https://<vcsa-fqdn>/mob/?moid=VStorageObjectManager&method=reconcileDatastoreInventoryEx

b. Replace the existing spec with the following:

<spec>
<datastore type="Datastore">datastore-MOID</datastore>
</spec>

- To get the datastore MOID (of the form "datastore-##"), go to VC MOB > content > rootFolder > datacenter > datastore, or read it from the browser URL when the datastore is selected in the vSphere Client.
- Replace datastore-MOID in the spec above with this value, invoke the task, and wait until it succeeds.
- Repeat this for each identified datastore.
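If several datastores need reconciling, the spec for each one can be generated instead of edited by hand. A minimal sketch; `reconcile_spec` is a hypothetical name, and it only checks that the value has the "datastore-<number>" form described above before printing the spec to paste into the MOB:

```sh
#!/bin/sh
# Hypothetical helper: print the ReconcileDatastoreInventoryEx_Task spec for one
# datastore MOID, rejecting values that are not "datastore-" followed by digits.
reconcile_spec() (
    moid=$1
    case "$moid" in
        datastore- | datastore-*[!0-9]*)
            echo "not a datastore MOID: $moid" >&2; exit 1 ;;
        datastore-*)
            ;;
        *)
            echo "not a datastore MOID: $moid" >&2; exit 1 ;;
    esac
    printf '<spec>\n<datastore type="Datastore">%s</datastore>\n</spec>\n' "$moid"
)
```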

5.2 For older releases

a. The ESXi host MOB must be enabled before it can be accessed. Enable the host MOB if needed.

b. Go to host MOB > ha-vstorage-object-manager > HostReconcileDatastoreInventory_Task

The MOB URL will be similar to:
https://<ESXI_host_fqdn/IP>/mob/?moid=ha-vstorage-object-manager&method=reconcileDatastoreInventory

c. Provide the datastore, for example ds:///vmfs/volumes/<datastore-UUID>/ (the datastore URL is shown on the Summary page of the datastore in the vCenter UI).
d. Run the reconcile task and wait until it succeeds.
e. Repeat this for each identified datastore.
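The datastore value in step c follows a fixed pattern, so it can be assembled from the UUID shown in the vCenter UI. A minimal sketch; `ds_url` is a hypothetical name. It accepts either a bare UUID or an existing URL and always returns the form with the trailing slash used in the example above:

```sh
#!/bin/sh
# Hypothetical helper: normalize a datastore UUID or URL to
# ds:///vmfs/volumes/<datastore-UUID>/
ds_url() (
    id=$1
    case "$id" in
        "")
            echo "missing datastore UUID" >&2; exit 1 ;;
        ds:///vmfs/volumes/?*/)
            printf '%s\n' "$id" ;;
        ds:///vmfs/volumes/?*)
            printf '%s/\n' "$id" ;;
        */*)
            echo "unexpected datastore value: $id" >&2; exit 1 ;;
        *)
            printf 'ds:///vmfs/volumes/%s/\n' "$id" ;;
    esac
)
```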

6. SSH to the host used in the steps above and verify that the catalog folder has been recreated for every datastore identified in step 1:

a. ls /vmfs/volumes/<datastore-path>/catalog

b. Verify that a file whose name starts with "vclock-" has been created for each datastore:

ls /vmfs/volumes/<datastore-path>/catalog

c. Verify that the tidy file has been created for each datastore. The following command should list a file named "1.dat":

ls /vmfs/volumes/<datastore-path>/catalog
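The three checks in step 6 can be run together for each datastore. A minimal sketch; `verify_catalog` is a hypothetical name. Because the exact location of "1.dat" inside the catalog may vary, it searches the whole catalog tree for that file rather than assuming a sub-folder:

```sh
#!/bin/sh
# Hypothetical helper for step 6: check that a rebuilt catalog folder exists,
# contains a "vclock-*" entry, and contains a file named 1.dat somewhere below it.
# Prints one line per check; exit status 1 if any check fails.
verify_catalog() (
    catalog_dir=$1
    if [ ! -d "$catalog_dir" ]; then
        echo "MISSING catalog folder: $catalog_dir"
        exit 1
    fi
    status=0
    if ls "$catalog_dir" | grep -q '^vclock-'; then
        echo "ok: vclock-* present"
    else
        echo "MISSING: vclock-* in $catalog_dir"; status=1
    fi
    if [ -n "$(find "$catalog_dir" -name 1.dat)" ]; then
        echo "ok: 1.dat present"
    else
        echo "MISSING: 1.dat under $catalog_dir"; status=1
    fi
    exit "$status"
)
```

Run it as `verify_catalog /vmfs/volumes/<datastore-path>/catalog` for each datastore from step 1.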

7.1 For ESXi hosts version 9.0 and above

a. Trigger a CNS full sync (for vCenter 8.0 and above) as follows:

b. SSH to the vCenter Server and connect to the VCDB database:

psql -U postgres -d VCDB

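c. Get the datastore URL from vpx_ds_info or cns.vpx_storage_datastore_info, then reset the vclock of the affected datastore and delete its entries from cns.volume_info (replace '<>' with the affected datastore):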

select name, url from vpx_ds_info;
select * from cns.vpx_storage_datastore_info;
update cns.vpx_storage_datastore_info set vclock=-1 where datastore_url='<>';
delete from cns.volume_info where datastore='<>';

d. Restart the vsan-health service:

vmon-cli --restart vsan-health

e. Wait for the full sync to complete. Log lines similar to the following appear in the CNS log (/var/log/vmware/vsan-health/vsanvcmgmtd.log):

2025-01-28T10:40:31.675Z info vsanvcmgmtd[219935] [vSAN@6876 sub=CnsSync] Sync all datastores ...
...
2025-01-28T10:40:34.019Z info vsanvcmgmtd[219935] [vSAN@6876 sub=CnsSync] Sync ds:///vmfs/volumes/<ds-uuid>: startVClock = 0, fullSync = true
...
2025-01-28T10:40:42.975Z info vsanvcmgmtd[219935] [vSAN@6876 sub=CnsSync] Synced all datastores

f. Confirm that the CNS database is updated with the correct vclock values:

psql -U postgres -d VCDB
select * from cns.vpx_storage_datastore_info;
select * from cns.volume_info where datastore='<>';
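Steps c and e of 7.1 can be partly scripted: the update and delete statements can be generated with the value safely quoted and reviewed before they are run, and the log can be polled for the completion line. A minimal sketch; the function names are hypothetical. It assumes the same datastore URL fills both '<>' placeholders in step c (confirm this against the select output first) and that the log is not rotated while waiting:

```sh
#!/bin/sh
# Hypothetical helper for step c: print the update/delete statements for one
# datastore, doubling single quotes so the value is a valid SQL string literal.
# ASSUMPTION: both '<>' placeholders take the datastore URL.
cns_resync_sql() (
    ds=$1
    if [ -z "$ds" ]; then
        echo "usage: cns_resync_sql <datastore-url>" >&2
        exit 2
    fi
    q=$(printf '%s' "$ds" | sed "s/'/''/g")
    printf "update cns.vpx_storage_datastore_info set vclock=-1 where datastore_url='%s';\n" "$q"
    printf "delete from cns.volume_info where datastore='%s';\n" "$q"
)

# Hypothetical helper for step e: poll the vsan-health log until a
# "Synced all datastores" line appears at or after START_LINE, or give up
# after TIMEOUT seconds (checked every 5 seconds).
wait_for_full_sync() (
    log=$1
    start_line=${2:-1}
    timeout=${3:-600}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if tail -n "+$start_line" "$log" 2>/dev/null | grep -qF 'sub=CnsSync] Synced all datastores'; then
            echo "full sync completed"
            exit 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "no 'Synced all datastores' line within ${timeout}s" >&2
    exit 1
)
```

Before restarting vsan-health, record where the log currently ends, e.g. `start=$(( $(wc -l < /var/log/vmware/vsan-health/vsanvcmgmtd.log) + 1 ))`, then run `wait_for_full_sync /var/log/vmware/vsan-health/vsanvcmgmtd.log "$start"` so that older "Synced all datastores" lines are ignored. The generated statements can be saved to a file, reviewed, and applied with `psql -U postgres -d VCDB -1 -f <file>`; the `-1` option runs the whole file as a single transaction.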

7.2 For older ESXi hosts

a. If the database content is still not updated after some time and continues to show old content, trigger a sync from "StorageLifecycleManager" in the VSLM MOB:

https://<VCIP>/vslm/mob//?moid=StorageLifecycleManager&method=VslmSyncDatastore

Set the datastore URL, for example "ds:///vmfs/volumes/<datastore-UUID>/" (shown on the Summary page of the datastore in the vCenter UI).
Set "fullSync" to true.
The "fcd Id" field can be left blank.

b. Wait for the data to sync between SPS and CNS. After the sync completes, the persistent volume appears in the vCenter UI under "Container Volumes".

Additional Information

Reconciling Discrepancies in datastore