Persistent volume fails to attach with error: "The input volume <Volume handle ID> is not registered as a CNS volume"

Products

VMware vCenter Server 7.0

VMware vCenter Server 8.0

Issue/Introduction

The CSI VolumeAttachment reports the error below while attaching a PV to a Kubernetes node.

Status:
  Attach Error:
    Message:  rpc error: code = Internal desc = failed to attach disk: "1e10bd8c-4713-4731-95df-*******" with node: "420c5615-7b83-a52f-90b8-*******" err failed to attach cns volume: "1e10bd8c-4713-4731-95df-********" to node vm: "VirtualMachine:vm-**** [VirtualCenterHost: *********, UUID: ********-7b83-a52f-90b8-********, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-****, VirtualCenterHost: *************]]". fault: "(*types.LocalizedMethodFault)(0xc000987360)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (types.CnsFault) {\n  BaseMethodFault: (types.BaseMethodFault) <nil>,\n  Reason: (string) (len=88) \"The input volume 1e10bd8c-4713-4731-95df-******** is not registered as a CNS volume.\"\n },\n LocalizedMessage: (string) (len=35) \"fault.CnsNotRegisteredFault.summary\"\n})\n". opId: "dde210ec"
    Time:     YYYY:MM:DD
  Attached:   false
  Detach Error:
    Message:  rpc error: code = Internal desc = volumeID "1e10bd8c-4713-4731-95df-********" not found in QueryVolume

In addition, the PV is not listed in the vCenter UI under "Container Volumes".

Environment

VMware Tanzu Kubernetes Grid 1.x

vSphere with Tanzu 7.0

vSphere with Tanzu 8.0

Cause

This can happen when a storage migration takes place because sDRS is enabled on a datastore cluster. sDRS is not supported with the CSI driver.

vSphere Functionality Supported by vSphere Container Storage Plug-in

It can also be caused by general discrepancies between the datastore and the database.

Resolution

Rebuild FCD catalog and reconcile datastore.

Identify all affected datastores.
Identify all ESXi hosts connected to the datastores for which FCD catalog is being rebuilt.
For each identified host, SSH into it and stop the hostd process.

/etc/init.d/hostd stop

NOTE: If the hosts are running vSAN, first run "esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates" on ALL hosts in the vSAN cluster. This prevents vCenter from removing hosts from the cluster when they stop communicating (because hostd is stopped). Once the procedure in this KB is completed, re-enable vCenter member updates by running "esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates".
Select one of the hosts that has the datastore mounted and SSH into it. Back up the catalog folder for all the datastores listed above, then restart hostd.

Example:
  a. cd /vmfs/volumes/<datastore>/
  b. mv catalog catalog_backup
  c. /etc/init.d/hostd start

NOTE: Steps a and b should be performed for all datastores identified in the first step. Once they are done on all the datastores, execute step c. There is no need to run these steps on all hosts. Run them on only one host.

NOTE: If running vSAN, the catalog directory cannot be moved with "mv", because vSAN is object-based storage. Move the contents instead of the catalog directory itself, as outlined below.

cd catalog
mv * /tmp/catalog

If running vSAN, the reconcile task is to be run on the vCenter MOB rather than the ESXi host.

https://<vcsa-fqdn>/mob/?moid=VStorageObjectManager&method=reconcileDatastoreInventory

To use the ESXi MOB, it must be enabled before it can be accessed. Refer to: Enable host MOB
- If using the ESXi MOB, use the datastore URL.

  Example: ds:///vmfs/volumes/<datastore-UUID>/ (found on the summary page of the datastore in the vSphere Client)
- If using the vCenter MOB, use the datastore MOID, e.g., "datastore-23" (taken from the URL when the datastore is selected in the vSphere Client).
Go to the host MOB and invoke ReconcileDatastoreInventory_Task for each datastore ID, one at a time.
- https://<ESXI_host_fqdn/IP>/mob/?moid=ha-vstorage-object-manager&method=reconcileDatastoreInventory
  - Path: MOB > content > vStorageObjectManager > HostReconcileDatastoreInventory_Task
- For each task invoked, wait until the task state shows "Success".
While still connected via SSH to the host selected for the above steps, look for the catalog folder of each datastore identified in the first step. The folder should have been regenerated.

ls /vmfs/volumes/<datastore-path>/catalog
Verify that vclock is created in the format of "vclock-<number>" for all the datastores.

ls /vmfs/volumes/<datastore-path>/catalog/vclock
Verify that the tidy file is created for all datastores. The command below should return a file named "1.dat".

ls /vmfs/volumes/<datastore-path>/catalog/tidy/v1
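When several datastores are affected, the three checks above can be bundled into one helper. The following is a minimal sketch, not part of the documented procedure: the function name check_catalog is hypothetical, and it is written in plain POSIX sh on the assumption that the ESXi shell accepts it. It only reads the paths already named in the verification steps.

```shell
#!/bin/sh
# Sketch: verify a regenerated FCD catalog on one datastore.
# check_catalog is a hypothetical helper name, not a VMware tool.
# Usage: check_catalog /vmfs/volumes/<datastore-path>
check_catalog() {
    ds_path="$1"
    catalog="$ds_path/catalog"
    rc=0

    # The catalog folder itself must have been regenerated.
    if [ ! -d "$catalog" ]; then
        echo "MISSING catalog folder: $catalog"
        return 1
    fi

    # Expect at least one entry named vclock-<number> under catalog/vclock.
    if ls "$catalog/vclock" 2>/dev/null | grep -Eq '^vclock-[0-9]+$'; then
        echo "OK vclock: $catalog/vclock"
    else
        echo "MISSING vclock-<number> in: $catalog/vclock"
        rc=1
    fi

    # Expect the tidy file 1.dat under catalog/tidy/v1.
    if [ -f "$catalog/tidy/v1/1.dat" ]; then
        echo "OK tidy file: $catalog/tidy/v1/1.dat"
    else
        echo "MISSING tidy file: $catalog/tidy/v1/1.dat"
        rc=1
    fi

    return "$rc"
}
```

Call it once per affected datastore, for example check_catalog /vmfs/volumes/<datastore-path>. A non-zero return code means at least one of the expected items was not found.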
Once the above verification steps are done, SSH into the remaining hosts and start the hostd process.

/etc/init.d/hostd start
Connect to vCenter via SSH as the root user.
- Connect to the VCDB.
  
  /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres
- Run the below queries.
  - Check the "vclock" value in the VC DB repeatedly. The value should gradually increase until it reaches the number in the "vclock-<number>" name found during the catalog verification above.
    
    psql> select max(v_clock) from VPX_STORAGE_OBJECT_INFO;
  - To confirm that the VCDB and the UI list the correct number of container volumes, compare the count of returned records to the number of "vmdk" files on the affected datastores.
    
    psql> select volume_id, volume_name, datastore from cns.volume_info;
- If the DB content is not updated after some time and the DB still shows old content, a sync of "StorageLifecycleManager" may need to be triggered.
  
  https://<VCIP>/vslm/mob/?moid=StorageLifecycleManager&method=VslmSyncDatastore
  - This needs to be run for all affected datastores.
  - Datastore url will be something like "ds:///vmfs/volumes/63c3eecf-cc6eb678-8d30-00**********". (Found on the summary page of the datastore in the VC vSphere client)
  - Set "fullSync=true".
  - "fcdId" can be blank.
- Run the query below and make sure it returns data. It lists the information for the volume that was previously missing from the DB.
  
  select key,value from VPX_STORAGE_OBJECT_MD where id='<Persistent_Volume_handle_ID>';
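The repeated "vclock" check described above can be scripted rather than run by hand. The following is a rough sketch under stated assumptions: the function names db_vclock and wait_for_vclock are hypothetical, the PSQL variable defaults to the psql path used in this article, and the standard psql options -t, -A, and -c are used to print the query result as a bare value. The target is the number from the "vclock-<number>" name seen on the datastore.

```shell
#!/bin/sh
# Sketch: poll the VCDB until max(v_clock) reaches a target value.
# db_vclock and wait_for_vclock are hypothetical helper names.
PSQL="${PSQL:-/opt/vmware/vpostgres/current/bin/psql}"

# Print the current max(v_clock) as a bare value (empty if the table has no rows).
db_vclock() {
    "$PSQL" -d VCDB -U postgres -t -A \
        -c "select max(v_clock) from VPX_STORAGE_OBJECT_INFO;"
}

# Usage: wait_for_vclock <target> [attempts] [delay_seconds]
# Returns 0 once the DB value is >= target, 1 if attempts run out.
wait_for_vclock() {
    target="$1"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        current="$(db_vclock)"
        echo "DB vclock: ${current:-<empty>} (target: $target)"
        if [ -n "$current" ] && [ "$current" -ge "$target" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, wait_for_vclock 1234 60 10 would check every 10 seconds for up to 60 attempts, where 1234 stands in for the number observed on the datastore. If the function returns 1, the StorageLifecycleManager sync described above may be needed.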
Once this is done, wait some time for the data to sync between SPS and CNS. After this sync, the persistent volume should be listed in vCenter's CNS UI.
Check if PV is now attached to pod(s).
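One way to perform this last check from the Kubernetes side is to list the VolumeAttachment objects that are still not attached. This is a minimal sketch, assuming kubectl is configured for the affected cluster; the function name list_unattached and the KUBECTL variable are hypothetical, and the columns are read from the standard VolumeAttachment fields.

```shell
#!/bin/sh
# Sketch: print VolumeAttachments whose status.attached is not "true".
# list_unattached is a hypothetical helper name.
KUBECTL="${KUBECTL:-kubectl}"

list_unattached() {
    "$KUBECTL" get volumeattachments --no-headers \
        -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached \
    | awk '$4 != "true" { print $1, $2, $3 }'
}
```

Running list_unattached prints the name, PV, and node of every attachment that has not completed. Empty output suggests all attachments succeeded; any remaining entry can be inspected with kubectl describe volumeattachment <name> to see whether the "not registered as a CNS volume" message persists.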

Additional Information

Reconciling Discrepancies in datastore