
vCenter Server UI is spammed with multiple "attach container volume" and "detach container volume" tasks


Article ID: 411248


Updated On:

Products

Tanzu Kubernetes Runtime

Issue/Introduction

  • The vCenter Server UI is consistently spammed with "attach container volume" and "detach container volume" tasks.

  • Per /var/log/vmware/vsan-health/vsanvcmgmt.log, the task to detach the volume from a node reports that the volume is still attached to a node VM. The next log line also reports the number of tasks still pending to be processed by CNS (this number can run into the thousands):

     info vsanvcmgmtd[23472] [vSAN@6876 sub=CnsTask opID=<ID>] A com.vmware.cns.tasks.detachvolume task is created: task-<ID>
     info vsanvcmgmtd[23472] [vSAN@6876 sub=FcdService opID=<ID>] Volume <ID> is attached to vm vm-<ID>
     info vsanvcmgmtd[23472] [vSAN@6876 sub=WorkflowManager opID=<ID>] Detach volume task conflicting with resource <ID>. <number of pending tasks> tasks are already in queue
     info vsanvcmgmtd[23472] [vSAN@6876 sub=VsanTaskSvc opID=<ID>] ADD public task 'task-<ID>', total: 930721
     info vsanvcmgmtd[23472] [vSAN@6876 sub=AdapterServer opID=<ID>] Finished 'detach' on 'cns-volume-manager' (60 ms): done
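
    As a quick check on the vCenter Server (a minimal sketch; the exact log strings may differ between builds), the scale of the backlog can be gauged by counting the matching log entries:

     grep -c "Detach volume task conflicting with resource" /var/log/vmware/vsan-health/vsanvcmgmt.log
     grep -c "com.vmware.cns.tasks.detachvolume task is created" /var/log/vmware/vsan-health/vsanvcmgmt.log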

  • Performing a datastore reconciliation for the underlying datastores where the corresponding FCDs reside does not fix the issue.

  • The CNS logs (/var/log/vmware/vsan-health/vsanvcmgmt.log) on the vCenter Server can be used to obtain the Persistent Volume name. Below is an example of what the relevant entry looks like:

                delete = true,
                clusterId = "<ID of the cluster object in the vCenter Server>",
                entityType = "PERSISTENT_VOLUME_CLAIM",
                namespace = "ai",
                referredEntity = (vim.cns.KubernetesEntityReference) [
                   (vim.cns.KubernetesEntityReference) {
                      entityType = "PERSISTENT_VOLUME",
                      entityName = "<Name of the persistent volume>",
                      clusterId = "<ID of the cluster object in the vCenter Server>",
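
    The Persistent Volume names can be pulled out of this log with, for example (an illustrative command; adjust the number of context lines to the actual log layout):

     grep -A 8 'entityType = "PERSISTENT_VOLUME_CLAIM"' /var/log/vmware/vsan-health/vsanvcmgmt.log | grep 'entityName'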

  • On checking the status of the persistent volume inside the guest cluster, the PV is stuck in the "Released" state. Below is what the describe output of the affected PV looks like; it also confirms that the csi-controller inside the guest cluster is waiting for the corresponding PVC inside the Supervisor cluster to be deleted within 240 seconds.

    Name:              pvc-<ID>
    Labels:            <none>
    Annotations:       pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
                       volume.kubernetes.io/provisioner-deletion-secret-name:
                       volume.kubernetes.io/provisioner-deletion-secret-namespace:
    Finalizers:        [kubernetes.io/pv-protection external-attacher/csi-vsphere-vmware-com]
    StorageClass:      <storage class name>
    Status:            Released

    Events:
      Type     Reason              Age                      From                                                                                                 Message
      ----     ------              ----                     ----                                                                                                 -------
      Warning  VolumeFailedDelete  #m#s (x495 over 2d19h)  csi.vsphere.vmware.com_vsphere-csi-controller-<ID>  rpc error: code = Internal desc = persistentVolumeClaim: <Volume Handle ID of the PV> on namespace: <namespace inside the supervisor cluster where the guest cluster is deployed> in supervisor cluster was not deleted. Error: persistentVolumeClaim <namespace>/<volume handle ID> is not deleted within 240 seconds: message: unable to fetch PersistentVolumeClaim <namespace>/<volume handle ID> with err: client rate limiter Wait returned an error: context deadline exceeded.
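
    The affected PVs can be located from the guest cluster context with standard kubectl commands, for example (illustrative only; substitute the actual PV name):

     kubectl get pv | grep Released
     kubectl describe pv pvc-<ID>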

  • On further checking the status of the corresponding PVC on the Supervisor cluster, the PVC is stuck in the "Terminating" state.

    <namespace>   persistentvolumeclaim/<PVC Name/Volume Handle ID>      Terminating   pvc-<ID>   <capacity>       <provisioning type>        
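
    From the Supervisor cluster context, the stuck PVCs and the finalizers holding them can be listed with, for example (illustrative commands; substitute the actual namespace and PVC name):

     kubectl -n <namespace> get pvc | grep Terminating
     kubectl -n <namespace> get pvc <PVC Name> -o jsonpath='{.metadata.finalizers}'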

  • Per the vsphere-syncer logs of the csi-controller in the Supervisor cluster, the syncer is timing out waiting for the CNS service to respond:

    {"level":"error","time":"<date>T<time>","caller":"cnsnodevmattachment/cnsnodevmattachment_controller.go:584","msg":"failed to detach disk: \"<ID>\" to nodevm: VirtualMachine:vm-<ID> [VirtualCenterHost: <vCenter Server Hostname>, UUID: <ID>, Datacenter: Datacenter [Datacenter: Datacenter:<ID>, VirtualCenterHost: <vCenter Server Hostname>]] for CnsNodeVmAttachment request with name: \"<CnsNodeVmAttachmentID>\" on namespace: \"<namespace>\". Err: time out for task Task:task-<ID> before response from CNS","TraceId":"<ID>","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/syncer/cnsoperator/controller/cnsnodevmattachment.(*ReconcileCnsNodeVMAttachment).Reconcile.func1\n
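
    These entries can be pulled from the Supervisor cluster with, for example (a sketch; the vmware-system-csi namespace and the vsphere-syncer container name are the defaults and may differ between releases):

     kubectl -n vmware-system-csi logs deployment/vsphere-csi-controller -c vsphere-syncer --tail=200 | grep "before response from CNS"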

  • On checking the status of a task associated with an "attach" or "detach" container volume operation in the VCDB, the task is queued and not completed. The command to check the task status and a sample of the relevant output can be seen below. Other similar tasks also show up as "queued" inside the VCDB.

    VCDB=# select * from vpx_task where task_id=<ID obtained from the CNS logs above>;

    task_id  | name |           descriptionid           | entity_id | entity_type |                entity_name                 | locked_data | complete_state | cancelled | cancellable | error_data | result_data | progress |
                                                              reason_data                                                                                                    |       queue_time        | start_time | complete_time | event_chain_id |    username    | vm_
    id  | host_id | computeresource_id | datacenter_id | resourcepool_id | folder_id | alarm_id | scheduledtask_id | change_tag_id | parent_task_id | root_task_id | description | activation_id | continuous | no_of_reattempts | preserved_session_uuid | activation_meth
    od_name | activation_method_arguments
    -----------+------+-----------------------------------+-----------+-------------+--------------------------------------------+-------------+----------------+-----------+-------------+------------+-------------+----------+------------------------------------------
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+------------+---------------+----------------+----------------+----
    ----+---------+--------------------+---------------+-----------------+-----------+----------+------------------+---------------+----------------+--------------+-------------+---------------+------------+------------------+------------------------+----------------
    --------+-----------------------------
     <ID>|      | com.vmware.cns.tasks.detachvolume |    <ID> |           0 | <Nodepool-ID> |             | queued         |         0 |           0 |            |             |          | <obj xmlns:xsd="http://www.w3.org/2001/XM
    LSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:vim25" versionId="8.0.3.0" xsi:type="TaskReasonUser"><userName>com.vmware.cns</userName></obj> | <date and time> |            |               |     <session_id>| com.vmware.cns | 195
    978 |    1141 |               1003 |             3 |                 |           |          |                  |               |                |              |             | <ID> |          0 |                0 |                        |
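
    To gauge the overall backlog in the VCDB, the queued CNS tasks can be counted with a query along these lines (a hypothetical query built from the columns shown above):

     VCDB=# select count(*) from vpx_task where descriptionid like 'com.vmware.cns.tasks%' and complete_state = 'queued';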


  • In some cases, this also causes the vsan-health service to crash.

Environment

VMware vSphere CSI Driver
VMware vSphere Kubernetes Service

Cause

When volumes are deleted in CNS, they are not removed immediately; instead, the 'mark_for_delete' flag is set to true and the CNS service lets the periodic sync handle the actual deletion. However, when the vsan-health service is under very high load, with creates and deletes both happening every few seconds, the periodic sync gets stuck looping and fetching catalog changes. As a result, CNS keeps fetching the changes over and over and never reaches the processing stage.

Resolution

This issue is addressed in vSphere 9.0 and above. 

If the vCenter Server is on version 8.x or earlier, the workaround below reduces the workload on the CNS service by letting it catch up on reading all the records and remove the stale volumes from the database.

  1. Scale the CSI controller deployment down to 0 replicas:
    kubectl -n vmware-system-csi scale deployment vsphere-csi-controller --replicas=0

  2. Wait for the periodic sync in CNS to catch up on reading all the records and remove the stale volumes from the database. The number of stale volumes can be monitored using the query below (see the example psql invocation after these steps).
    select count(volume_id) from cns.volume_info where mark_for_delete=true;

  3. Once the number of volumes marked for deletion is down to zero, scale the CSI controller back up:
    kubectl -n vmware-system-csi scale deployment vsphere-csi-controller --replicas=3
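
One way to run the query in step 2 is from the vCenter Server appliance shell (a sketch; the psql binary path and authentication can vary between vCenter versions):

    /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -c "select count(volume_id) from cns.volume_info where mark_for_delete=true;"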