When creating a new vSphere Kubernetes Service (VKS) cluster or a new Pod in an existing cluster, the following issues may occur:
vCenter tasks fail with the following CNS error: CNSFault - ServerFaultCode: A general system error occurred: Too many outstanding operations
Once volume attach/detach operations begin to fail, the vSphere CSI driver issues a rapid burst of attach/detach requests to vCenter without any backoff or delay.
This continuous flood of operations fills the vCenter task queue, so all subsequent CSI calls fail with the “Too many outstanding operations” error.
Even if the issue originates in a single VKS cluster, it can impact all workloads connected to the same vCenter Server.
vSphere Kubernetes Service 3.x
vSphere Supervisor on vSphere 8.x or 9.x
The issue can occur after one or more VKS nodes become inaccessible.
This issue occurs when the vSphere CSI Controller repeatedly attempts failed attach/detach operations without introducing a delay or backoff mechanism.
As a result, the vCenter task queue becomes saturated, and all subsequent operations fail with a generic “Too many outstanding operations” error.
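For illustration only, the kind of delay that is missing from the retry loop can be sketched as a capped exponential backoff. This is not the vSphere CSI driver's code; the function name backoff_delay, the 1-second base, and the 300-second cap are assumptions chosen for the example.

```shell
# Illustrative sketch only, NOT the vSphere CSI driver's implementation.
# backoff_delay ATTEMPT [BASE_SECONDS] [MAX_SECONDS]
# Prints the wait time before retry number ATTEMPT (0-based):
# the base delay doubled ATTEMPT times, never exceeding the cap.
# The function body runs in a subshell so its variables stay local.
backoff_delay() (
  attempt=$1
  base=${2:-1}     # assumed base delay in seconds
  max=${3:-300}    # assumed upper bound in seconds
  delay=$base
  i=0
  while [ "$i" -lt "$attempt" ]; do
    delay=$((delay * 2))
    if [ "$delay" -ge "$max" ]; then
      break
    fi
    i=$((i + 1))
  done
  # Apply the cap, including the case where the base itself exceeds it.
  if [ "$delay" -gt "$max" ]; then
    delay=$max
  fi
  echo "$delay"
)

# Print the wait schedule for the first six retries.
for n in 0 1 2 3 4 5; do
  echo "retry $n: wait $(backoff_delay "$n")s"
done
```

With a schedule like this, a persistently failing attach/detach operation is retried progressively less often instead of being resubmitted immediately, which keeps one failing operation from filling the task queue.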
To recover from this condition and restore normal operation, perform the following steps:
root@vcenter [ ~ ]# /usr/lib/vmware-wcp/decryptK8Pwd.py
Read key from file
Connected to PSQL
Cluster: domain-c#: <supervisor cluster domain id>
IP: <Supervisor FIP>
PWD: <password>
------------------------------------------------------------
kubectl scale deployment vsphere-csi-controller -n vmware-system-csi --replicas=0
kubectl annotate machine -n <ns> <machine-name> 'cluster.x-k8s.io/remediate-machine=""'
kubectl delete machine -n <ns> <machine-name>
kubectl patch cnsnodevmattachment <attachment-name> -n <namespace> -p '{"metadata":{"finalizers":[]}}' --type=merge
service-control --stop vmware-vpxd
service-control --start vmware-vpxd
kubectl scale deployment vsphere-csi-controller -n vmware-system-csi --replicas=2
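If several CnsNodeVmAttachment objects are stuck, the patch command above has to be repeated once per object. A minimal sketch of a helper that only prints those commands for review, without running them, might look like the following. The function name print_finalizer_patches and the names demo-ns, attachment-a, and attachment-b are hypothetical placeholders; the kubectl patch invocation is the one shown in the steps above.

```shell
# Hypothetical helper, not part of any VMware tooling.
# print_finalizer_patches NAMESPACE ATTACHMENT_NAME...
# Prints (does not execute) one finalizer-clearing patch command per
# CnsNodeVmAttachment name, using the same kubectl patch form as above.
# The function body runs in a subshell so its variables stay local.
print_finalizer_patches() (
  namespace=$1
  shift
  for name in "$@"; do
    printf '%s\n' "kubectl patch cnsnodevmattachment $name -n $namespace -p '{\"metadata\":{\"finalizers\":[]}}' --type=merge"
  done
)

# Example with placeholder names; review the output before running any line.
print_finalizer_patches demo-ns attachment-a attachment-b
```

Printing the commands first keeps the operation reviewable: each line can be checked against the affected objects and then executed individually.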