Symptoms:
- When you use the vSphere CSI driver in a multi-cluster Tanzu Kubernetes Grid Integrated Edition (TKGI) environment, volume attach and detach operations for pods start to fail.
- You see messages similar to the following in the events for the affected pods:
Warning FailedMount 25m (x2145 over 16d) kubelet, cfbfhbeb-a32b-48df-8d30-94562da4701f Unable to attach or mount volumes: unmounted volumes=[myvolume], unattached volumes=[myvolume]: timed out waiting for the condition
Warning FailedAttachVolume 19m attachdetach-controller AttachVolume.Attach failed for volume "pvc-21ad222f-ffa2-123a-abcf-edfabc456231" : rpc error: code = Internal desc = failed to attach disk: "432567fa-abcd-4449-5674-765432abccfb" with node: "cffebaeb-a20a-41a0-89b0-95617c64701f" err failed to attach cns volume: "432567fa-abcd-4449-5674-765432abccfb" to node vm: "VirtualMachine:vm-172 [VirtualCenterHost: 10.237.25.10, UUID: 4237a733-67c3-8130-702c-63f9383289ba, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-21, VirtualCenterHost: 10.237.25.10]]". fault: "(*types.LocalizedMethodFault)(0xc0007df960)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (types.CnsFault) {\n BaseMethodFault: (types.BaseMethodFault) <nil>,\n Reason: (string) (len=79) \"CNS: The input volume 432567fa-abcd-4449-5674-765432abccfb is not a CNS volume.\"\n },\n LocalizedMessage: (string) (len=95) \"CnsFault error: CNS: The input volume 432567fa-abcd-4449-5674-765432abccfb is not a CNS volume.\"\n})\n". opId: "074be712"
- You see messages similar to the following in the kube-controller-manager logs on the control plane nodes:
E0616 14:11:07.519321 11 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.vsphere.vmware.com^abcde123-1456-bcde-5643-aaa5674433f podName: nodeName:}" failed. No retries permitted until 2021-06-16 14:11:08.019297747 +0000 UTC m=+3981445.422084190 (durationBeforeRetry 500ms). Error: "AttachVolume.Attach failed for volume \"pvc-21ad222f-ffa2-123a-abcf-edfabc456231\" (UniqueName: \"kubernetes.io/csi/csi.vsphere.vmware.com^abcde123-1456-bcde-5643-aaa5674433f\") from node \"e13e7d0b-464f-43ba-a130-271d14e3c107\" : rpc error: code = Aborted desc = pending"
I0616 14:11:07.519367 11 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"mynamespace", Name:"mypod", UID:"dfe86621-5731-48f6-8814-b41d5318c32f", APIVersion:"v1", ResourceVersion:"80991861", FieldPath:""}): type: 'Warning' reason: 'FailedAttachVolume' AttachVolume.Attach failed for volume "pvc-21ad222f-ffa2-123a-abcf-edfabc456231" : rpc error: code = Aborted desc = pending
W0616 14:11:07.521828 11 reconciler.go:206] attacherDetacher.DetachVolume started for volume "pvc-21ad222f-ffa2-123a-abcf-edfabc456231" (UniqueName: "kubernetes.io/csi/csi.vsphere.vmware.com^abcde123-1456-bcde-5643-aaa5674433f") on node "cffebaeb-a20a-41a0-89b0-95617c64701f" This volume is not safe to detach, but maxWaitForUnmountDuration 6m0s expired, force detaching
I0616 14:11:07.521874 11 reconciler.go:275] attacherDetacher.AttachVolume started for volume "pvc-27ea3883-d20a-4dbd-82ea-f346048c988c" (UniqueName: "kubernetes.io/csi/csi.vsphere.vmware.com^12345abc-abcd-sf67-8904-abc3217865c9b") from node "cffebaeb-a20a-41a0-89b0-95617c64701f"
E0616 14:11:07.668770 11 csi_attacher.go:662] kubernetes.io/csi: detachment for VolumeAttachment for volume [abcde123-1456-bcde-5643-aaa5674433f] failed: rpc error: code = Internal desc = volumeID "abcde123-1456-bcde-5643-aaa5674433f" not found in QueryVolume
E0616 14:11:07.668869 11 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/csi.vsphere.vmware.com^abcde123-1456-bcde-5643-aaa5674433f podName: nodeName:}" failed. No retries permitted until 2021-06-16 14:11:08.168818196 +0000 UTC m=+3981445.571604639 (durationBeforeRetry 500ms). Error: "DetachVolume.Detach failed for volume \"pvc-21ad222f-ffa2-123a-abcf-edfabc456231\" (UniqueName: \"kubernetes.io/csi/csi.vsphere.vmware.com^abcde123-1456-bcde-5643-aaa5674433f\") on node \"cffebaeb-a20a-41a0-89b0-95617c64701f\" : rpc error: code = Internal desc = volumeID \"abcde123-1456-bcde-5643-aaa5674433f\" not found in QueryVolume"
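When checking logs from several clusters, it can help to count how often each of the error signatures above appears. The following Python sketch is illustrative only: the names `SIGNATURES`, `classify`, `summarize` and the signature labels are hypothetical, and the patterns are copied from the excerpts in this article, so messages from other driver or Kubernetes versions may be worded differently.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical signature table. Each pattern is copied from the pod events or
# kube-controller-manager log excerpts above; the first matching entry wins.
SIGNATURES = [
    ("attach_not_cns_volume", re.compile(r"is not a CNS volume")),
    ("detach_not_found_in_queryvolume", re.compile(r"not found in QueryVolume")),
    ("attach_aborted_pending", re.compile(r"code = Aborted desc = pending")),
    ("force_detach_after_timeout",
     re.compile(r"maxWaitForUnmountDuration \S+ expired, force detaching")),
    ("mount_timeout",
     re.compile(r"Unable to attach or mount volumes: .*timed out waiting for the condition")),
]


def classify(line: str) -> Optional[str]:
    """Return the label of the first signature found in line, or None."""
    for label, pattern in SIGNATURES:
        if pattern.search(line):
            return label
    return None


def summarize(lines: Iterable[str]) -> Counter:
    """Count matching lines per signature label; unmatched lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts
```

For example, passing an open file handle for a saved kube-controller-manager log to `summarize` returns a `Counter` keyed by signature label.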
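The controller log lines identify each volume both by its PersistentVolume name (`pvc-<uid>`) and by a CSI UniqueName of the form `kubernetes.io/csi/csi.vsphere.vmware.com^<volume ID>`; the part after `^` is the same ID that appears in the `not found in QueryVolume` errors. A minimal Python sketch for collecting that PV-to-volume-ID mapping from log text follows. The regular expression assumes the quoting shown in the excerpts above (plain or backslash-escaped quotes), and the names `UNIQUE_NAME` and `map_pv_to_volume_ids` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches: volume "pvc-<uid>" (UniqueName: "kubernetes.io/csi/<driver>^<volume ID>")
# The optional backslashes allow for the escaped quotes (\") that appear when the
# message is embedded inside an Error: "..." string, as in the excerpts above.
UNIQUE_NAME = re.compile(
    r'volume \\?"(?P<pv>pvc-[0-9a-fA-F-]+)\\?" '
    r'\(UniqueName: \\?"kubernetes\.io/csi/(?P<driver>[^"^\\]+)\^(?P<handle>[^"\\]+)\\?"\)'
)

VSPHERE_CSI_DRIVER = "csi.vsphere.vmware.com"


def map_pv_to_volume_ids(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect PV name -> set of vSphere CSI volume IDs seen in the given log lines."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        for m in UNIQUE_NAME.finditer(line):
            # Ignore volumes that belong to other CSI drivers.
            if m.group("driver") == VSPHERE_CSI_DRIVER:
                mapping[m.group("pv")].add(m.group("handle"))
    return dict(mapping)
```

A PV that maps to more than one volume ID in the result is referenced by different IDs across the scanned lines, which may be worth comparing against what vCenter reports for that volume.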