Symptoms:
- Pods are stuck in the Init or ContainerCreating state.
- Running kubectl describe pod against the affected pod shows the error "The resource 'volume' is in use":
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedAttachVolume 43m attachdetach-controller AttachVolume.Attach failed for volume "pvc-########-###-#####-###-#########" : rpc error: code = Internal desc = failed to attach disk: "#########-####-####-####-#######" with node: "workload-###-#-#######-#####" err failed to attach cns volume: "#########-####-####-####-#######" to node vm: "VirtualMachine:vm-###### [VirtualCenterHost: ******, UUID: ########-####-####-####-########, Datacenter: ***** [Datacenter: Datacenter:datacenter-##, VirtualCenterHost: *******]]". fault: "(*types.LocalizedMethodFault)(0xc000a70c60)({\n DynamicData: (types.DynamicData) {\n },\n Fault: (*types.ResourceInUse)(0xc000ede640)({\n VimFault: (types.VimFault) {\n MethodFault: (types.MethodFault) {\n FaultCause: (*types.LocalizedMethodFault)(<nil>),\n FaultMessage: ([]types.LocalizableMessage) <nil>\n }\n },\n Type: (string) \"\",\n Name: (string) (len=6) \"volume\"\n }),\n LocalizedMessage: (string) (len=32) \"The resource 'volume' is in use.\"\n})\n". opId: "086a91d5"
Warning FailedMount 87s (x1769 over 2d18h) kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data]: timed out waiting for the condition
- Logs of the csi-attacher container (part of the csi-controller pod) show the error 'NoPermission':
I1111 04:45:24.730061 1 controller.go:165] Ignoring VolumeAttachment "csi-##############################################################" change
I1111 04:45:24.730066 1 csi_handler.go:624] Saved detach error to "csi-##############################################################"
I1111 04:45:24.730097 1 csi_handler.go:231] Error processing "csi-##############################################################": failed to detach: rpc error: code = Internal desc = queryVolume failed for volumeID: "########-####-####-####-#########" with err=ServerFaultCode: NoPermission
Impact/Risks:
Any pod that uses a PV that is still attached to its source node will not start; describing the pod shows the error "The resource 'volume' is in use."
Environment:
- Tanzu Kubernetes Grid
- vSphere with Tanzu
Cause:
The issue arises because the TKG role assigned to the TKG user lacks the vSphere permission required to detach the virtual disk (VMDK), so the disk stays attached to the original node instead of detaching as intended.
Resolution:
Update the TKG role by adding the Cns.Searchable privilege, as outlined in the "Required Permissions for the vSphere Account" section of the documentation:
Prepare to Deploy Management Clusters to vSphere
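If the vSphere roles are managed with the govc CLI, the missing privilege can be appended to the existing role. This is a sketch: "TKG" is a placeholder for whatever role is actually bound to the TKG service account, and govc connection environment variables are assumed to be set.

```shell
# Assumes GOVC_URL, GOVC_USERNAME and GOVC_PASSWORD point at the vCenter.
# "TKG" is a placeholder for the role bound to the TKG user.
govc role.ls TKG                        # list the privileges currently on the role
govc role.update -a TKG Cns.Searchable  # -a appends the privilege instead of replacing the set
```

The same change can be made in the vSphere Client under Administration > Access Control > Roles by editing the role and enabling the CNS > Searchable privilege.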
Additional Information:
- For a pod to successfully have a PV attached, the virtual disk (VMDK) must be attached to the worker node where the pod is running.
- If the pod stops, the VMDK is detached from that node.
- If the pod starts again (on the same node or another), the VMDK is attached to the new node.
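The attach/detach lifecycle above is tracked in the cluster through VolumeAttachment objects, which record which node each CSI volume is attached to and surface any detach errors reported by the driver. The object name below is a placeholder.

```shell
# List CSI attachments: shows the attacher, PV name, node, and ATTACHED status
kubectl get volumeattachments

# Inspect one attachment; a failed detach (e.g. NoPermission) appears under status
kubectl describe volumeattachment <volumeattachment-name>
```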