Unable to attach PV with error "ServerFaultCode: NotAuthenticated" and pods are stuck in ContainerCreating

Article ID: 368743


Products

vSphere with Tanzu

Issue/Introduction

  • Unable to attach a PersistentVolume (PV); the attach operation fails with the error "ServerFaultCode: NotAuthenticated"

  • Pods that mount the affected PV are stuck in the "ContainerCreating" state (diagnostic commands to confirm this are shown after the log excerpts below)

  • The vpxd log on the vCenter Server (/var/log/vmware/vpxd/vpxd.log) shows errors similar to the following:

info vpxd[08547] [Originator@6876 sub=VmCustomizer opID=vmoperator-xx-cluster-workers-<opID>] hostVersion = 7.0.3, Tools version = 11360
warning vpxd[08547] [Originator@6876 sub=vmomi.soapStub[108135] opID=vmoperator-xx-cluster-workers-<opID>] SOAP request returned HTTP failure; <SSL(<io_obj p:2a78, h:145, <TCP '<IP_address> : 48106'>, <TCP '<IP_address> : 443'>>), /sdk>, method: systemManagement; code: 500(Internal Server Error)
info vpxd[08547] [Originator@6876 sub=Vmomi opID=vmoperator-xx-cluster-workers-<opID>] Retry SOAP call after exception; <<last binding: <<TCP '<IP_address>: 55394'>, <TCP
'<IP_address>: 443'>>>, /sdk>, vim.NfcService.systemManagement, N3Vim5Fault16NotAuthenticated9ExceptionE(Fault cause: vim.fault.NotAuthenticated
--> )
--> [context]------[/context]
info vpxd[08547] [Originator@6876 sub=Vmomi opID=vmoperator-xx-cluster-workers-<opID>] Stale SOAP session to host <ESXi_FQDN>; reinitializing
info vpxd[08547] [Originator@6876 sub=Vmomi opID=vmoperator-xx-cluster-workers-<opID>] Creating SOAP stub adapter for /sdk on <ESXi_FQDN>:443

  • The CSI syncer logs on the Supervisor Cluster (/var/log/pods/vmware-system-csi_vsphere-csi-controller-<ID>/vsphere-syncer/1.log) show errors similar to the following:

stderr F {"level":"info","time":"<Date/Time>","caller":"syncer/metadatasyncer.go:427","msg":"CSI full sync failed with error: ServerFaultCode: NotAuthenticated","TraceId":"<UUID>"}
stderr F I0412 05:33:18.486950       1 request.go:668] Waited for 1.001530599s due to client-side throttling, not priority and fairness, request: GET:https://127.0.0.1:6443/api/v1/namespaces/devk8s/persistentvolumeclaims/<UUID>
stderr F {"level":"info","time":"<Date/Time>","caller":"cnsnodevmattachment/cnsnodevmattachment_controller.go:209","msg":"Reconciling CnsNodeVmAttachment with Request.Name: \"<guest_cluster_worker_node_name>-containerd\" instance \"<guest_cluster_worker_node_name>-containerd\" timeout \"1s\" seconds","TraceId":"<UUID>"}
stderr F {"level":"info","time":"<Date/Time>","caller":"cnsnodevmattachment/cnsnodevmattachment_controller.go:335","msg":"vSphere CSI driver is attaching volume: \"<UUID>\" to nodevm: VirtualMachine:vm-ID [VirtualCenterHost: <ESXi_FQDN>, UUID: <UUID>, Datacenter: Datacenter [Datacenter: Datacenter:datacenter-<ID>, VirtualCenterHost: <ESXi_FQDN>]] for CnsNodeVmAttachment request with name: \"<guest_cluster_worker_node_name>-containerd\" on namespace: \"stgk8s\"","TraceId":"<UUID>"}
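
To confirm these symptoms from the Supervisor Cluster, the commands below can be used. This is a minimal sketch: the namespace devk8s, the pod name, and the CnsNodeVmAttachment name are placeholders taken from the log excerpts above; substitute the values from your environment.

# List pods stuck in ContainerCreating in the affected namespace
kubectl get pods -n devk8s | grep ContainerCreating

# Review pod events for the "ServerFaultCode: NotAuthenticated" attach failure
kubectl describe pod <pod_name> -n devk8s

# Inspect the CnsNodeVmAttachment objects the CSI driver is reconciling
kubectl get cnsnodevmattachments -n devk8s
kubectl describe cnsnodevmattachment <guest_cluster_worker_node_name>-containerd -n devk8s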

Environment

vSphere with Tanzu 8.x
vSphere with Tanzu 7.x

Resolution

  • This issue is resolved in vSphere CSI driver v2.7.2, which is included in vCenter Server 8.0 U2 and later. After upgrading vCenter Server to 8.0 U2, upgrade the Supervisor Cluster to pick up the fixed driver.

  • For vCenter Server 7.x there is no fix; use the following workaround:

    Workaround:

    • Restart the CSI controller pods by restarting the deployment with the command below:

kubectl rollout restart deployment vsphere-csi-controller -n vmware-system-csi

Note: This step recovers the CSI controller only temporarily; you will need to restart the CSI controller deployment each time the issue recurs. Verification commands are shown below.
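
After restarting the deployment, recovery can be verified with the commands below. This is a sketch under the default resource names shown above; the label app=vsphere-csi-controller and the devk8s namespace are assumptions, so adjust them for your environment.

# Wait until the restarted CSI controller pods are rolled out and Ready
kubectl rollout status deployment vsphere-csi-controller -n vmware-system-csi

# Confirm the controller pods are Running (label selector is an assumption; match your deployment's labels)
kubectl get pods -n vmware-system-csi -l app=vsphere-csi-controller

# Optionally check the CSI driver image version (the fix is in v2.7.2 and later)
kubectl -n vmware-system-csi get deployment vsphere-csi-controller -o jsonpath='{.spec.template.spec.containers[*].image}'

# Previously stuck pods should progress past ContainerCreating once volumes attach
kubectl get pods -n devk8s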