Pods Fail to Start After Worker Re-Creation Due to CSI Trident Volume Mount Permission Error

Article ID: 418802

Products

Tanzu Kubernetes Runtime

Issue/Introduction

After a Kubernetes worker node is re-created (for example, by the BOSH resurrector), pods scheduled on the new node fail to start and remain in the ContainerCreating or CreateContainerError state.

Pod events and node logs show repeated mount-related errors similar to:

MountVolume.SetUp failed for volume "<pvc-id>": rpc error: code = Internal desc = could not check if the target path (...) is a directory: permission denied

From the worker node logs:

containerd

failed to stat "/var/vcap/data/kubelet/pods/<podUID>/volumes/kubernetes.io~csi/<pvc-id>/mount": permission denied

kubelet

CreateContainerError: failed to generate container spec: failed to stat ".../kubernetes.io~csi/<pvc-id>/mount": permission denied

The issue only affects pods scheduled on the newly recreated worker node. Pods running on existing workers continue to operate normally.
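
To see which pods and volumes are affected on the re-created node, the kubelet and containerd logs can be filtered for the error signature shown above. The following is a minimal sketch, not part of the original article: the function name find_denied_mounts is hypothetical, and the pattern assumes the log lines contain the full path in the pods/<podUID>/volumes/kubernetes.io~csi/<pvc-id>/mount form. Lines with a truncated path are skipped rather than guessed at.

```python
import re

# Matches the kubelet CSI target path quoted in the errors above and captures
# the pod UID and the volume name. A truncated path (".../kubernetes.io~csi/...")
# has no "pods/<podUID>/volumes/" prefix and therefore does not match.
CSI_PATH = re.compile(
    r'pods/(?P<pod_uid>[^/"\s]+)/volumes/kubernetes\.io~csi/(?P<volume>[^/"\s]+)/mount'
)


def find_denied_mounts(lines):
    """Return sorted, de-duplicated (pod_uid, volume) pairs taken from
    log lines that report 'permission denied' on a CSI target path."""
    hits = set()
    for line in lines:
        if "permission denied" not in line:
            continue
        match = CSI_PATH.search(line)
        if match:
            hits.add((match.group("pod_uid"), match.group("volume")))
    return sorted(hits)
```

The function accepts any iterable of strings, so an open log file or the captured output of a log command can be passed directly. Where the logs live on the worker is environment-specific and is not stated in this article.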

Environment

Kubernetes cluster using Trident CSI with NFS-backed PersistentVolumes

Cause

The issue is caused by a node-local filesystem permission problem affecting the CSI volume target path under the kubelet directory on the newly recreated worker node.

Specifically:

Trident CSI successfully processes the volume request at the control-plane level.
During pod startup, kubelet and containerd attempt to stat() the CSI volume target path:
/var/vcap/data/kubelet/pods/<podUID>/volumes/kubernetes.io~csi/<pvc-id>/mount
The operation fails with permission denied before any NFS mount is attempted.
No corresponding errors are observed on the NFS server or in vCenter for the affected VM.

This indicates the failure occurs on the worker node filesystem, prior to volume mount, rather than during disk attachment or NFS export access.
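
The stat() check described above can be reproduced by hand on the affected worker to see which part of the target path rejects access. The sketch below is an illustration under assumptions, not a Trident or kubelet tool: check_path is a hypothetical helper, and it has to run on the worker node as the same user as the failing process to give a comparable result. It stats each path component from the root down and stops at the first one that fails.

```python
import os
import stat
from pathlib import Path


def check_path(target):
    """stat() every component of target, from the root down to target itself.

    Returns a list of (path, result) tuples. result is a mode/owner summary
    for components that could be stat()ed, or the name of the exception
    (for example PermissionError) for the first one that could not.
    The walk stops at the first failure.
    """
    target = Path(target)
    components = [target, *target.parents][::-1]  # root first, target last
    report = []
    for component in components:
        try:
            st = os.stat(component)
        except OSError as exc:
            report.append((str(component), type(exc).__name__))
            break
        summary = f"{stat.filemode(st.st_mode)} uid={st.st_uid} gid={st.st_gid}"
        report.append((str(component), summary))
    return report
```

Calling check_path with the path from the error message prints nothing by itself; the returned list can be printed line by line. If the last entry reports PermissionError, the mode and ownership of the entry just before it are the first thing to compare against a healthy worker.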

Resolution

Recreating the affected PersistentVolumeClaim (PVC) resolves the issue.

Recreating the PVC, together with the pods that reference it, causes Kubernetes to:

Allocate a new pod UID and CSI target path
Avoid the stale or inaccessible directory created during the initial failed attempt
Successfully mount the NFS volume on the worker node

After PVC recreation, pods start successfully without further intervention.
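
If the original PVC manifest is not at hand, one way to prepare the replacement claim is to export the existing object as JSON (for example with kubectl get pvc <name> -o json) and strip the fields that tie it to the old volume. The following is a sketch under assumptions and not a procedure taken from this article: fresh_claim is a hypothetical helper, and it assumes the replacement claim should bind to a newly provisioned volume rather than the old PersistentVolume. Whether data on the old volume survives deletion of the original claim depends on the PersistentVolume's reclaim policy, which this article does not cover.

```python
import copy

# Metadata fields populated by the API server; a new object must not carry them.
SERVER_METADATA = (
    "uid", "resourceVersion", "creationTimestamp", "deletionTimestamp",
    "finalizers", "managedFields", "generation", "selfLink",
)

# Annotations written when the claim was bound to its PersistentVolume.
BINDING_ANNOTATIONS = (
    "pv.kubernetes.io/bind-completed",
    "pv.kubernetes.io/bound-by-controller",
)


def fresh_claim(pvc):
    """Return a copy of a PVC object (a dict parsed from JSON) that can be
    applied as a new claim. The input dict is left unmodified."""
    new = copy.deepcopy(pvc)
    new.pop("status", None)
    meta = new.get("metadata", {})
    for key in SERVER_METADATA:
        meta.pop(key, None)
    annotations = meta.get("annotations", {})
    for key in BINDING_ANNOTATIONS:
        annotations.pop(key, None)
    # Assumption: the new claim should not be pinned to the old volume,
    # so the reference to the previous PersistentVolume is dropped.
    new.get("spec", {}).pop("volumeName", None)
    return new
```

The cleaned object can be written back out with json.dumps and applied once the original claim has been deleted.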

Additional Information

No errors were observed on the NFS backend or in vCenter for the affected VM.
Trident logs show repeated retries failing during a pre-mount directory check, not during NFS mount operations.
Worker node logs (kubelet and containerd) confirm the failure occurs before container startup due to filesystem permission denial.
Because the issue originates within the CSI volume handling on the worker node, engaging Trident support is recommended for deeper analysis and guidance.
