Worker node stuck in Deleting state after successful drain – Waiting for volume detach

Article ID: 422044


Products

VMware vSphere Kubernetes Service

Issue/Introduction

After upgrading vCenter from 7.0 U3r → 8.0 U3a as part of the planned VCF 4.5.2 → 5.2.1 migration, the customer observed that one worker node (x) remained stuck in the Deleting phase even after a successful drain operation.

During this upgrade, the Supervisor Cluster was upgraded from v1.26.8 → v1.27.5, and the control API migrated from CAPW to CAPV, changing certain resource ownership and CSI handling behavior.
The node status showed:

DrainingSucceeded=True
InfrastructureReady=True
VolumeDetachSucceeded=False
Reason=WaitingForVolumeDetach
Message=Waiting for node volumes to be detached
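
These conditions are surfaced on the owning Machine object in the Supervisor cluster. A minimal way to inspect them, assuming kubectl is pointed at the Supervisor context and using placeholder names for the Machine and its namespace:

 #kubectl get machines -A
 #kubectl describe machine worker-node-x -n tkc-namespace

The describe output lists the VolumeDetachSucceeded condition with the WaitingForVolumeDetach reason shown above.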

Environment

  • Product: vSphere with Tanzu / Tanzu Kubernetes Grid (Supervisor)

  • Versions:

    • vCenter: 8.0 U3a (upgraded from 7.0 U3r)

    • VCF: 5.2.1 (upgraded from 4.5.2)

    • Supervisor Cluster: v1.27.5 (upgraded from v1.26.8)

    • CAPW → CAPV migration performed (as part of Supervisor API alignment)

  • CSI Driver: csi.vsphere.vmware.com

  • Cluster Type: Workload Cluster (Guest Cluster / TKC)

Cause

The root cause was missed Detach event handling in the CSI driver post-upgrade, which left an orphaned finalizer on the VolumeAttachment object and blocked cleanup of the node.
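
Before making any changes, this state can be confirmed on the affected VolumeAttachment: typically the object already carries a deletionTimestamp but is held back only by the external-attacher finalizer. A minimal check, using the placeholder object name csi-xxx from the resolution steps below:

 #kubectl get volumeattachment csi-xxx -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'

A deletionTimestamp in the past combined with the external-attacher/csi-vsphere-vmware-com finalizer indicates that cleanup is blocked on the missed detach event.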

Resolution


Manual Resolution Steps:

1. Verify in vCenter that the disk (UUID: yyyy) is not attached to any VM (a sketch for deriving this UUID from the stuck VolumeAttachment follows these steps).

2. Identify the stuck VolumeAttachment:

 #kubectl get volumeattachments -A -o wide | grep csi-xxx

3. Edit the object and remove the finalizer:
 #kubectl edit volumeattachments.storage.k8s.io csi-xxx

Lines similar to the following need to be deleted:
finalizers:
- external-attacher/csi-vsphere-vmware-com

4. Validate that the object was deleted automatically:
 #kubectl get volumeattachments -A | grep csi-xxx
 #kubectl get nodes 
 

Additional Information

  • Once the finalizer was removed, Kubernetes immediately garbage-collected the orphaned VolumeAttachment.

  • The stuck worker node x  was automatically deleted by the Cluster API controller.

  • The cluster reconciled successfully; no pending Machine or VolumeAttachment objects remained (a Supervisor-side check is sketched below).

  • Validation through kubectl get nodes confirmed that all nodes were in Ready state.
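
The Machine-level cleanup described above can also be confirmed from the Supervisor cluster; a brief sketch, run with kubectl pointed at the Supervisor context:

 #kubectl get machines -A

No Machine should remain in a Deleting phase once the orphaned VolumeAttachment has been garbage-collected.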