After upgrading vCenter from 7.0 U3r → 8.0 U3a as part of the planned VCF 4.5.2 → 5.2.1 migration, the customer observed that one worker node (node x) remained stuck in the Deleting phase even after a successful drain operation.
During this upgrade, the Supervisor Cluster was upgraded from v1.26.8 → v1.27.5, and the control API migrated from CAPW to CAPV, changing certain resource ownership and CSI handling behavior.
The Machine object for the stuck node showed the following conditions:
DrainingSucceeded=True
InfrastructureReady=True
VolumeDetachSucceeded=False
Reason=WaitingForVolumeDetach
Message=Waiting for node volumes to be detached
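These phase and condition values can be inspected from the Supervisor cluster context; as a sketch, assuming the Cluster API Machine objects are visible there and using placeholder names for the namespace and machine:
# kubectl get machines -n <workload-cluster-namespace>
# kubectl describe machine <stuck-machine-name> -n <workload-cluster-namespace>
The PHASE column of the first command shows Deleting for the stuck node, and the Conditions section of the describe output contains the DrainingSucceeded and VolumeDetachSucceeded entries listed above.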
Product: vSphere with Tanzu / Tanzu Kubernetes Grid (Supervisor)
Versions:
vCenter: 8.0 U3a (upgraded from 7.0 U3r)
VCF: 5.2.1 (upgraded from 4.5.2)
Supervisor Cluster: v1.27.5 (upgraded from v1.26.8)
CAPW → CAPV migration performed (as part of Supervisor API alignment)
CSI Driver: csi.vsphere.vmware.com
Cluster Type: Workload Cluster (Guest Cluster / TKC)
Root Cause: a missed detach event in the CSI driver after the upgrade left an orphaned VolumeAttachment finalizer, which blocked cleanup of the node.
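The orphaned state can be confirmed by listing the VolumeAttachment objects that still reference the drained node and inspecting their finalizers; the attachment name csi-xxx below is a placeholder, as in the steps that follow:
# kubectl get volumeattachments -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,ATTACHED:.status.attached
# kubectl get volumeattachment csi-xxx -o jsonpath='{.metadata.finalizers}{"\n"}'
A stale entry references the deleted node and still carries the external-attacher/csi-vsphere-vmware-com finalizer even though the volume is no longer attached in vCenter.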
Manual Resolution Steps:
1. Verify in vCenter that the disk (UUID: yyyy) is not attached to any VM.
2. Identify the stuck VolumeAttachment:
# kubectl get volumeattachments -A -o wide | grep csi-xxx
3. Edit the object and remove the finalizer (a non-interactive patch alternative is shown after these steps):
# kubectl edit volumeattachments.storage.k8s.io csi-xxx
Delete the lines similar to the following:
finalizers:
- external-attacher/csi-vsphere-vmware-com
4. Validate that the object was deleted automatically:
# kubectl get volumeattachments -A | grep csi-xxx
# kubectl get nodes
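As an alternative to kubectl edit in step 3, the finalizer can be removed non-interactively with kubectl patch. This is a sketch that assumes the external-attacher entry is the only finalizer on the object, since the merge patch below clears the entire finalizers list:
# kubectl patch volumeattachments.storage.k8s.io csi-xxx --type=merge -p '{"metadata":{"finalizers":null}}'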
Once the finalizer was removed, Kubernetes immediately garbage-collected the orphaned VolumeAttachment.
The stuck worker node x was automatically deleted by the Cluster API controller.
The cluster reconciled successfully — no pending Machine or VolumeAttachment objects remained.
Validation through kubectl get nodes confirmed that all nodes were in Ready state.
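For completeness, the absence of leftover objects can be re-checked; the attachment name and namespace below are placeholders:
# kubectl get volumeattachments | grep csi-xxx
# kubectl get machines -n <workload-cluster-namespace>
The first command is run in the workload cluster and should return no output; the second is run against the Supervisor cluster and should show no Machine left in the Deleting phase.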