Guest cluster upgrade does not complete with a worker node Machine stuck in status Deleting
Log in to the Supervisor and list the Machine objects in the affected cluster's namespace:
root@############ [ ~ ]# k get machine -n ############
NAME                                             CLUSTER        NODENAME                                         PROVIDERID                PHASE      AGE    VERSION
############-2xlarge-pool-1-############-bnzld   ############   ############-2xlarge-pool-1-############-bnzld   vsphere://############   Deleting   413d   v1.28.8+vmware.1-fips.1
############-2xlarge-pool-1-############         ############   ############-2xlarge-pool-1-############         vsphere://############   Running    22m    v1.29.4+vmware.3-fips.1
############-2xlarge-pool-2-############         ############   ############-2xlarge-pool-2-############         vsphere://############   Running    413d   v1.28.8+vmware.1-fips.1
############-############-58vjg                  ############   ############-nstjh-############                  vsphere://############   Running    153m   v1.29.4+vmware.3-fips.1
############-############-lf9gp                  ############   ############-nstjh-############                  vsphere://############   Running    25m    v1.29.4+vmware.3-fips.1
############-############-n6b4f                  ############   ############-nstjh-############                  vsphere://############   Running    28m    v1.29.4+vmware.3-fips.1
k describe machine ############-2xlarge-pool-1-############-bnzld -n ############
Conditions:
  Last Transition Time:  2025-07-01T14:22:01Z
  Message:
  Observed Generation:   4
  Reason:                NotReady
  Status:                False
  Type:                  Available
  Last Transition Time:  2025-11-12T10:40:34Z
  Message:               * Deleting: Machine deletion in progress since more than 15m, stage: DrainingNode, delay likely due to Pods not terminating
The affected guest cluster worker node is in status Ready,SchedulingDisabled:
root@############-############-58vjg [ ~ ]# k get nodes
NAME                                             STATUS                     ROLES           AGE    VERSION
############-2xlarge-pool-1-############-bnzld   Ready,SchedulingDisabled   <none>          413d   v1.28.8+vmware.1-fips.1
############-2xlarge-pool-1-############         Ready                      <none>          31m    v1.29.4+vmware.3-fips.1
############-2xlarge-pool-2-############         Ready                      <none>          413d   v1.28.8+vmware.1-fips.1
############-############-58vjg                  Ready                      control-plane   162m   v1.29.4+vmware.3-fips.1
############-############-lf9gp                  Ready                      control-plane   34m    v1.29.4+vmware.3-fips.1
############-############-n6b4f                  Ready                      control-plane   38m    v1.29.4+vmware.3-fips.1
root@############-############-58vjg [ ~ ]#
Confirm that there is an application pod on the affected worker node that cannot be terminated:
kubectl get pods -A -o wide | grep -iv running
############ ############-############ 0/2 Terminating 0 99d ############ ############-2xlarge-pool-############-bnzld <none> <none>
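Alternatively, you can list all pods that are still scheduled on the affected worker node (the node name below is a placeholder for your environment):

kubectl get pods -A -o wide --field-selector spec.nodeName=<affected node name>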
vCenter 8.0 U3
The Machine object cannot be deleted while a pod on the node is stuck in status Terminating, because the node drain cannot complete.
Investigate why the application pod cannot be terminated.
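For example, you can review the pod's events and check whether any finalizers are set (pod and namespace names are placeholders):

kubectl describe pod <pod name> -n <namespace name>
kubectl get pod <pod name> -n <namespace name> -o jsonpath='{.metadata.finalizers}'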
There might be an active PodDisruptionBudget that is preventing the pod from terminating:
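A possible way to check this is to list the PodDisruptionBudgets in the guest cluster and review how many disruptions are currently allowed (PDB name and namespace are placeholders):

kubectl get pdb -A
kubectl describe pdb <pdb name> -n <namespace name>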
Workaround:
If you can recreate the pod manually, or the system will recreate it automatically on the newly deployed worker node, you can delete the application pod that is stuck in Terminating state with the --force parameter.
Example:
k delete pod <pod name> -n <namespace name> --force
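Note that forced deletion removes the pod object from the API server without waiting for confirmation from the kubelet, so only use it when the workload can safely be recreated. Once the pod is gone, the node drain should be able to complete. You can then verify on the Supervisor that the Machine object finishes deleting and the guest cluster upgrade continues (the namespace is a placeholder for the cluster's vSphere Namespace):

k get machine -n <namespace name>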