Guest cluster upgrade does not complete with a worker node Machine stuck in status Deleting
Log in to the Supervisor and list the Machine objects in the affected cluster's namespace:
root@############ [ ~ ]# k get machine -n ############
NAME                                             CLUSTER        NODENAME                                         PROVIDERID                PHASE      AGE    VERSION
############-2xlarge-pool-1-############-bnzld   ############   ############-2xlarge-pool-1-############-bnzld   vsphere://############   Deleting   413d   v1.28.8+vmware.1-fips.1
############-2xlarge-pool-1-############         ############   ############-2xlarge-pool-1-############         vsphere://############   Running    22m    v1.29.4+vmware.3-fips.1
############-2xlarge-pool-2-############         ############   ############-2xlarge-pool-2-############         vsphere://############   Running    413d   v1.28.8+vmware.1-fips.1
############-############-58vjg                  ############   ############-nstjh-############                  vsphere://############   Running    153m   v1.29.4+vmware.3-fips.1
############-############-lf9gp                  ############   ############-nstjh-############                  vsphere://############   Running    25m    v1.29.4+vmware.3-fips.1
############-############-n6b4f                  ############   ############-nstjh-############                  vsphere://############   Running    28m    v1.29.4+vmware.3-fips.1
k describe machine ############-2xlarge-pool-1-############-bnzld -n ############
Conditions:
  Last Transition Time:  2025-07-01T14:22:01Z
  Message:
  Observed Generation:   4
  Reason:                NotReady
  Status:                False
  Type:                  Available
  Last Transition Time:  2025-11-12T10:40:34Z
  Message:               * Deleting: Machine deletion in progress since more than 15m, stage: DrainingNode, delay likely due to Pods not terminating
The affected guest cluster worker node is in status Ready,SchedulingDisabled:
root@############-############-58vjg [ ~ ]# k get nodes
NAME                                             STATUS                     ROLES           AGE    VERSION
############-2xlarge-pool-1-############-bnzld   Ready,SchedulingDisabled   <none>          413d   v1.28.8+vmware.1-fips.1
############-2xlarge-pool-1-############         Ready                      <none>          31m    v1.29.4+vmware.3-fips.1
############-2xlarge-pool-2-############         Ready                      <none>          413d   v1.28.8+vmware.1-fips.1
############-############-58vjg                  Ready                      control-plane   162m   v1.29.4+vmware.3-fips.1
############-############-lf9gp                  Ready                      control-plane   34m    v1.29.4+vmware.3-fips.1
############-############-n6b4f                  Ready                      control-plane   38m    v1.29.4+vmware.3-fips.1
root@############-############-58vjg [ ~ ]#
Confirm that there is an application pod on the affected worker node that cannot be terminated:
kubectl get pods -A -o wide | grep -iv running
############ ############-############ 0/2 Terminating 0 99d ############ ############-2xlarge-pool-############-bnzld <none> <none>
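Alternatively, you can list all pods that are still scheduled on the affected worker node (the node name below is a placeholder for your environment):

kubectl get pods -A -o wide --field-selector spec.nodeName=<affected node name>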
vCenter 8.0 U3
The Machine object cannot be deleted while a pod on the node is stuck in status Terminating, because the node drain cannot complete.
Investigate why the application pod cannot be terminated.
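For example, you can review the pod's events and check whether any finalizers are set (pod and namespace names are placeholders):

kubectl describe pod <pod name> -n <namespace name>
kubectl get pod <pod name> -n <namespace name> -o jsonpath='{.metadata.finalizers}'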
There might be an active PodDisruptionBudget that is preventing the pod from terminating:
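A possible way to check this is to list the PodDisruptionBudgets in the guest cluster and review how many disruptions are currently allowed (PDB name and namespace are placeholders):

kubectl get pdb -A
kubectl describe pdb <pdb name> -n <namespace name>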
Workaround:
If you can recreate the pod manually, or the system will recreate it automatically on the newly deployed worker node, you can delete the application pod that is stuck in Terminating state with the --force parameter.
Example:
k delete pod <pod name> -n <namespace name> --force
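Note that forced deletion removes the pod object from the API server without waiting for confirmation from the kubelet, so only use it when the workload can safely be recreated. Once the pod is gone, the node drain should be able to complete. You can then verify on the Supervisor that the Machine object finishes deleting and the guest cluster upgrade continues (the namespace is a placeholder for the cluster's vSphere Namespace):

k get machine -n <namespace name>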