TKGi upgrade fails during worker node's drain/pre-stop phase

Article ID: 394751


Products

VMware Tanzu Kubernetes Grid Integrated Edition

Issue/Introduction

A TKGi cluster upgrade can sometimes get stuck in a worker node's drain/pre-stop phase, with task output similar to the following:

Task <task-id> | 15:42:13 | L executing pre-stop: worker/<worker-id> (0) (canary)
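
To see more detail about where the task is stuck, you can also inspect it with the BOSH CLI (a sketch; <task-id> and <service-instance_id> are placeholders for your environment):

# bosh task <task-id>
# bosh -d <service-instance_id> instances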

Checking the node will show that it is being drained, with the status "Ready,SchedulingDisabled":

# kubectl get node
NAME                                   STATUS                     ROLES    AGE   VERSION
<node-name>                            Ready,SchedulingDisabled   <none>   45h   v1.29.6+vmware.1
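
For more detail on the node's condition and on which pods are still running on it, you can also describe the node (a sketch; <node-name> is the node from the output above):

# kubectl describe node <node-name>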

Some errors that can be found in the node's "/var/vcap/sys/log/kubelet/drain.stderr.log" are:

Cannot evict pod as it would violate the pod's disruption budget

or

Kill container failed. context deadline exceeded

This article outlines troubleshooting steps to identify and resolve the issue.

Cause

There are many possible reasons why an upgrade can get stuck in a worker node's drain/pre-stop phase.

Common causes include:

  • A PodDisruptionBudget (PDB) that blocks pod eviction.
  • Orphaned containers on the worker node.

Resolution

  1. Log into the worker node:
    # bosh -d <service-instance_id> ssh <worker-node_id>

  2. Check "/var/vcap/sys/log/kubelet/drain.stderr.log".
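
    For example, while logged in to the worker node from step 1, you can scan the log for the most common errors (a sketch; adjust the search patterns as needed):
    # sudo grep -iE "disruption budget|context deadline" /var/vcap/sys/log/kubelet/drain.stderr.log | tail -n 20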

  3. You may see "Cannot evict pod as it would violate the pod's disruption budget." error messages there.
    This error indicates that a PodDisruptionBudget (PDB) object in your cluster is blocking the drain process.

    Check the PDB objects in your cluster:
    # kubectl get pdb -A
    NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
    <pdb-name>   2               N/A               0                     7s

    If you see a PDB with "Allowed Disruptions" equal to 0, it is most likely blocking the drain process because it will not allow any of the pods it targets to be evicted.
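
    To confirm which pods the PDB applies to, you can inspect its selector and list the matching pods (a sketch; <pdb-name>, <namespace>, and the label selector are placeholders):
    # kubectl describe pdb <pdb-name> -n <namespace>
    # kubectl get po -n <namespace> -l <selector-shown-by-the-describe-command>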

    To resolve the issue, follow the steps described in Worker Node Hangs Indefinitely.
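
    As an illustration only (the referenced article is the authoritative procedure), a common workaround is to back up the PDB, delete it temporarily so the drain can proceed, and restore it after the upgrade finishes. You may need to remove server-generated fields such as resourceVersion and uid from the backup before re-applying it:
    # kubectl get pdb <pdb-name> -n <namespace> -o yaml > <pdb-name>-backup.yaml
    # kubectl delete pdb <pdb-name> -n <namespace>
    # kubectl apply -f <pdb-name>-backup.yaml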

  4. If no PDB errors appear in "/var/vcap/sys/log/kubelet/drain.stderr.log", you may instead see "Kill container failed. context deadline exceeded" errors.

    Check the pods running in the node:
    # kubectl get po -A -owide | grep <node-name-from-kubectl-get-node-command>

    You should only see pods that belong to DaemonSet controllers there. If there are pods running that don't belong to a DaemonSet, delete them manually (note that deleting a pod that isn't managed by a controller object, such as a Deployment, ReplicaSet, StatefulSet, or Job, permanently deletes the pod and it cannot be recovered):
    # kubectl delete po <pod-name> -n <namespace> --force
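
    If you are unsure whether a pod is managed by a DaemonSet, you can check its owner references before deleting it, for example:
    # kubectl get po <pod-name> -n <namespace> -o jsonpath='{.metadata.ownerReferences[*].kind}'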

    After deleting all of the unexpected pods, check whether there are any orphaned containers on the worker node:
    # bosh -d <service-instance_id> ssh <worker-node_id>
    # crictl ps -a

    Look for containers in the Running state that aren't part of a DaemonSet.
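
    To map a running container back to its Kubernetes pod (if any), you can inspect the labels that the kubelet sets on it (a sketch; the grep patterns are the standard CRI pod labels):
    # crictl inspect <container-id> | grep -E "io.kubernetes.pod.name|io.kubernetes.pod.namespace"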

    If there are no unexpected containers, run the TKGi cluster upgrade again:
    # tkgi upgrade-cluster <cluster-name>

    If there are unexpected containers without an associated Kubernetes pod, they are most likely orphaned containers left in the worker node's container runtime cache.
    Try to stop and remove the containers manually:
    # crictl stop <container-id>
    # crictl rm --force <container-id>

    If that doesn't work, reboot the worker VM, either from the CLI (# reboot) or from vCenter.
    This will clean up the node's container runtime cache.

    After the reboot, check again for unexpected Running containers:
    # crictl ps -a

    If no unexpected containers are shown, run the TKGi cluster upgrade again:
    # tkgi upgrade-cluster <cluster-name>

  5. If the above steps don't solve the issue, please open a Support Request with Tanzu Support.