vSphere Kubernetes Cluster Node Stuck Deleting due to PodDisruptionBudget (PDB)

Article ID: 345904

Updated On: 11-13-2024

Products

VMware vSphere ESXi
VMware vSphere with Tanzu

Issue/Introduction

This article outlines a likely cause of the errors described below and provides a resolution.


Symptoms:

  • vSphere Kubernetes cluster nodes are stuck in Ready,SchedulingDisabled status after a Machine deletion has been triggered, for example during a TKC upgrade (roll-out) or scaling operation:

kubectl get node

NAME                                               STATUS
my-cluster-tkc-1-nodepool-1-a1bc-df23gh45ij-lmno   Ready,SchedulingDisabled

 

  • The associated Machine is stuck in Deleting status on the Supervisor cluster:

kubectl get machine -n <namespace-name>

my-cluster-tkc-1-nodepool-1-a1bc-df23gh45ij-lmno   my-cluster   my-cluster-tkc-1-nodepool-1-a1bc-df23gh45ij-lmno  vsphere://15151515-abcd-efgh-3737-ijklmno4848  Deleting
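
The drain progress and the blocking pod are often visible in the Machine object's conditions, which can be surfaced with kubectl describe:

kubectl describe machine <machine-name> -n <namespace-name>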

 

  • The capi-controller-manager/capw-controller-manager logs on the Supervisor cluster show errors similar to the following:

kubectl logs capi-controller-manager-<>-<> -n vmware-system-capw

kubectl logs capw-controller-manager-<>-<> -n vmware-system-capw

machine_controller.go:751] evicting pod <namespace>/<pod-name>-<>-<>

machine_controller.go:751] error when evicting pods/"<pod-name>-<>-<>" -n "<namespace>" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
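
To surface only the eviction failures in these logs, the output can be filtered, for example (a sketch; the exact deployment names can vary by release):

kubectl logs deployment/capi-controller-manager -n vmware-system-capw | grep -i "disruption budget"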

 

  • There is a pod disruption budget with 0 Allowed Disruptions in the vSphere Kubernetes cluster matching the above noted pod:

kubectl get pdb -A

NAMESPACE     NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE

<namespace>   <pdb-name>   N/A             0                 0                     30m
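
To see which pods the PDB selects and its live disruption counters, kubectl describe can be used:

kubectl describe pdb <pdb-name> -n <namespace>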

 

  • The above noted pod is running on the node that is stuck in Ready,SchedulingDisabled status:

kubectl get pods -n <namespace> -o wide | grep <pod-name>

NAME               READY   STATUS    RESTARTS      AGE   NODE

<pod-name>-<>-<>   1/1     Running   6 (19d ago)   47d   <SchedulingDisabled-node-name>

Environment

VMware vSphere 7.0 with Tanzu

VMware vSphere 8.0 with Tanzu

This issue can occur on vSphere Kubernetes clusters regardless of whether they are managed by Tanzu Mission Control (TMC).

Cause

  • The error "Cannot evict pod as it would violate the pod's disruption budget." indicates that a PodDisruptionBudget (PDB) applied to a workload/pod in the TKC cluster is blocking the node from draining and being gracefully deleted.
  • When a Machine object is marked for deletion (for example, to be replaced during an upgrade), the controllers automatically cordon the Kubernetes node and then drain it, evicting the workloads/pods onto other available nodes/Machines.
  • If a PodDisruptionBudget (PDB) associated with one of the workloads/pods has zero "Allowed Disruptions", it blocks the node's drain, leaving the node in Ready,SchedulingDisabled status: the PDB is configured so that the cluster may never voluntarily bring down all replicas of this workload/pod. A hypothetical PDB spec illustrating this configuration is shown after the output below.

kubectl get pdb -n <namespace-name>

NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE

<pdb-name>   N/A             0                 0                     30m
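
For illustration, a hypothetical PDB spec that always reports 0 Allowed Disruptions would look like the following (all names and labels below are placeholders, not values from your cluster; older clusters may use apiVersion policy/v1beta1). A single-replica workload behind a minAvailable: 1 PDB produces the same effect:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: <pdb-name>
  namespace: <namespace>
spec:
  maxUnavailable: 0        # no voluntary disruption (eviction) is ever allowed
  selector:
    matchLabels:
      app: <pod-label>     # must match the labels on the blocked pod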

Notes:

  • Important: Manually deleting a node stuck in Deleting state is not advised. It is best to determine what is blocking the Kubernetes drain so that the node can be cleaned up gracefully.
  • Manually deleting a node object differs from deleting a Machine object in that it is not graceful: it does not cordon the node (cordoning prevents new pods from being scheduled on the node) and does not gracefully evict the workloads to other available nodes. This can cause issues with volumes attached to pods that were running on the deleted node.
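
For reference, the graceful sequence the controllers perform during Machine deletion is roughly equivalent to the following manual commands (shown for illustration only; on older kubectl releases the last flag is named --delete-local-data):

kubectl cordon <node-name>

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data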

Resolution

The pod disruption budget (PDB) will need to be edited to allow for potential disruption, or temporarily removed from the cluster until node roll-out/replacement completes for all nodes in the cluster.

For either workaround option, it is recommended to first take a backup of the pod disruption budget.

These steps will need to be repeated for every pod disruption budget in the cluster with Allowed Disruptions of 0 (one way to list them is shown below).
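
One way to list every PDB currently reporting 0 Allowed Disruptions is a jsonpath filter such as the following (a sketch; the same information is visible in the plain kubectl get pdb -A output):

kubectl get pdb -A -o jsonpath='{range .items[?(@.status.disruptionsAllowed==0)]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}'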

 

Workaround A: Edit the Pod Disruption Budget (PDB) to allow for potential disruption:

  1. Take a backup of the pod disruption budget:
    • kubectl get pdb <pdb-name> -n <namespace> -o yaml > /PATH/MY-PDB-BACKUP.yaml
  2. Confirm that the backup contains the expected PDB YAML:
    • cat /PATH/MY-PDB-BACKUP.yaml | less
  3. Edit the pod disruption budget to either decrease the minAvailable value or increase the maxUnavailable value (a non-interactive kubectl patch alternative is shown after these steps):
    • kubectl edit pdb <pdb-name> -n <namespace>

      #Decrease .spec.minAvailable value or Increase .spec.maxUnavailable value
  4. Confirm that the edited pod disruption budget no longer shows Allowed Disruptions 0:
    • kubectl get pdb <pdb-name> -n <namespace>
  5. Confirm that the SchedulingDisabled node is now able to drain properly by manually initiating a drain (the --ignore-daemonsets flag is typically required because DaemonSet-managed pods cannot be evicted):
    • kubectl drain <SchedulingDisabled-node-name> --ignore-daemonsets
  6. After the cluster node roll-out/replacement completes for all nodes in the cluster, the PDB can be edited to its previous values:
    • kubectl edit pdb <pdb-name> -n <namespace>

      #Revert the edited .spec.minAvailable value or .spec.maxUnavailable value
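
As a non-interactive alternative to step 3, the PDB can also be patched directly (a sketch; set only the field the PDB actually uses, since minAvailable and maxUnavailable are mutually exclusive):

kubectl patch pdb <pdb-name> -n <namespace> --type merge -p '{"spec":{"maxUnavailable":1}}'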

 

Workaround B: Temporarily remove the PDB from the cluster until cluster node roll-out/replacement completes for all nodes in the cluster:

  1. Take a backup of the pod disruption budget:
    • kubectl get pdb <pdb-name> -n <namespace> -o yaml > /PATH/MY-PDB-BACKUP.yaml
  2. Confirm that the backup contains the expected PDB YAML:
    • cat /PATH/MY-PDB-BACKUP.yaml | less
  3. Delete the pod disruption budget:
    • kubectl delete pdb <pdb-name> -n <namespace>
  4. Confirm that the SchedulingDisabled node is now able to drain properly by manually initiating a drain (the --ignore-daemonsets flag is typically required because DaemonSet-managed pods cannot be evicted):
    • kubectl drain <SchedulingDisabled-node-name> --ignore-daemonsets
  5. After the cluster node roll-out/replacement completes for all nodes in the cluster, the PDB can be restored using the backup (if the apply is rejected because of cluster-populated fields in the dump, such as status or metadata.resourceVersion, remove those fields from the file first):
    • kubectl apply -f /PATH/MY-PDB-BACKUP.yaml
  6. Confirm that the PDB was restored:
    • kubectl get pdb <pdb-name> -n <namespace>

Additional Information

Official Kubernetes documentation on configuring a Pod Disruption Budget: https://kubernetes.io/docs/tasks/run-application/configure-pdb/


Impact/Risks:

Note: Before making any changes to the existing PDBs, please consult with the application/workload owner.