This article outlines a likely cause of the errors shown below and provides a resolution.
Symptoms:
kubectl get node
node/my-cluster-tkc-1-nodepool-1-a1bc-df23gh45ij-lmno Ready,SchedulingDisabled
kubectl get machine -n <namespace-name>
my-cluster-tkc-1-nodepool-1-a1bc-df23gh45ij-lmno my-cluster my-cluster-tkc-1-nodepool-1-a1bc-df23gh45ij-lmno vsphere://15151515-abcd-efgh-3737-ijklmno4848 Deleting
kubectl logs capi-controller-manager-<>-<> -n vmware-system-capw
kubectl logs capw-controller-manager-<>-<> -n vmware-system-capw
machine_controller.go:751] evicting pod <namespace>/<pod-name>-<>-<>
machine_controller.go:751] error when evicting pods/"<pod-name>-<>-<>" -n "<namespace>" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
kubectl get pdb -A
NAMESPACE     NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
<namespace>   <pdb-name>   N/A             0                 0                     30m
kubectl get pods -n <namespace> -o wide | grep <pod-name>
NAME               READY   STATUS    RESTARTS      AGE
<pod-name>-<>-<>   1/1     Running   6 (19d ago)   47d
kubectl get pdb -n <namespace-name>
NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
<pdb-name>   N/A             0                 0                     30m
Notes:
The pod disruption budget (PDB) must either be edited to allow disruption, or temporarily removed from the cluster, until node roll-out/replacement completes for all nodes in the cluster.
For either workaround, it is recommended to first take a backup of the pod disruption budget.
Repeat these steps for every pod disruption budget in the cluster whose Allowed Disruptions value is 0.
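For illustration, a PDB blocks eviction whenever its constraints leave no headroom. A minimal sketch (all names hypothetical): with only one matching replica running, a PDB requiring minAvailable: 1 reports 0 Allowed Disruptions, which produces the eviction errors above.

```yaml
# Hypothetical example: with a single matching replica running,
# minAvailable: 1 leaves zero allowed disruptions, so node drain
# cannot evict the pod.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # hypothetical name
  namespace: my-namespace   # hypothetical namespace
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app
```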
Workaround A: Edit the Pod Disruption Budget (PDB) to allow for potential disruption:
kubectl get pdb <pdb-name> -n <namespace> -o yaml > /PATH/MY-PDB-BACKUP.yaml
#Verify the backup file contents before editing the live object
less /PATH/MY-PDB-BACKUP.yaml
kubectl edit pdb <pdb-name> -n <namespace>
#Decrease the .spec.minAvailable value or increase the .spec.maxUnavailable value so that Allowed Disruptions becomes greater than 0
kubectl get pdb <pdb-name> -n <namespace>
#--ignore-daemonsets is typically required for the drain to proceed; add --delete-emptydir-data if pods use emptyDir volumes
kubectl drain <SchedulingDisabled-node-name> --ignore-daemonsets
kubectl edit pdb <pdb-name> -n <namespace>
#Revert .spec.minAvailable or .spec.maxUnavailable to the original value once the node roll-out completes
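As a sketch of the edit in Workaround A (hypothetical values), either constraint can be relaxed so at least one voluntary disruption is permitted during the node roll-out:

```yaml
# Before (hypothetical): no voluntary disruptions allowed
spec:
  minAvailable: 1
---
# After (hypothetical): one pod may be evicted at a time during the drain
spec:
  maxUnavailable: 1
```

Note that .spec.minAvailable and .spec.maxUnavailable are mutually exclusive in a PDB, so switching between them means replacing one field with the other.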
Workaround B: Temporarily remove the PDB from the cluster until node roll-out/replacement completes for all nodes.
kubectl get pdb <pdb-name> -n <namespace> -o yaml > /PATH/MY-PDB-BACKUP.yaml
#Verify the backup file contents before deleting the live object
less /PATH/MY-PDB-BACKUP.yaml
kubectl delete pdb <pdb-name> -n <namespace>
#--ignore-daemonsets is typically required for the drain to proceed; add --delete-emptydir-data if pods use emptyDir volumes
kubectl drain <SchedulingDisabled-node-name> --ignore-daemonsets
kubectl apply -f /PATH/MY-PDB-BACKUP.yaml
kubectl get pdb <pdb-name> -n <namespace>
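The exported backup also contains cluster-populated fields (status, metadata.uid, metadata.resourceVersion, metadata.creationTimestamp). kubectl apply generally tolerates these, but as an optional hygiene step the backup can be trimmed to the essential fields before re-applying. A sketch with hypothetical names:

```yaml
# Trimmed backup: only the fields needed to recreate the PDB;
# status and cluster-populated metadata removed.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # hypothetical name
  namespace: my-namespace   # hypothetical namespace
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: my-app
```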
Official Kubernetes Documentation: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
Impact/Risks:
Note: Before making any changes to existing PDBs, consult the application/workload owner.