CaaS / UCF: Failed to drain node

Article ID: 435202

Updated On:

Products

Network Observability

Issue/Introduction

While deploying or upgrading CaaS, a non-HA installation may fail to upgrade with the following error:

fatal: [node1]: FAILED! => {"changed": false, "msg": "Failed to drain node node1"}

Environment

CaaS 2.12.0 and prior

UCF 25.4.5 and prior

Cause

Kubespray runs cordon and drain during an upgrade (upgrade-cluster.yml) to perform a rolling, node-by-node upgrade.
The drain runs only when the node is Ready and schedulable, not when it is NotReady.

Draining evacuates the node before kubelet/containerd are upgraded on that node, which is the intended and safe approach.

For the specific example above, with a 5-node cluster:

One PDB (kafka-cluster-kafka) with minAvailable: 4 selects both broker and controller pods (5 pods total). On node3 there are two of those pods: broker-2 and controller-4.

Draining node3 would evict both, leaving only 3 pods, which is below minAvailable. Because this would violate the PDB, kubectl drain correctly refuses and the upgrade is blocked.
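The arithmetic behind the refusal can be sketched in a few lines of shell. This is only an illustration of the numbers in the example above; the function name drain_would_violate_pdb is hypothetical and is not part of kubectl or Kubespray.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl or Kubespray): succeeds when evicting
# every PDB-selected pod on a node would leave fewer than minAvailable pods.
drain_would_violate_pdb() {
  total_pods=$1      # pods selected by the PDB across the cluster
  pods_on_node=$2    # selected pods running on the node being drained
  min_available=$3   # minAvailable from the PDB spec
  remaining=$((total_pods - pods_on_node))
  [ "$remaining" -lt "$min_available" ]
}

# Values from the example: 5 pods total, 2 on node3, minAvailable: 4
if drain_would_violate_pdb 5 2 4; then
  echo "drain blocked: 3 pods would remain, minAvailable is 4"
else
  echo "drain allowed"
fi
```

With the example values, 5 - 2 = 3 is less than 4, so the sketch reports the drain as blocked. If only one of the selected pods were on the node, 4 would remain and the drain could proceed.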

Resolution

1. Run these from the control node:

kubectl patch app "kafka-cluster" -n "platform" --type='merge' -p '{"spec": {"paused": true}}'
kubectl delete pdb kafka-cluster-kafka -n platform
kubectl delete kafka kafka-cluster -n platform

2. On the deployment host, edit /root/netops-25.4.x-xxxx/helm-chart/netops/values.yml and set global.highAvailability to false.

3. Re-run the CaaS upgrade.
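Before re-running the upgrade, it may be worth confirming that the PDB from step 1 is no longer retrievable. The following is a minimal sketch, assuming kubectl is available on the control node; the KUBECTL variable and the pdb_absent function are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: KUBECTL and pdb_absent are hypothetical names, not part of CaaS or Kubespray.
KUBECTL="${KUBECTL:-kubectl}"

# Succeeds (exit 0) when the named PDB cannot be fetched from the namespace.
# Caveat: "kubectl get" also fails on connection or permission errors, so a
# success here means "not retrievable", which is not proof of deletion.
pdb_absent() {
  name=$1
  namespace=$2
  if ! command -v "$KUBECTL" >/dev/null 2>&1; then
    echo "$KUBECTL not found on PATH" >&2
    return 2
  fi
  ! "$KUBECTL" get pdb "$name" -n "$namespace" >/dev/null 2>&1
}

if pdb_absent kafka-cluster-kafka platform; then
  echo "PDB kafka-cluster-kafka not found; the drain should no longer be blocked by it"
else
  echo "PDB kafka-cluster-kafka still present or could not be checked"
fi
```

The check relies on kubectl get returning a non-zero exit code when the named resource does not exist. If the PDB is still reported as present, repeat the delete commands from step 1 before starting the upgrade.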