While deploying or upgrading CaaS, a non-HA installation may fail to upgrade with the following error:
fatal: [node1]: FAILED! => {"changed": false, "msg": "Failed to drain node node1"}
Affected versions:
CaaS 2.12.0 and prior
UCF 25.4.5 and prior
During an upgrade (upgrade-cluster.yml), Kubespray cordons and drains each node in turn so that it can perform a rolling, node-by-node upgrade.
It does this when the node is Ready and schedulable (not when it's NotReady).
Draining evacuates the node before kubelet/containerd are upgraded on that node, which is the intended and safe approach.
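The per-node cycle described above can be approximated by hand with standard kubectl commands. This is a minimal sketch, not Kubespray's implementation: the `run` and `upgrade_node` helpers, the node name `node3`, and the drain flags and timeout are illustrative assumptions, and Kubespray's own drain options may differ. With the default `DRY_RUN=1` it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: approximates the cordon -> drain -> upgrade -> uncordon cycle
# of a rolling upgrade on one node. run and upgrade_node are hypothetical
# helpers, not part of Kubespray; the drain flags below are assumptions.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_node() {
  local node="$1"
  # Mark the node unschedulable so no new pods are placed on it.
  run kubectl cordon "$node"
  # Evict pods via the Eviction API, which honors PodDisruptionBudgets.
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s
  # kubelet/containerd would be upgraded on the node at this point.
  # Allow pods to be scheduled on the node again.
  run kubectl uncordon "$node"
}

upgrade_node node3
```

If the drain step cannot evict a pod because of a PDB, the later steps never run for that node, which is the blocking behavior described in this article.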
For the specific example above, with a 5-node cluster:
One PDB (kafka-cluster-kafka) with minAvailable: 4 selects both broker and controller pods (5 pods total). On node3 there are two of those pods: broker-2 and controller-4.
Draining node3 would evict both of those pods, leaving only 3, which is below minAvailable. Because that would violate the PDB, kubectl drain correctly refuses and the upgrade blocks.
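The PDB arithmetic above can be checked with a small shell sketch. `would_block_drain` and `pod_count_on_node` are hypothetical helper names, and the sketch assumes minAvailable is an integer rather than a percentage. The live values can be read with `kubectl get pdb kafka-cluster-kafka -n platform -o yaml` (fields `spec.minAvailable`, `status.currentHealthy`, `status.disruptionsAllowed`).

```shell
#!/usr/bin/env bash
# Sketch: would draining a node push a minAvailable PodDisruptionBudget below
# its floor? would_block_drain and pod_count_on_node are hypothetical helpers.
set -euo pipefail

# Usage: would_block_drain <healthy_pods> <min_available> <pods_on_node>
# Returns 0 (true) if evicting all selected pods on the node would leave fewer
# than minAvailable healthy pods. Assumes minAvailable is an integer, not a %.
would_block_drain() {
  local healthy="$1" min_available="$2" on_node="$3"
  [ $(( healthy - on_node )) -lt "$min_available" ]
}

# Live helper (needs cluster access; not called below): counts pods matching
# a label selector on one node. Pass the PDB's own selector labels.
pod_count_on_node() {
  local ns="$1" selector="$2" node="$3"
  kubectl get pods -n "$ns" -l "$selector" \
    --field-selector "spec.nodeName=$node" --no-headers | wc -l
}

# Values from the example: 5 selected pods, minAvailable 4,
# and 2 of them (broker-2 and controller-4) on node3.
if would_block_drain 5 4 2; then
  echo "draining node3 would violate the PDB: 5 - 2 = 3 < 4"
fi
```

In practice kubectl drain goes through the Eviction API, which checks the PDB's `status.disruptionsAllowed` for each eviction. With 5 healthy pods and minAvailable: 4 that value is 1, so one of the two pods on node3 can be evicted while the other is refused, and the drain keeps retrying until it times out.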
1. Run these from the control node:
kubectl patch app "kafka-cluster" -n "platform" --type='merge' -p '{"spec": {"paused": true}}'
kubectl delete pdb kafka-cluster-kafka -n platform
kubectl delete kafka kafka-cluster -n platform
2. On the deployment host, edit /root/netops-25.4.x-xxxx/helm-chart/netops/values.yml and set global.highAvailibaility to false.
3. Re-run the CaaS upgrade.
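Before re-running the upgrade, it may help to confirm from the control node that the PDB from step 1 is gone and that no other PDB is currently at zero allowed disruptions. This is a minimal sketch under those assumptions: the helper names `zero_disruption_pdbs`, `check_cluster_pdbs`, and `kafka_pdb_removed` are hypothetical, and only the namespace and PDB name come from the steps above.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks before re-running the CaaS upgrade.
# Helper names are hypothetical; the live helpers need kubectl cluster access.
set -uo pipefail

# Reads "NAMESPACE NAME ALLOWED" rows on stdin and prints namespace/name for
# every PDB whose disruptionsAllowed is 0 (such a PDB can block a drain).
zero_disruption_pdbs() {
  awk '$3 == 0 { print $1 "/" $2 }'
}

# Live version (not called below): feed kubectl's PDB listing into the filter.
check_cluster_pdbs() {
  kubectl get pdb -A --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed \
    | zero_disruption_pdbs
}

# Returns 0 if the kafka-cluster-kafka PDB is gone, 1 if it still exists,
# and 2 if kubectl itself failed (for example, the cluster is unreachable).
kafka_pdb_removed() {
  local out
  out="$(kubectl get pdb kafka-cluster-kafka -n platform --ignore-not-found -o name)" || return 2
  [ -z "$out" ]
}

# Offline demonstration with sample rows (no cluster needed).
printf '%s\n' \
  "platform kafka-cluster-kafka 0" \
  "kube-system coredns 1" | zero_disruption_pdbs
```

An empty result from `check_cluster_pdbs` and a zero exit status from `kafka_pdb_removed` suggest the drain should no longer be blocked by a PDB.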