On the newly provisioned Control Plane node, /var/log/cloud-init-output.log shows the following failure during the etcd join phase:
error execution phase etcd-join: error creating local etcd static pod manifest file: etcdserver: too many learner members in cluster
Listing the etcd members from a healthy Control Plane node shows a fourth member that is a learner in the unstarted state:

etcdctl member list --write-out=table
+------------------+-----------+--------------------+---------------------------+---------------------------+------------+
| ID               | STATUS    | NAME               | PEER ADDRS                | CLIENT ADDRS              | IS LEARNER |
+------------------+-----------+--------------------+---------------------------+---------------------------+------------+
| <etcd_member_id> | started   | <etcd_member_name> | https://198.51.100.1:2380 | https://198.51.100.1:2379 | false      |
| <etcd_member_id> | started   | <etcd_member_name> | https://198.51.100.2:2380 | https://198.51.100.2:2379 | false      |
| <etcd_member_id> | started   | <etcd_member_name> | https://198.51.100.3:2380 | https://198.51.100.3:2379 | false      |
| <etcd_member_id> | unstarted |                    | https://198.51.100.4:2380 |                           | true       |
+------------------+-----------+--------------------+---------------------------+---------------------------+------------+

The member at https://198.51.100.4:2380 was registered as a learner but never started. etcd allows only one learner member at a time by default, so this stale entry blocks any further learner from being added, which produces the error above.

Connect via SSH to a healthy Control Plane node of the affected guest cluster.
Identify the ID of the running etcd container: crictl ps --name etcd
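Create an etcdctl alias that runs etcdctl inside that container, replacing <etcd container id> with the ID returned by crictl ps --name etcd: alias etcdctl='crictl exec <etcd container id> etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt'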
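As an alternative to the alias, the following sketch defines etcdctl as a shell function that looks up the running etcd container on every call. Unlike an alias with a hard-coded container ID, it keeps working after the etcd container is recreated, and it can also be used in non-interactive scripts, where aliases are not expanded. It assumes that the first container matched by crictl ps --name etcd -q is the etcd server and reuses the certificate paths from the alias; the function itself is a hypothetical helper, not part of any tool.

```shell
# Hypothetical helper: run etcdctl inside the currently running etcd container.
# Assumes the first container matching the name filter "etcd" is the etcd
# server and that the certificate paths below exist inside that container.
etcdctl() {
  # -q prints only container IDs; keep the first match.
  etcd_cid=$(crictl ps --name etcd -q | head -n 1)
  if [ -z "$etcd_cid" ]; then
    echo "no running etcd container found" >&2
    return 1
  fi
  # Certificate flags go before the subcommand passed in "$@".
  crictl exec "$etcd_cid" etcdctl \
    --cert /etc/kubernetes/pki/etcd/peer.crt \
    --key /etc/kubernetes/pki/etcd/peer.key \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    "$@"
}
```

Usage is the same as with the alias, for example: etcdctl member list --write-out=table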
Identify the ID of the stale learner member, the entry whose STATUS is unstarted and whose IS LEARNER column is true: etcdctl member list --write-out=table
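This check can be scripted. The sketch below reads the table output on standard input and prints the ID of every member whose STATUS is unstarted and whose IS LEARNER column is true. It assumes the pipe-delimited table layout that etcdctl prints with --write-out=table; find_stale_learners is a hypothetical helper name. A learner that has already started (for example, a join that is still in progress) is deliberately not reported.

```shell
# Hypothetical helper: print IDs of unstarted learner members from
# "etcdctl member list --write-out=table" output read on stdin.
find_stale_learners() {
  awk -F'|' '
    # Header and data rows split into 8 fields on "|";
    # "+----+" border lines have only 1 field and are skipped.
    NF >= 8 {
      id = $2; status = $3; learner = $7
      gsub(/ /, "", id); gsub(/ /, "", status); gsub(/ /, "", learner)
      if (status == "unstarted" && learner == "true") print id
    }'
}
```

Usage: etcdctl member list --write-out=table | find_stale_learners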
Remove the stale learner member from the cluster (learners are non-voting members and do not count toward quorum): etcdctl member remove <stale_etcd_member_id>
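Because etcdctl member remove accepts voting members as well, a guard against a mistyped ID may be worthwhile. This sketch removes the given ID only if the current member list shows it as an unstarted learner, and refuses otherwise. It assumes etcdctl is callable from the shell as a binary or shell function that reaches the cluster (a plain alias is not expanded inside non-interactive scripts); remove_stale_learner is a hypothetical helper name.

```shell
# Hypothetical helper: remove a member only if it is listed as an
# unstarted learner. Assumes "etcdctl" is callable (binary or function).
remove_stale_learner() {
  member_id=$1
  if [ -z "$member_id" ]; then
    echo "usage: remove_stale_learner <member_id>" >&2
    return 2
  fi
  # awk exits 0 only if a row matches the ID with STATUS=unstarted and
  # IS LEARNER=true; border lines split into one field and are ignored.
  if etcdctl member list --write-out=table | awk -F'|' -v want="$member_id" '
      NF >= 8 {
        id = $2; status = $3; learner = $7
        gsub(/ /, "", id); gsub(/ /, "", status); gsub(/ /, "", learner)
        if (id == want && status == "unstarted" && learner == "true") found = 1
      }
      END { exit (found ? 0 : 1) }'; then
    etcdctl member remove "$member_id"
  else
    echo "refusing to remove $member_id: not an unstarted learner" >&2
    return 1
  fi
}
```

Usage: remove_stale_learner <stale_etcd_member_id>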
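Restart the etcd process on each healthy Control Plane node to refresh the quorum state. Work on one node at a time and wait for etcd to come back before moving to the next node, so that the cluster keeps quorum. On each node, identify the etcd container ID with crictl ps --name etcd and stop the container; the kubelet restarts it automatically: crictl stop <Container_ID_of_etcd>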
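Stopping the container on one node and confirming that the kubelet has brought etcd back can be sketched as follows; run it on one healthy Control Plane node at a time. It assumes the etcd container is the first match of crictl ps --name etcd -q and treats a running etcd container with a different ID as the restarted instance, since the kubelet recreates the container rather than reusing it. The helper name restart_etcd and the two-second polling interval are illustrative choices.

```shell
# Hypothetical helper: stop the running etcd container and wait until the
# kubelet has started a replacement (a running etcd container with a new ID).
# $1: maximum number of polls, 2 seconds apart (default 60).
restart_etcd() {
  max_polls=${1:-60}
  old_cid=$(crictl ps --name etcd -q | head -n 1)
  if [ -z "$old_cid" ]; then
    echo "no running etcd container found" >&2
    return 1
  fi
  crictl stop "$old_cid" >/dev/null || return 1
  polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    new_cid=$(crictl ps --name etcd -q | head -n 1)
    if [ -n "$new_cid" ] && [ "$new_cid" != "$old_cid" ]; then
      # Print the new container ID, e.g. to refresh an etcdctl alias.
      echo "$new_cid"
      return 0
    fi
    polls=$((polls + 1))
    sleep 2
  done
  echo "etcd was not restarted after $max_polls polls" >&2
  return 1
}
```

The printed ID is the new etcd container; an etcdctl alias that embeds the old container ID has to be recreated with it.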
From the Supervisor Cluster context, delete the Machine object of the stuck Control Plane node so that Cluster API rolls out a clean replacement: kubectl delete machine <stuck_machine_name> -n <namespace>
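After the delete, Cluster API is expected to create a replacement Machine. One way to follow the rollout is to list the guest cluster's control plane Machines that have not yet reached the Running phase and repeat until nothing is printed. This sketch assumes the standard Cluster API labels cluster.x-k8s.io/cluster-name and cluster.x-k8s.io/control-plane are set on the Machine objects; pending_cp_machines is a hypothetical helper, and <cluster_name> is a placeholder for the guest cluster's name.

```shell
# Hypothetical helper: print "<machine> <phase>" for each control plane
# Machine of a guest cluster whose status.phase is not Running
# (an empty phase is also reported).
# $1: Supervisor namespace, $2: guest cluster name.
pending_cp_machines() {
  ns=$1
  cluster=$2
  kubectl get machines -n "$ns" \
    -l "cluster.x-k8s.io/cluster-name=${cluster},cluster.x-k8s.io/control-plane" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' |
    awk '$2 != "Running"'
}
```

Usage: pending_cp_machines <namespace> <cluster_name>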