Tanzu Kubernetes Cluster Stuck in Updating State Due to Missing Control Plane Node
Article ID: 404504
Products
Tanzu Kubernetes Runtime
Issue/Introduction
A Tanzu Kubernetes Cluster is stuck in the Updating phase and reports Ready=False. Validation checks show that the cluster is not in the Running phase, and one of its three control plane nodes is missing.
Environment
Tanzu Kubernetes Runtime
Cause
One control plane node was absent from both the Kubernetes node list and the etcd member list, even though its Machine, WCPMachine, and VM objects still existed. The cluster could not recover automatically because the Cluster API (CAPI) controllers appeared to be frozen and were not reconciling. The KubeadmControlPlane (KCP) resource reported a ControlPlaneComponentsUnknown condition, listing the remaining control plane nodes as having unknown status.
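As an illustration of how this condition can be spotted, the sketch below filters the conditions of a KubeadmControlPlane object that has already been exported as JSON (for example with `kubectl get kubeadmcontrolplane <name> -n <namespace> -o json`). The sample object, its message text, and the node names in it are hypothetical and only stand in for real output; the helper name `unhealthy_conditions` is likewise an assumption made for this example.

```python
import json


def unhealthy_conditions(kcp: dict) -> list[dict]:
    """Return every KCP condition whose status is anything other than "True"."""
    conditions = kcp.get("status", {}).get("conditions", [])
    return [c for c in conditions if c.get("status") != "True"]


# Hypothetical, heavily trimmed KCP object; real output contains many more fields.
SAMPLE_KCP = json.loads("""
{
  "kind": "KubeadmControlPlane",
  "status": {
    "conditions": [
      {"type": "Available", "status": "True"},
      {
        "type": "ControlPlaneComponentsHealthy",
        "status": "Unknown",
        "reason": "ControlPlaneComponentsUnknown",
        "message": "hypothetical message naming the remaining control plane nodes"
      }
    ]
  }
}
""")

for cond in unhealthy_conditions(SAMPLE_KCP):
    # Print one line per condition that is not reporting True.
    print(f'{cond["type"]}: status={cond["status"]} reason={cond.get("reason", "")}')
```

A condition that stays in an Unknown status across repeated checks is consistent with controllers that are not reconciling, which is the situation this article describes.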
Resolution
Identify the missing control plane node by comparing the expected control plane nodes against the list of nodes visible in Kubernetes and etcd.
Restart the following controllers on the Supervisor Cluster to trigger reconciliation:
capi-kubeadm-control-plane-controller-manager
capi-controller-manager
If the cluster does not recover automatically after the controllers restart, delete the Machine object associated with the missing control plane node. This deletion should cascade and remove the corresponding WCPMachine and VM resources.
After the restart, and after the Machine deletion if it was required, confirm that the control plane node is re-created and joins etcd successfully.
Validate that the KubeadmControlPlane resource reports healthy status and that the cluster transitions to Ready=True.
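The comparison in the first step can be sketched as a simple set difference. This is a minimal example under stated assumptions: the three input lists are gathered by hand (for instance from the Machine objects, the Kubernetes node list, and the etcd member list), the node names shown are hypothetical, and the helper `find_missing_nodes` is not part of any Tanzu or Cluster API tooling.

```python
def find_missing_nodes(expected, k8s_nodes, etcd_members):
    """Return expected control plane nodes absent from BOTH Kubernetes and etcd."""
    expected_set = set(expected)
    missing_from_k8s = expected_set - set(k8s_nodes)
    missing_from_etcd = expected_set - set(etcd_members)
    # A node counts as missing only when neither list contains it,
    # which matches the symptom described in the Cause section.
    return sorted(missing_from_k8s & missing_from_etcd)


# Hypothetical names; substitute the values collected from your environment.
expected_control_plane = ["tkc-cp-a", "tkc-cp-b", "tkc-cp-c"]
kubernetes_nodes = ["tkc-cp-a", "tkc-cp-b", "tkc-worker-1"]
etcd_members = ["tkc-cp-a", "tkc-cp-b"]

missing = find_missing_nodes(expected_control_plane, kubernetes_nodes, etcd_members)
print("Missing control plane nodes:", missing)
```

Running the same comparison again after the node is re-created should return an empty list, which is one way to support the final validation steps.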