Tanzu Kubernetes Cluster Stuck in Updating State Due to Missing Control Plane Node

Article ID: 404504

Updated On:

Products

Tanzu Kubernetes Runtime

Issue/Introduction

A Tanzu Kubernetes Cluster was observed stuck in the Updating phase, reporting Ready=False. Validation checks indicated that the cluster was not in the Running phase, and one of its three control plane nodes was missing.

Environment

Tanzu Kubernetes Runtime

Cause

Although the Machine, WCPMachine, and VM objects for the node still existed, one control plane node was absent from both the Kubernetes node list and the etcd member list. The cluster could not recover automatically because the Cluster API (CAPI) controllers appeared to be frozen. The KubeadmControlPlane (KCP) resource reported a ControlPlaneComponentsUnknown condition, listing the remaining control plane nodes as having unknown status.
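The mismatch described above amounts to a set comparison: a Machine that has no matching entry in either the Kubernetes node list or the etcd member list. The following is a minimal Python sketch of that comparison. It assumes the three name lists have already been collected by hand, and that the Machine name, node name, and etcd member name are identical for a given control plane node. The function name and the sample node names are hypothetical.

```python
def find_missing_control_plane_nodes(machines, k8s_nodes, etcd_members):
    """Return Machine names that appear in neither the node list nor the etcd member list.

    Assumption: a control plane node uses the same name as its Machine object,
    its Kubernetes node, and its etcd member.
    """
    registered = set(k8s_nodes)
    members = set(etcd_members)
    return sorted(
        name for name in set(machines)
        if name not in registered and name not in members
    )


# Hypothetical names, for illustration only.
machines = ["tkc-cp-a", "tkc-cp-b", "tkc-cp-c"]
k8s_nodes = ["tkc-cp-a", "tkc-cp-b"]
etcd_members = ["tkc-cp-a", "tkc-cp-b"]

missing = find_missing_control_plane_nodes(machines, k8s_nodes, etcd_members)
print(missing)
```

A node that is missing from only one of the two lists is deliberately not reported here, since that would be a different failure mode from the one this article covers.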

Resolution

  1. Identify the missing control plane node by comparing the expected control plane nodes against the list of nodes visible in Kubernetes and etcd.
  2. Restart the following controllers on the Supervisor Cluster to trigger reconciliation:
    • capi-kubeadm-control-plane-controller-manager
    • capi-controller-manager
  3. If the controllers do not recover the node automatically after the restart, delete the Machine object associated with the missing control plane node. This deletion should cascade and remove the corresponding WCPMachine and VM resources.
  4. After the controller restart (and the Machine deletion, if it was needed), confirm that a replacement control plane node is created and joins etcd successfully.
  5. Validate that the KubeadmControlPlane resource reports healthy status and that the cluster transitions to Ready=True.
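
Steps 2 and 3 above can be sketched as kubectl invocations. The Python sketch below only builds and prints the commands; it does not run them. It rests on several assumptions that are not stated in this article: that both controllers run as Deployments named exactly as listed in step 2, that `kubectl rollout restart` is an acceptable way to restart them, and that the namespace placeholders are replaced with the values for the environment in question. The function names are hypothetical.

```python
# Controller names are taken from step 2; treating them as Deployment names is an assumption.
CONTROLLERS = [
    "capi-kubeadm-control-plane-controller-manager",
    "capi-controller-manager",
]


def restart_commands(controller_namespace):
    """Build one 'kubectl rollout restart' command per controller (step 2)."""
    return [
        ["kubectl", "rollout", "restart", "deployment", name, "-n", controller_namespace]
        for name in CONTROLLERS
    ]


def delete_machine_command(machine_name, cluster_namespace):
    """Build the command that deletes the Cluster API Machine object (step 3)."""
    return [
        "kubectl", "delete", "machines.cluster.x-k8s.io", machine_name,
        "-n", cluster_namespace,
    ]


# Placeholders must be replaced with real values before use.
for cmd in restart_commands("<controller-namespace>"):
    print(" ".join(cmd))
print(" ".join(delete_machine_command("<machine-name>", "<cluster-namespace>")))
```

Review the printed commands before running them against the Supervisor Cluster, and run the delete only if the restart alone does not restore the node.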