In a vSphere Kubernetes cluster or Supervisor cluster, one or more control plane nodes are missing the expected roles.
Please confirm which context and cluster are affected, then follow the steps in the appropriate section below:
While connected to the Supervisor cluster context, the following symptoms are present:
kubectl get nodes
NAME                         STATUS   ROLES
<supervisor-vm-dns-name-1>   Ready    control-plane
<supervisor-vm-dns-name-2>   Ready    <none>
<supervisor-vm-dns-name-3>   Ready    master
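For reference, on a healthy Supervisor cluster all control plane nodes are expected to report the same role(s). Depending on the Supervisor's Kubernetes version, the role value may appear as "control-plane,master" or "control-plane"; the output below is a hypothetical example only:

kubectl get nodes
NAME                         STATUS   ROLES
<supervisor-vm-dns-name-1>   Ready    control-plane,master
<supervisor-vm-dns-name-2>   Ready    control-plane,master
<supervisor-vm-dns-name-3>   Ready    control-plane,master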
kubectl get pods -A | grep -v Run
kubectl get pods -o wide -n <Pending pod namespace>
kubectl describe pod <Pending pod name> -n <Pending pod namespace>
X node(s) didn't match pod affinity/selector.
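As a hypothetical illustration only (the pod name, namespace, ages, and counts below are placeholders), the commands above may return output similar to the following when this issue is present:

kubectl get pods -A | grep -v Run
NAMESPACE     NAME                READY   STATUS    RESTARTS   AGE
kube-system   <system-pod-name>   0/1     Pending   0          20m

kubectl describe pod <system-pod-name> -n kube-system
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2m    default-scheduler  0/3 nodes are available: 3 node(s) didn't match pod affinity/selector.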
While connected to the Supervisor cluster context, one or more of the following symptoms are present for the affected vSphere Kubernetes cluster:
kubectl get machines -n <cluster namespace>
kubectl describe tkc <cluster name> -n <cluster namespace>
Message: 0/3 Control Plane Node(s) healthy
kubectl get kcp -n <cluster namespace>
kubectl describe kcp <kcp name> -n <cluster namespace>
Message: Following machines are reporting control plane errors: <my-control-plane-abcde>
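As an illustrative sketch only (the column layout and values vary by Cluster API and TKR version), the KubeadmControlPlane may also report unready or unavailable replicas:

kubectl get kcp -n <cluster namespace>
NAME         CLUSTER          INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE
<kcp name>   <cluster name>   true          true                   3          1       3         2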
While connected to the vSphere Kubernetes cluster context, the following symptoms are present:
For TKRs v1.20, v1.21, v1.22, and v1.23, one or more cluster control plane nodes do not have both the "control-plane" and "master" roles, or show <none> for ROLES:
kubectl get nodes
NAME                                 STATUS   ROLES
<guest-cluster-control-plane-vm-a>   Ready    control-plane
<guest-cluster-control-plane-vm-b>   Ready    master
<guest-cluster-control-plane-vm-c>   Ready    <none>
For TKRs v1.24 and higher, one or more cluster control plane nodes do not have the "control-plane" role, showing <none> for ROLES:
kubectl get nodes
NAME                                 STATUS   ROLES
<guest-cluster-control-plane-vm-a>   Ready    <none>
<guest-cluster-control-plane-vm-b>   Ready    <none>
<guest-cluster-control-plane-vm-c>   Ready    <none>
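The ROLES column shown by kubectl is derived from each node's node-role.kubernetes.io/* labels (for example, node-role.kubernetes.io/control-plane). A node reporting <none> is missing those labels, which can be confirmed with a read-only check such as the following:

kubectl get nodes --show-labels

In the output, a correctly labeled control plane node includes a label such as node-role.kubernetes.io/control-plane= (and, for TKRs prior to v1.24, node-role.kubernetes.io/master=), while an affected node does not.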
kubectl get pods -A | grep -v Run
kubectl get pods -o wide -n <Pending pod namespace>
kubectl describe pod <Pending pod name> -n <Pending pod namespace>
X node(s) didn't match pod affinity/selector.
vSphere 7.0 with Tanzu
vSphere 8.0 with Tanzu
This issue can occur on a vSphere Kubernetes cluster or Supervisor cluster regardless of whether it is managed by Tanzu Mission Control (TMC).
An issue occurred while creating the affected control plane nodes in which kubeadm did not assign the expected roles.
Many vSphere Kubernetes and kube-system pods have tolerations and node selectors that rely on an available, Ready control plane node with the expected roles, as described in the Issue/Introduction above.
Multiple vSphere Kubernetes and kube-system pods are configured with multiple replicas, where one replica is expected to run on each control plane node.
These system pods will remain in a Pending state until the proper roles are assigned to the control plane node(s) missing the expected role(s).
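As a hypothetical illustration of why these pods remain Pending, the scheduling constraints of an affected pod can be inspected with a command like the one below (the pod name and namespace are placeholders, and the exact nodeSelector and toleration values vary by component):

kubectl get pod <Pending pod name> -n <Pending pod namespace> -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.tolerations}{"\n"}'

A pod pinned to control plane nodes typically carries a nodeSelector or affinity on a node-role.kubernetes.io/* label along with a toleration for the control plane taint; if the node is missing that role label, the scheduler reports the "didn't match pod affinity/selector" event shown in the Issue/Introduction above.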
This issue can also occur when a control plane node is manually deleted but the system does not finish deleting the node, allowing a restart of kubelet to re-register the control plane node without the expected role labels.
IMPORTANT: Deleting nodes is not a valid troubleshooting step and will often worsen the issue. In particular, deleting control plane nodes can leave a cluster broken, unmanageable, and unrecoverable.
Please reach out to VMware by Broadcom Technical Support, referencing this KB article, for assistance with correcting the missing role(s) on the affected control plane node(s).