vSphere Kubernetes Cluster Control Plane or vSphere Kubernetes Supervisor Cluster Control Plane Missing ROLES
Article ID: 386786


Products

VMware vSphere 7.0 with Tanzu
vSphere with Tanzu
Tanzu Kubernetes Runtime

Issue/Introduction

In a vSphere Kubernetes Cluster or Supervisor cluster, one or more control planes are missing the expected roles.

Confirm which context and cluster are affected, then follow the steps in the appropriate section (a context login sketch follows the list):

  • Supervisor Cluster Control Plane: for Supervisor cluster control plane VMs
  • vSphere Kubernetes Cluster Control Plane: for guest cluster control plane nodes
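
If needed, each context can be reached with the kubectl vSphere plugin. A minimal login sketch, assuming placeholder server, user, and cluster names (exact flags can vary by release):

  # Log in to the Supervisor cluster context:
  kubectl vsphere login --server=<supervisor-address> --vsphere-username <username> --insecure-skip-tls-verify

  # Log in to a vSphere Kubernetes (guest) cluster context:
  kubectl vsphere login --server=<supervisor-address> --vsphere-username <username> --tanzu-kubernetes-cluster-name <cluster name> --tanzu-kubernetes-cluster-namespace <cluster namespace> --insecure-skip-tls-verify

  # List available contexts and switch to the desired one:
  kubectl config get-contexts
  kubectl config use-context <context name>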

 

Supervisor Cluster Control Plane

While connected to the Supervisor cluster context, the following symptoms are present:

  • One or more Supervisor cluster control plane VMs do not have both the "control-plane" and "master" roles, or show <none> under ROLES (see the label check after this list):
    • kubectl get nodes

      NAME                         STATUS   ROLES
      <supervisor-vm-dns-name-1>   Ready    control-plane
      <supervisor-vm-dns-name-2>   Ready    <none>
      <supervisor-vm-dns-name-3>   Ready    master
      • If one or both of the above roles are missing, pod scheduling and cluster management in the environment will be affected.
      • Note: ESXi hosts are expected to have only the agent role.

  • One or more system pods are stuck in Pending state:
    • kubectl get pods -A | grep -v Run

  • Only two or fewer replicas of the Pending pod are in Running state on the other, healthy Supervisor control plane VMs:
    • kubectl get pods -o wide -n <Pending pod namespace>

  • Describing a pod stuck in Pending state returns an error message similar to the following, where the pod's scheduling constraints are looking for a Ready control plane node with the expected role:
    • kubectl describe pod <Pending pod name> -n <Pending pod namespace>
    • X node(s) didn't match pod affinity/selector.
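
The ROLES column in "kubectl get nodes" is derived from the node-role.kubernetes.io/* labels on each node. As a read-only check, those labels can be listed directly to confirm which role label is missing; the same commands apply to guest cluster nodes in the next section:

  # Show each role label as its own column (an empty cell means the label is missing):
  kubectl get nodes -L node-role.kubernetes.io/control-plane -L node-role.kubernetes.io/master

  # Alternatively, dump all node labels and filter for node-role entries:
  kubectl get nodes --show-labels | grep node-role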

 

vSphere Kubernetes Cluster Control Plane

While connected to the Supervisor cluster context, one or more of the following symptoms are present for the affected vSphere Kubernetes cluster:

  • All control plane machines for the affected cluster show Running state (a combined query follows this list):
    • kubectl get machines -n <cluster namespace>

  • Describing the affected cluster's TKC shows that not all control plane nodes are healthy:
    • kubectl describe tkc <cluster name> -n <cluster namespace>

      Message: 0/3 Control Plane Node(s) healthy
  • The kubeadmcontrolplane (kcp) for the affected cluster shows that not all control planes are Available:
    • kubectl get kcp -n <cluster namespace>

  • Describing the kubeadmcontrolplane (kcp) for the affected cluster shows an error message similar to the following, where the name of the control plane(s) will vary by environment:
    • kubectl describe kcp <kcp name> -n <cluster namespace>
    • Message:    Following machines are reporting control plane errors: <my-control-plane-abcde>
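
As a convenience, the cluster, control plane, and machine checks above can be combined into a single read-only query from the Supervisor context, using the short resource names (tkc, kcp) already shown:

  # One view of cluster, kubeadmcontrolplane, and machine health:
  kubectl get tkc,kcp,machines -n <cluster namespace>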

 

While connected to the vSphere Kubernetes cluster context, the following symptoms are present:

  • For TKRs v1.20, v1.21, v1.22, and v1.23, one or more cluster control plane nodes do not have both the "control-plane" and "master" roles, or show <none> under ROLES:

    • kubectl get nodes

      NAME                                 STATUS   ROLES
      <guest-cluster-control-plane-vm-a>   Ready    control-plane
      <guest-cluster-control-plane-vm-b>   Ready    master
      <guest-cluster-control-plane-vm-c>   Ready    <none>

  • For TKRs v1.24 and higher, one or more cluster control plane nodes do not have the "control-plane" role, showing <none> under ROLES:

    • kubectl get nodes

      NAME                                 STATUS   ROLES
      <guest-cluster-control-plane-vm-a>   Ready    <none>
      <guest-cluster-control-plane-vm-b>   Ready    <none>
      <guest-cluster-control-plane-vm-c>   Ready    <none>

  • If the expected role(s) are missing, pod scheduling and cluster management in the environment will be affected.
    • Note: Worker nodes are not expected to have roles and will show as <none>.

  • One or more system pods are stuck in Pending state:
    • kubectl get pods -A | grep -v Run

  • Only two or fewer replicas of the Pending pod are in Running state on the other, healthy control plane nodes (see the triage sketch after this list):
    • kubectl get pods -o wide -n <Pending pod namespace>

  • Describing a pod stuck in Pending state returns an error message similar to the following, where the pod's scheduling constraints are looking for a Ready control plane node with the expected role:
    • kubectl describe pod <Pending pod name> -n <Pending pod namespace>
    • X node(s) didn't match pod affinity/selector.
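
To narrow the pod checks above, Pending pods can be listed directly with a field selector and then mapped to nodes, so the healthy replicas and the stuck ones are visible side by side. A minimal triage sketch using standard kubectl options:

  # List only Pending pods across all namespaces:
  kubectl get pods -A --field-selector=status.phase=Pending

  # Show which node each replica of the affected pod is scheduled on:
  kubectl get pods -o wide -n <Pending pod namespace>

  # Review the scheduler events explaining the Pending state:
  kubectl describe pod <Pending pod name> -n <Pending pod namespace> | grep -A 5 Events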

Environment

vSphere 7.0 with Tanzu
 
vSphere 8.0 with Tanzu
 
This issue can occur on a vSphere Kubernetes cluster or Supervisor cluster regardless of whether it is managed by Tanzu Mission Control (TMC).

Cause

An issue occurred during creation of the affected control planes in which kubeadm did not assign the expected role labels.

Many vSphere Kubernetes and kube-system pods have tolerations and selectors that rely on an available, Ready control plane node with the expected roles, as described in the Issue/Introduction above.

Multiple vSphere Kubernetes and kube-system pods are configured with multiple replicas, where one replica is expected to run per control plane node.

These system pods will remain in Pending state until the proper roles are assigned to the control plane(s) which are missing the expected role(s).
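
The scheduling constraints that tie these system pods to a labeled control plane node can be read directly from the pod spec. A read-only sketch using jsonpath over standard pod spec fields:

  # Print the nodeSelector, affinity, and tolerations of a Pending system pod:
  kubectl get pod <Pending pod name> -n <Pending pod namespace> -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.affinity}{"\n"}{.spec.tolerations}{"\n"}'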

 

This issue can also occur when a control plane node is manually deleted but the system has not finished deleting the node, allowing a restart of kubelet to re-register the control plane node without its role labels.

IMPORTANT: It is not a proper troubleshooting step to delete nodes. Deleting nodes will often worsen the issue. In particular, deleting control plane nodes can leave a cluster broken, unmanageable and unrecoverable.

Resolution

Please reach out to VMware by Broadcom Technical Support, referencing this KB article, for assistance in correcting the missing role(s) on the affected control plane(s).