Duplicate KCP and VSphereCluster objects get created in a TKGS cluster



Article ID: 391909



Products

VMware vSphere Kubernetes Service

Issue/Introduction

  • Duplicate 'KubeadmControlPlane' and 'VSphereCluster' objects get created if the controlPlaneRef and infrastructureRef specs are removed from the Cluster object. Further edits to the cluster will not take effect, as the nodes fail to reconcile.

  • Checking the KubeadmControlPlane and VSphereCluster objects in the cluster shows duplicate entries:

    kubectl get kcp -n <namespace_name>

    NAMESPACE          NAME              CLUSTER      INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
    <namespace_name>   cluster123-KCP1   cluster123   true          true                   3                            3             2d    v1.30.1+vmware.1-fips
    <namespace_name>   cluster123-KCP2   cluster123   true          true                   3          3       3         0             45d   v1.30.1+vmware.1-fips

    kubectl get vspherecluster -n <namespace_name>

    NAMESPACE          NAME                         AGE
    <namespace_name>   cluster123-vspherecluster2   45d
    <namespace_name>   cluster123-vspherecluster1   2d


    Here, cluster123-KCP1 is the newer KCP object and cluster123-vspherecluster1 is the newer VSphereCluster object that were created in the cluster.

     

    The capi-kubeadm-control-plane-controller-manager pod will contain the following log trace:

    EMMDD HH:MM:SS       1 controller.go:302] "KCP cannot reconcile" err="not all control plane machines are owned by this KubeadmControlPlane, refusing to operate in mixed management mode" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" KubeadmControlPlane="<Namespace_Name>/<KCP_Object_1>" namespace="<Namespace_Name>" name="<KCP_Object1>" reconcileID="j09axxxx-xxxx-4xxx-b7xxxxxxxxxxn89" Cluster="<Namespace_Name>/<Cluster_Name>"
    IMMDD HH:MM:SS       1 controller.go:346] "Reconcile KubeadmControlPlane" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" KubeadmControlPlane="<Namespace_Name>/<KCP_Object_2>" namespace="<Namespace_Name>" name="<KCP_Object2>" reconcileID="k17axxxx-xxxx-3xxx-l3xxxxxxxxxxn67" Cluster="<Namespace_Name>/<Cluster_Name>"



  • The Control Plane VMs of the cluster still belong only to the older KCP object:

    kubectl get vm -n <Namespace_name>


    virtualmachine.vmoperator.vmware.com/cluster123-KCP2-abcd1   PoweredOn   guaranteed-xlarge   vmi-54xxxxxxxxxxxxx7   <IP_Address_CP_Node>   45d
    virtualmachine.vmoperator.vmware.com/cluster123-KCP2-abcd2   PoweredOn   guaranteed-xlarge   vmi-54xxxxxxxxxxxxx7   <IP_Address_CP_Node>   45d
    virtualmachine.vmoperator.vmware.com/cluster123-KCP2-abcd3   PoweredOn   guaranteed-xlarge   vmi-54xxxxxxxxxxxxx7   <IP_Address_CP_Node>   45d
  • The controlPlaneRef and infrastructureRef in the cluster refer to the newer KCP and VSphereCluster objects. This can be verified with:

    • kubectl get cluster <cluster_name> -n <namespace> -o yaml | grep -Ei -A4 'controlPlaneRef|infrastructureRef'
        controlPlaneRef:
          apiVersion: controlplane.cluster.x-k8s.io/v1beta1
          kind: KubeadmControlPlane
          name: cluster123-KCP1
          namespace: <namespace_name>
        infrastructureRef:
          apiVersion: vmware.infrastructure.cluster.x-k8s.io/v1beta1
          kind: VSphereCluster
          name: cluster123-vspherecluster1
          namespace: <namespace_name>
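As a side note on the grep command above: the extended-regex alternation must not contain spaces around the '|', or the pattern silently changes and one of the refs is skipped. A minimal offline illustration (the sample file below is hypothetical):

```shell
# Hypothetical local copy of a Cluster spec, used only to illustrate the
# alternation pattern; it is not pulled from a live cluster.
cat > /tmp/cluster-spec.yaml <<'EOF'
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: cluster123-KCP1
  infrastructureRef:
    apiVersion: vmware.infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereCluster
    name: cluster123-vspherecluster1
EOF

# Correct: no spaces around '|', so both refs match.
grep -Ei -A4 'controlPlaneRef|infrastructureRef' /tmp/cluster-spec.yaml

# With spaces, the first branch becomes 'controlPlaneRef ' (trailing
# space) and never matches the line 'controlPlaneRef:', so only the
# infrastructureRef line is found.
grep -Eci 'controlPlaneRef | infrastructureRef' /tmp/cluster-spec.yaml   # -> 1
```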



Environment

VMware vSphere with Tanzu 

Cause

This problem can occur if the controlPlaneRef and infrastructureRef specs are removed from the Cluster object manually, or if the cluster is updated by re-applying the local YAML file from which it was originally created. The local YAML file does not contain the controlPlaneRef and infrastructureRef specs, so re-applying it can remove them and trigger this issue.
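The failure mode can be illustrated offline. The file contents below are hypothetical, simplified stand-ins; the point is that a locally saved manifest never contains the refs that the controllers injected server-side, so re-applying it removes them:

```shell
# Hypothetical local manifest the cluster was originally created from;
# the controllers had not yet injected the refs at that point.
cat > /tmp/cluster-local.yaml <<'EOF'
spec:
  clusterNetwork:
    serviceDomain: cluster.local
EOF

# Simplified view of what the API server holds after the controllers
# populated controlPlaneRef and infrastructureRef.
cat > /tmp/cluster-live.yaml <<'EOF'
spec:
  clusterNetwork:
    serviceDomain: cluster.local
  controlPlaneRef:
    kind: KubeadmControlPlane
    name: cluster123-KCP2
  infrastructureRef:
    kind: VSphereCluster
    name: cluster123-vspherecluster2
EOF

# Every line marked '>' exists only on the server; these are the fields
# that re-applying the local file would strip away.
diff /tmp/cluster-local.yaml /tmp/cluster-live.yaml || true
```

Before re-applying any saved manifest to a live cluster, `kubectl diff -f <file>` can be used to preview which server-side fields would be dropped.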

Resolution

  • Open an SSH session to one of the Supervisor Control Plane nodes. Refer to the KB article describing how to SSH into a Supervisor Control Plane VM.
  • Ensure that two 'KubeadmControlPlane' and two 'VSphereCluster' objects have been created and that the Control Plane nodes still belong to the older KCP object.
    • This can be verified by listing the KCP objects and the Control Plane VMs and confirming that the CP nodes still belong to the older KCP object.

 

  • Take a backup of the Guest Cluster and of the newer KCP and VSphereCluster objects' YAML files:
    • kubectl get cluster <cluster_name> -n <namespace> -o yaml > /root/TKC.yaml 

    • kubectl get kcp cluster123-KCP1 -n <namespace> -o yaml > /root/KCP.yaml 

    • kubectl get vspherecluster cluster123-vspherecluster1 -n <namespace> -o yaml > /root/vspherecluster.yaml 

  • Edit the cluster to update .spec.controlPlaneRef to point to the older KCP object and .spec.infrastructureRef to point to the older VSphereCluster object. In this example, the older KCP object is 'cluster123-KCP2' and the older VSphereCluster object is 'cluster123-vspherecluster2'.
    • kubectl edit cluster <cluster_name> -n <namespace> 

    • NOTE: Make sure to edit the values according to your environment.
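After the edit, the relevant part of the Cluster spec should look like the following (the object names follow the example in this article; substitute the names from your environment):

```yaml
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: cluster123-KCP2
    namespace: <namespace_name>
  infrastructureRef:
    apiVersion: vmware.infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereCluster
    name: cluster123-vspherecluster2
    namespace: <namespace_name>
```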


  • Once done, delete the newer KCP and VSphereCluster objects deployed for the affected cluster from the Supervisor Cluster:

    • kubectl delete kcp cluster123-KCP1 -n <namespace_name> 

    • kubectl delete vspherecluster cluster123-vspherecluster1 -n <namespace_name>