TKGm - legacy clusters - updating CPU and Memory for a control plane node due to high CPU utilization and relatively high etcd db size.

Article ID: 384616


Products

Tanzu Kubernetes Runtime, VMware Tanzu Kubernetes Grid Management, Tanzu Kubernetes Grid, VMware Tanzu Kubernetes Grid Plus, VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

This article provides a guide to updating the CPU and memory values for control plane nodes in a legacy cluster.
This change is required in environments where control plane node CPU and memory utilization is too high and control plane node crashes are occurring.
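Node CPU and memory utilization can be checked quickly from the cluster context with, for example, the command below (this assumes the metrics-server package is installed in the cluster):

kubectl top nodes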

Environment

TKGm

Cause

In this scenario there was a single crashing control plane node.
The node was seen crashing with 100% CPU, 0% memory, and 0% network utilization, and with no IP assigned.
On the cluster's other control plane nodes it was noted that CPU utilization was consistently oscillating from low to high, and the etcd database size was relatively high, in the 100-500 MB range.
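To confirm the etcd database size, a check similar to the below can be run against the etcd static pod on a control plane node (the certificate paths shown are the kubeadm defaults and may differ in your environment; substitute the actual etcd pod name from kubectl get pods -n kube-system):

kubectl -n kube-system exec etcd-<control-plane-node-name> -- etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key endpoint status --write-out=table

The DB SIZE column in the resulting table shows the current etcd database size.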

Resolution


The idea of the procedure below is to create a new VSphereMachineTemplate with the required CPU and memory values, and then to update the KubeadmControlPlane (KCP) to reference the newly created VSphereMachineTemplate with the amended CPU and memory values.

Note: this will recreate the control plane nodes of the legacy cluster you are making the change on.

Connect to the management cluster context and start by checking the KCP for the cluster's VSphereMachineTemplate name and namespace.
This can be done with the command below:

kubectl get kcp <cluster-name>-control-plane -oyaml | grep -A 2 -Ei "kind: VSphereMachineTemplate"

The output of the above will look similar to below:

kind: VSphereMachineTemplate
      name: <cluster-name>-control-plane
      namespace: default


Next check the vSphereMachineTemplate using the name from the above output with the below command:

kubectl get VSphereMachineTemplate <cluster-name>-control-plane -oyaml | less

 

From the output YAML you will see the name, CPU, and memory values that need to be changed:

kind: VSphereMachineTemplate
metadata:
  vmTemplateMoid: vm-xxxx
  creationTimestamp: "2024-12-17T11:40:02Z"
  generation: 1
  name: <cluster-name>-control-plane
...
      memoryMiB: xxxx
      ...
      numCPUs: x

Note: in your environment memoryMiB may be, for example, 8192 (8 GB) and numCPUs may be 2.


To scale the control plane up vertically, follow the instructions below.

Create a new VSphereMachineTemplate YAML file for the desired CPU and memory changes:

kubectl get VSphereMachineTemplate <cluster-name>-control-plane -oyaml > <cluster-name>-control-plane-cpu8-mem16.yaml


Next, open this YAML file in vi and make the required changes:

vi <cluster-name>-control-plane-cpu8-mem16.yaml


In the YAML file, update the VSphereMachineTemplate name to <cluster-name>-control-plane-cpu8-mem16, numCPUs to 8, and memoryMiB to 16384 (16 GB).
Also remove the kubectl.kubernetes.io/last-applied-configuration: annotation and the line beneath it, as in the example below.

annotations:
     kubectl.kubernetes.io/last-applied-configuration: |
       {"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta1","kind":"VSphereMachineTemplate","metadata":{"annotations":{"vmTemplateMoid":"vm-1268"},"name":"<cluster-name>-control-plane","namespace":"default"},"spec":{"template":{"spec":{"cloneMode":"fullClone","datacenter":"/Datacenter","datastore":"/Datacenter/datastore/vsanDatastore","diskGiB":60,"folder":"/Datacenter/vm/env11","memoryMiB":4096,"network":{"devices":[{"dhcp4":true,"networkName":"/Datacenter/network/vsanSwitch-xxxxxxx"}]},"numCPUs":4,"resourcePool":"/Datacenter/host/Cluster/Resources/xxxx","server":"xxx-xxxx-xxxx.xxxx.xxxxx.xxxxx","storagePolicyName":"","template":"/Datacenter/vm/tkg/photon-5-kube-v1-28-11+vmware-2-tkg-2-bc1be57677254736xxxxxxxxxxxxxxxx"}}}}

 

Next, save the VSphereMachineTemplate YAML by typing :wq! in vi; this will write the changes and close the file.

Now apply the created VSphereMachineTemplate YAML as below:

kubectl apply -f <cluster-name>-control-plane-cpu8-mem16.yaml

You will then see output similar to below:

vspheremachinetemplate.infrastructure.cluster.x-k8s.io/<cluster-name>-control-plane-cpu8-mem16 created


Check the VSphereMachineTemplate's creation using below command:

kubectl get VSphereMachineTemplate

You will see the below output:

 NAME                                      AGE
 <cluster-name>-control-plane              32m
 <cluster-name>-control-plane-cpu8-mem16   12s
 <cluster-name>-worker                     32m


Confirm the creation has the changes made with below command:

kubectl get VSphereMachineTemplate <cluster-name>-control-plane-cpu8-mem16 -oyaml | less


Next, copy the KCP to a YAML file, which you will then edit in vi:

kubectl get kcp <cluster-name>-control-plane -oyaml > kcp-<cluster-name>-control-plane-cpu8-mem16.yaml


Open the KCP YAML in vi to make the required changes:

 vi kcp-<cluster-name>-control-plane-cpu8-mem16.yaml


Update the name of the VSphereMachineTemplate in the KCP to <cluster-name>-control-plane-cpu8-mem16.
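In the KCP YAML this reference is typically located under spec.machineTemplate.infrastructureRef; after the edit that section would look similar to below:

spec:
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: VSphereMachineTemplate
      name: <cluster-name>-control-plane-cpu8-mem16
      namespace: default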

Then save the changes by typing :wq! in vi; this will write the changes and close the file.


Finally apply this new KCP to the cluster.

kubectl apply -f kcp-<cluster-name>-control-plane-cpu8-mem16.yaml


The output will look similar to below:

kubeadmcontrolplane.controlplane.cluster.x-k8s.io/<cluster-name>-control-plane configured


Check the KCP has the correct vSphereMachineTemplate name:

kubectl get kcp <cluster-name>-control-plane -oyaml | grep -A 2 -Ei "kind: VSphereMachineTemplate"

The output will look as per below:

kind: VSphereMachineTemplate
       name: <cluster-name>-control-plane-cpu8-mem16
       namespace: default
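While still in the management cluster context, the rolling replacement of the control plane machines can be monitored, for example with:

kubectl get machines
kubectl get kcp <cluster-name>-control-plane

New control plane machines are created from the updated template, and the old ones are deleted once their replacements become Ready.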


Switch to the workload cluster's context and confirm the control plane node has been recreated recently:

kubectl get nodes

You will see an output similar to below showing the control plane's age with a recent timestamp.

NAME                                       STATUS   ROLES           AGE   VERSION
 <cluster-name>-control-plane-ghh2n   Ready    control-plane   15s   v1.28.11+vmware.2
 <cluster-name>-md-0-rfmtz-lzmtd      Ready    <none>          46m   v1.28.11+vmware.2


Confirm the control plane node has been updated with the required resources by running the command below:

kubectl get nodes <cluster-name>-control-plane-ghh2n -oyaml | grep -A 15 -Ei "allocatable"


Where the output will have a section similar to below:

  allocatable:
    cpu: "8"
    ephemeral-storage: "56888314582"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 16269092Ki
    pods: "110"
  capacity:
    cpu: "8"
    ephemeral-storage: 61727772Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 16371492Ki
    pods: "110"

Note: cpu is now 8 and memory is now 16269092Ki (approximately 16 GB).

The vertical scale-up of the control plane node(s) in your legacy cluster is now complete.