This article is intended for customers running a large number of TKG clusters who are experiencing delays in realizing those clusters because the kubeadm-control-plane pod is crashing with insufficient memory allocated to it.
Symptoms:
If TKG clusters are taking a long time to be realized at scale (for example, hundreds of TKG clusters), the cause may be the kubeadm-control-plane pod crashing because it has insufficient memory.
To determine if this is the case, please follow these steps:
1. SSH into the vCenter Server appliance:
ssh root@<VCSA_IP>
2. Follow KB 90194 to SSH into the Supervisor Control Plane VM as root.
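For reference, on recent vCenter releases KB 90194 retrieves the Supervisor Control Plane VM credentials with the decryptK8Pwd.py helper on the VCSA; the exact path and steps may vary by release, so defer to the KB itself:
/usr/lib/vmware-wcp/decryptK8Pwd.py
# Note the IP and PWD values printed by the script, then:
ssh root@<SUPERVISOR_CP_IP>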
3. Check whether the kubeadm-control-plane pod has crashed due to an Out of Memory (OOM) error:
kubectl -n vmware-system-capw \
describe pods -l control-plane=controller-manager | \
grep -F OOMKilled
If OOMKilled appears in the output of the above command, the pod was terminated because it did not have sufficient memory.
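As an additional read-only check, you can confirm how often the pod has restarted and why its last container instance terminated. This is a sketch using the same describe output as above; the field names come from standard kubectl describe formatting:
kubectl -n vmware-system-capw describe pods -l control-plane=controller-manager | \
grep -E 'Restart Count|Reason|Exit Code'
A high Restart Count together with Reason: OOMKilled indicates repeated OOM terminations rather than a one-off crash.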
PLEASE NOTE: While on the Supervisor Control Plane VM, you have permissions that can permanently damage the cluster. If VMware Support finds evidence that a customer made changes to the Supervisor cluster from the SV VM, they may mark the cluster as unsupported and require that you redeploy the entire vSphere with Tanzu solution. Use this session only to test networks, look at logs, and run kubectl logs/get/describe commands. Do not deploy, delete, or edit anything from this session without express permission from VMware Support or specific instructions in a KB article about exactly what to deploy, delete, or edit.
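Consistent with the read-only guidance above, you can also inspect the memory currently allocated to the pod. This is a sketch using standard kubectl describe output; the values shown will depend on your release:
kubectl -n vmware-system-capw describe pods -l control-plane=controller-manager | \
grep -A3 -E 'Limits|Requests'
Sharing the reported memory limit along with the number of TKG clusters being reconciled helps VMware Support determine whether the allocation is insufficient at your scale.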