How to increase the memory in the capi-controller-manager pod when it is crashing due to insufficient resources at scale.

Article ID: 312049

Products

VMware vCenter Server

Issue/Introduction

This article assists users whose large numbers of TKG clusters are not being realized because the capi-controller-manager pod crashes due to insufficient memory allocated to it.


Symptoms:

If TKG clusters are taking a long time to be realized at scale (for example, hundreds of TKG clusters), the cause may be the capi-controller-manager pod crashing due to insufficient memory. To determine whether this is the case, follow these steps:

1. SSH into the vCenter appliance:
       ssh root@<VCSA_IP>

2. Print the credentials used to log in to the Supervisor control plane:
       /usr/lib/vmware-wcp/decryptK8Pwd.py

3. SSH into the Supervisor control plane using the IP and credentials from the previous step:
       ssh root@<SUPERVISOR_IP>

4. Check whether the capi-controller-manager pod has crashed due to an Out of Memory (OOM) error:
       kubectl -n vmware-system-capw \
           describe pods -l name=capi-controller-manager | \
           grep -F OOMKilled


If OOMKilled appears in the output of the above command, the pod was terminated because it exceeded its memory limit.
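
Alternatively, the most recent termination reason can be read directly from the pod status. The following is a minimal sketch that reuses the namespace and pod label from the command above; it prints OOMKilled if the last container restart was caused by running out of memory:

       kubectl -n vmware-system-capw \
           get pods -l name=capi-controller-manager \
           -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'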


Environment

VMware vCenter Server 8.0.1
VMware vCenter Server 8.0.x

Cause

The capi-controller-manager pod is allocated a memory limit of 1200Mi (mebibytes). Because of recent adjustments and an organic rise in the number of resources watched by the controller, more memory is consumed during active reconciliation, and this "burst" requirement exceeds the 1200Mi hard limit.
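
The limit currently in effect can be confirmed by reading the container's resource settings from the deployment. This is a hedged example that reuses the namespace, deployment name, and container name (manager) referenced elsewhere in this article:

       kubectl -n vmware-system-capw \
           get deployment capi-controller-manager \
           -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources}'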

Resolution

Currently there is no resolution.


Workaround:
The memory limit of the capi-controller-manager pod can be increased. The following steps describe how to increase the limit to 1600Mi (mebibytes):

1. SSH into the vCenter appliance:
       ssh root@<VCSA_IP>

2. Print the credentials used to log in to the Supervisor control plane:
       /usr/lib/vmware-wcp/decryptK8Pwd.py

3. SSH into the Supervisor control plane using the IP and credentials from the previous step:
       ssh root@<SUPERVISOR_IP>

4. Increase the memory limit to 1600Mi for the capi-controller-manager pod using the following command:
       kubectl -n vmware-system-capw \
           patch deployments capi-controller-manager \
           -p '{"spec":{"template":{"spec":{"containers":[{"name":"manager","resources":{"limits":{"memory":"1600Mi"}}}]}}}}'
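
After the patch is applied, the deployment rolls out a new pod with the higher limit. As a quick verification (assuming the same namespace, deployment, and container names as above), watch the rollout and confirm the new limit:

       kubectl -n vmware-system-capw \
           rollout status deployment capi-controller-manager
       kubectl -n vmware-system-capw \
           get deployment capi-controller-manager \
           -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].resources.limits.memory}'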

 
Everything should now function properly.
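
If the pod is OOMKilled again at a larger scale, its live memory usage can be checked (where the metrics API is available on the Supervisor) and the limit raised further with the same patch command, substituting a higher value for 1600Mi:

       kubectl -n vmware-system-capw \
           top pods -l name=capi-controller-manager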