How to Increase Multus-cni DaemonSet resource limits when Pods creation is failing intermittently

Article ID: 325399

Updated On:

Products

VMware Telco Cloud Automation

Issue/Introduction

This document describes the procedure for updating the multus resource limits. The procedure increases the memory request and limit on the "kube-multus" container of the multus-cni DaemonSet.

Symptoms:
  • In TCA 3.0, Pod creation fails intermittently on clusters running Kubernetes 1.26.8 (TKG 2.3.1).
  • The issue is not observed when the cluster is created with only the Calico add-on. It appears only when the Multus add-on is added to the cluster.
  • The kubelet log shows an error similar to the following:
Feb 29 16:58:00 xyz-test-vmw-1-np2-cork-4rpkg-75c77cb44fxpdb88-7nhzp kubelet[1395]: E0229 16:58:00.068830 1395 remote_runtime.go:205] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"xy6041930e87e6f22f96f07809c178f614d6e10bc17f70dfa1c144daee8855d4\": plugin type=\"multus-shim\" name=\"multus-cni-network\" failed (delete): CmdDel (shim): failed to send CNI request: Post \"http://dummy/cni\": EOF" podSandboxID="xy6041930e87e6f22f96f07809c178f614d6e10bc17f70dfa1c144daee8855d4"
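To gauge how often a node hits this symptom, the kubelet log can be filtered for the two fixed strings in the error above. The following is a minimal sketch, not a VMware tool: the function name count_multus_shim_failures is illustrative, and it assumes the log text is supplied on standard input.

```shell
#!/usr/bin/env bash
# Count log lines that mention the multus-shim plugin together with the
# "failed to send CNI request" message. Reads log text on stdin and
# prints the number of matching lines (0 if there are none).
count_multus_shim_failures() {
  # grep -c exits non-zero when the count is 0, so "|| true" keeps the
  # function's exit status clean while the "0" is still printed.
  grep -F 'multus-shim' | grep -cF 'failed to send CNI request' || true
}
```

On a node where kubelet runs as a systemd unit named kubelet (an assumption), it could be fed with: journalctl -u kubelet --no-pager | count_multus_shim_failures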

Environment

VMware Telco Cloud Automation 3.1
VMware Telco Cloud Automation 3.0

Cause

Users have experienced OOM-killed calico-ipam processes when using Multus with Calico in certain clusters (likely at higher scale), which causes intermittent failures when creating containers. The calico-ipam plugin is OOM-killed inside the multus-cni DaemonSet pod because the 50Mi memory limit is too low.
The log shows the limit being hit: "memory: usage 51200kB, limit 51200kB" (51200 kB equals 50Mi). The memory request and limit on multus therefore need to be increased.
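The arithmetic behind that log line can be checked with a small helper. This is a minimal sketch under stated assumptions: the function name check_oom_line is illustrative, and it only understands lines of the form "usage <n>kB, limit <n>kB" as quoted above.

```shell
#!/usr/bin/env bash
# Parse an OOM log line such as "memory: usage 51200kB, limit 51200kB",
# then report whether usage reached the limit and what the limit is in Mi.
check_oom_line() {
  local line=$1 usage limit
  usage=$(printf '%s\n' "$line" | sed -n 's/.*usage \([0-9][0-9]*\)kB.*/\1/p')
  limit=$(printf '%s\n' "$line" | sed -n 's/.*limit \([0-9][0-9]*\)kB.*/\1/p')
  if [ -z "$usage" ] || [ -z "$limit" ]; then
    echo "no memory usage/limit found"
    return 1
  fi
  # The kernel reports kB in units of 1024 bytes, so dividing by 1024 gives Mi.
  if [ "$usage" -ge "$limit" ]; then
    echo "usage ${usage}kB reached limit ${limit}kB ($((limit / 1024))Mi)"
  else
    echo "usage ${usage}kB is below limit ${limit}kB ($((limit / 1024))Mi)"
  fi
}
```

Called with the line from the log, check_oom_line 'memory: usage 51200kB, limit 51200kB' reports that usage reached the 50Mi limit.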

Resolution

Increase the memory request/limit on multus via the TCA UI

1. Log into the TCA Web UI.

2. Go to Infrastructure > CaaS Infrastructure.

3. Click the target workload cluster in the Cluster list.

4. Click Add-ons.

5. Click the three-dot menu before the multus add-on and click Edit.

6. Click the SAVE button on the Add-on Configuration dialog.

7. Click the NEXT button.

8. Click Custom Resources (CR) at the top.

9. Edit the YAML in the right-hand pane to set the multus resource requests and limits, as shown in the screenshot:

[Screenshot 2024-04-15 at 16.02.04.png]

10. Click DEPLOY CHANGES at the bottom.

11. Wait for the add-on status to change to Provisioned.

 

Verification:

1. Log in to the TCA-CP where the management cluster is deployed as the admin user.

2. Switch to root and SSH to the workload cluster:

su -

ssh capv@<workload cluster endpoint IP>

3. Check that the multus pods have the new resources:

kubectl get pod -n kube-system -l name=multus -o jsonpath="{range .items[*]}{.spec.containers[*].resources}{'\n'}{end}"
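The output of the query in step 3 can also be checked mechanically instead of by eye. The sketch below assumes kubectl prints each pod's resources as one JSON object per line (the behavior of recent kubectl releases); the function name check_memory_limit is illustrative.

```shell
#!/usr/bin/env bash
# Read one resources JSON object per line on stdin and verify that the
# "limits" map of every line carries the expected memory value.
# Returns 0 if all lines match, 1 otherwise.
check_memory_limit() {
  local expected=$1 bad=0 line
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    # [^}]* keeps the match inside the limits map, so a matching
    # value under "requests" is not mistaken for the limit.
    if ! printf '%s\n' "$line" | grep -Eq '"limits":\{[^}]*"memory":"'"$expected"'"'; then
      echo "unexpected resources: $line"
      bad=1
    fi
  done
  return "$bad"
}
```

For example, piping the kubectl command from step 3 into check_memory_limit 150Mi prints any pod resources that do not show a 150Mi memory limit. The value to pass is whatever limit was configured in the add-on.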

Increase the memory request/limit on multus via the TCA-CP command line:

1. Log in to the TCA-CP where the management cluster is deployed as the admin user.

2. Switch to root and SSH to the workload cluster:

su -

ssh capv@<workload cluster endpoint IP>

3. Get the current multus values.yaml

kubectl -n tca-system get secret multus-tca-addon-secret -o "jsonpath={@.data.values\.yaml}"|base64 -d > multus.yaml

4. Add resources to multus.yaml

cat <<EOF >> multus.yaml
  resources:
    limits:
      cpu: 300m
      memory: 150Mi
    requests:
      cpu: 200m
      memory: 100Mi
EOF

5. Apply the new values.yaml to the multus secret.

VALUES_YAML=$(base64 -w0 multus.yaml)

kubectl patch secret -n tca-system multus-tca-addon-secret --patch '{"data":{"values.yaml":"'$VALUES_YAML'"}}'

6. Check that the multus pods have the new resources:

kubectl get pod -n kube-system -l name=multus -o jsonpath="{range .items[*]}{.spec.containers[*].resources}{'\n'}{end}"
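Because step 4 appends to multus.yaml, running it twice would add a second resources block. Steps 4 and 5 can be sketched as re-runnable functions that guard against this and let the encoded payload be checked before patching. This is a sketch, not part of TCA: the function names and the .bak backup file are illustrative, and the guard assumes the resources key sits at two-space indentation as in step 4.

```shell
#!/usr/bin/env bash
# Append the resources block only if the file has no two-space-indented
# "resources:" key yet; keep a backup of the original values first.
append_multus_resources() {
  local file=$1
  if grep -q '^  resources:' "$file"; then
    echo "resources already present in $file, nothing appended"
    return 0
  fi
  cp "$file" "$file.bak"
  cat <<EOF >> "$file"
  resources:
    limits:
      cpu: 300m
      memory: 150Mi
    requests:
      cpu: 200m
      memory: 100Mi
EOF
}

# Print the JSON patch that carries the base64-encoded values file.
build_values_patch() {
  printf '{"data":{"values.yaml":"%s"}}\n' "$(base64 -w0 "$1")"
}

# Decode the payload of a patch read on stdin, to confirm it round-trips.
decode_values_patch() {
  sed 's/.*"values.yaml":"\([^"]*\)".*/\1/' | base64 -d
}
```

A possible use: run append_multus_resources multus.yaml, inspect the result with build_values_patch multus.yaml | decode_values_patch, then pass "$(build_values_patch multus.yaml)" as the --patch argument of the kubectl patch command in step 5.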


Additional Information

Impact/Risks:
This issue was observed in Telco Cloud Automation 3.0 and may also occur on TCA 3.1. It applies only to Multus 4.0.1 or later and TKG 2.3.1 or later.