Error : Feb 29 16:58:00 xyz-test-############## kubelet[1395]: E0229 16:58:00.068830 1395 remote_runtime.go:205] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"#########################################\": plugin type=\"multus-shim\" name=\"multus-cni-network\" failed (delete): CmdDel (shim): failed to send CNI request: Post \"http://example/cni\": EOF" podSandboxID="#########################################"
Users have experienced OOM-killed calico-ipam processes when using Multus with Calico in certain clusters (likely at higher scale), which causes intermittent failures when creating containers. The calico-ipam plugin is OOM-killed inside the multus-cni DaemonSet pod because the 50Mi memory limit is too low.
The log shows the limit being hit: "memory: usage 51200kB, limit 51200kB". The memory request and limit on Multus therefore need to be increased.
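The figure in the log matches the 50Mi limit once the units are converted. A minimal sketch of that arithmetic follows; the helper name kb_to_mi is hypothetical, and it assumes the kernel's "kB" figure counts 1024-byte units:

```shell
# Hypothetical helper: convert a memory figure in kB (as printed in the
# OOM log line) to Mi, assuming 1 kB here means 1024 bytes.
kb_to_mi() {
  echo $(( $1 / 1024 ))
}

# The value reported in the log above.
kb_to_mi 51200   # prints 50
```

So "limit 51200kB" is the 50Mi container limit being reached exactly.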
Increase the memory request/limit on Multus via the TCA UI.
Verification:
su -
ssh capv@<workload cluster endpoint IP>
kubectl get pod -n kube-system -l name=multus -o jsonpath="{range .items[*]}{.spec.containers[*].resources}{'\n'}{end}"
Increase the memory request/limit on Multus via the TCA-CP command line:
Note: A change made via the command line is overwritten by any later update made in the TCA UI. After the upgrade, edit the Multus add-on in the UI as soon as possible so the new values persist.
su -
ssh capv@<management cluster endpoint IP>
kubectl -n <workload cluster name> get secret multus-tca-addon-secret -o "jsonpath={@.data.values\.yaml}"|base64 -d > multus.yaml
cat <<EOF >> multus.yaml
resources:
  limits:
    cpu: 300m
    memory: 150Mi
  requests:
    cpu: 200m
    memory: 100Mi
EOF
VALUES_YAML=$(base64 -w0 multus.yaml)
kubectl patch secret -n <workload cluster name> multus-tca-addon-secret --patch '{"data":{"values.yaml":"'"$VALUES_YAML"'"}}'
exit
ssh capv@<workload cluster endpoint IP>
kubectl get pod -n kube-system -l name=multus -o jsonpath="{range .items[*]}{.spec.containers[*].resources}{'\n'}{end}"
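Because the command-line procedure appends the resources block to multus.yaml, the file could end up with two top-level resources: keys if the decoded values already contained one. The sketch below is an optional sanity check that could be run on the management cluster before the kubectl patch step. The helper name count_top_level_key is hypothetical and is not part of TCA or kubectl; it assumes top-level keys start in column 0.

```shell
# Hypothetical helper: count how many times a top-level key appears in a
# YAML file. Assumes top-level keys are not indented.
count_top_level_key() {
  # $1 = key name, $2 = file path
  # grep -c prints the match count; "|| true" keeps a zero count from
  # being treated as a failure.
  grep -c "^$1:" "$2" || true
}

# Warn if multus.yaml (the file built in the steps above) does not hold
# exactly one top-level "resources:" key.
if [ -f multus.yaml ] && [ "$(count_top_level_key resources multus.yaml)" -ne 1 ]; then
  echo "multus.yaml does not contain exactly one top-level resources: key" >&2
fi
```

If the check warns, review multus.yaml by hand and merge the two blocks before encoding and patching the secret.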