vmconfig-operator pod keeps crashing due to out of memory

Article ID: 342397

Updated On:

Products

VMware Telco Cloud Automation

Issue/Introduction

In Telco Cloud Automation (TCA) 2.0.x and 1.9.5, the task to instantiate a network function fails with the following error:
Internal error occurred: failed calling webhook "defaulter.vmconfig.acm.vmware.com": Post "https://vmconfig-webhook-service.tca-system.svc:443/mutate-acm-vmware-com-v1alpha1-nodepolicy?timeout=5s": dial tcp #.#.#.#:443: connect: connection refused
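
A "connection refused" error from the webhook usually means that no ready pod is serving the service named in the URL. As a quick check, the following sketch lists the endpoints behind vmconfig-webhook-service. The function name webhook_endpoints is hypothetical, and the sketch assumes kubectl is already pointed at the management cluster.

```shell
# Hypothetical helper: list the endpoints behind the webhook service
# named in the error message. An empty ENDPOINTS column suggests that
# no ready pod is currently serving the webhook.
webhook_endpoints() {
  kubectl get endpoints vmconfig-webhook-service -n tca-system
}
```

Call webhook_endpoints on the master node after defining it.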

Confirm vmconfig-operator is in an out of memory state:

  • SSH into a master node of the management cluster
  • Run the following command:
    kubectl describe pod -n tca-system -l "control-plane=vmconfig-operator"

The output shows the vmconfig-operator pod with the following last state:

Last State: Terminated
Reason: OOMKilled
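
To read the termination reason without scanning the full describe output, a JSONPath query along these lines may help. This is a minimal sketch: the function name last_termination_reason is hypothetical, and it assumes the same label selector as the command above.

```shell
# Hypothetical helper: print each vmconfig-operator pod name followed by
# the last termination reason of its containers (for example OOMKilled).
# The reason is empty if a container has never been terminated.
last_termination_reason() {
  kubectl get pod -n tca-system -l "control-plane=vmconfig-operator" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
}
```

Running last_termination_reason prints one line per pod.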


Environment

1.9.5, 2.0.x

Cause

The vmconfig-operator runs in the management cluster and manages node customization for all nodes in the workload clusters. Each time it reconciles, it loads the machine status for all of these nodes into memory, and the Go runtime takes a while to reclaim that memory.

Kubernetes terminates a pod if the amount of memory it uses exceeds its memory limit.

In turn, Telco Cloud Automation fails to apply the nodepolicy to the management cluster and no network function can be instantiated. 
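
To see the memory limit currently applied to the operator, the deployment spec can be queried directly. This is a sketch: the function name vmconfig_memory_limit is hypothetical, and the deployment name and namespace are taken from the example output in the Resolution section.

```shell
# Hypothetical helper: print the memory limit configured on the
# vmconfig-operator deployment's containers (for example 2Gi).
vmconfig_memory_limit() {
  kubectl get deployment vmconfig-operator -n tca-system \
    -o jsonpath='{.spec.template.spec.containers[*].resources.limits.memory}{"\n"}'
}
```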

Resolution

This issue is resolved in TCA 2.1.

Workaround: Enlarge the memory limit for the vmconfig-operator pod.

  1. SSH into the TCA-CP appliance and switch to the root user.
  2. Download the attached enlarge_vmconfig_mem.sh script and run it:
    ./enlarge_vmconfig_mem.sh
  3. Check the script output, which indicates whether the management cluster was updated properly.

Example:
current cluster is     cluster: <clusterName>
deployment.apps/vmconfig-operator patched

NAMESPACE    NAME                MemoryLimit
tca-system   vmconfig-operator   2Gi
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is Running

NAME                                 READY   STATUS    RESTARTS   AGE
vmconfig-operator-849dddc645-24qr7   1/1     Running   0          13s
 
Enlarge vmconfig-operator memory finished
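
The contents of the attached script are not reproduced in this article. For readers who want to understand the kind of change it makes, a manual equivalent might look like the sketch below. It is an assumption, not the script itself: the function name enlarge_vmconfig_memory is hypothetical, the 2Gi default is taken from the MemoryLimit column in the example output, and kubectl is assumed to be pointed at the management cluster.

```shell
# Hypothetical manual equivalent of the workaround, not the attached script.
# Usage: enlarge_vmconfig_memory [limit]   (limit defaults to 2Gi)
enlarge_vmconfig_memory() {
  limit="${1:-2Gi}"
  # Set the memory limit on all containers of the deployment.
  kubectl set resources deployment vmconfig-operator -n tca-system \
    --limits="memory=${limit}" || return 1
  # Wait for the patched deployment to finish rolling out.
  kubectl rollout status deployment/vmconfig-operator -n tca-system --timeout=120s
}
```

The attached script remains the documented workaround. This sketch only illustrates the underlying deployment change.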

Attachments

enlarge_vmconfig_mem.sh