When vmconfig-operator runs in the management cluster and manages node customization for all nodes in the workload clusters, each reconcile loads the machine status of every node into memory at once, and Go's garbage collector takes a while to reclaim that memory. If the memory used by the pod exceeds its configured limit, Kubernetes terminates the pod. As a result, Telco Cloud Automation fails to apply the node policy to the management cluster and no network function can be instantiated.
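The termination described above is Kubernetes' standard out-of-memory handling: when a container's memory usage exceeds the limit declared in its pod spec, the kubelet kills it (status OOMKilled). A minimal sketch of such a limit is below; the values shown are illustrative assumptions, not the actual defaults shipped with vmconfig-operator:

```yaml
# Illustrative container resources block (values are assumptions).
# If the container's memory usage exceeds limits.memory, the kubelet
# terminates it with reason OOMKilled.
resources:
  requests:
    memory: 512Mi
  limits:
    memory: 1Gi
```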
Resolution
A fix for this issue will be released with Telco Cloud Automation 2.1.
Workaround:
1. SSH into the TCA-CP appliance and switch to the root user.
2. Enlarge the memory limit for the vmconfig-operator pod by running the following command:
   curl -kfsSL 'https://vmwaresaas.jfrog.io/artifactory/generic-registry/kb/20220524/enlarge_vmconfig_mem.sh' | bash
3. The script output indicates whether each management cluster was updated properly. Example output for management cluster mc7:
current cluter is cluster: mc7
deployment.apps/vmconfig-operator patched
NAMESPACE    NAME                MemoryLimit
tca-system   vmconfig-operator   2Gi
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is Running
NAME                                 READY   STATUS    RESTARTS   AGE
vmconfig-operator-849dddc645-24qr7   1/1     Running   0          13s
Enlarge vmconfig-operator memory finished
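Judging from the "deployment.apps/vmconfig-operator patched" line and the 2Gi MemoryLimit shown in the output, the script appears to raise the container's memory limit on the vmconfig-operator deployment in the tca-system namespace. A sketch of an equivalent patch body is below; the container name and the request value are assumptions, and the script itself may apply additional changes:

```yaml
# Hypothetical strategic-merge patch for the vmconfig-operator
# deployment in the tca-system namespace, raising the memory limit
# to the 2Gi shown in the script output above.
spec:
  template:
    spec:
      containers:
        - name: vmconfig-operator   # container name is an assumption
          resources:
            limits:
              memory: 2Gi
```

After a patch like this, the deployment rolls out a new pod, which matches the "pod is not Running, will recheck" polling seen in the script output before the pod reaches Running.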