In Telco Cloud Automation (TCA) 1.9.5 and 2.0.x, the task to instantiate a network function fails with the following error:
Internal error occurred: failed calling webhook "defaulter.vmconfig.acm.vmware.com": Post "https://vmconfig-webhook-service.tca-system.svc:443/mutate-acm-vmware-com-v1alpha1-nodepolicy?timeout=5s": dial tcp #.#.#.#:443: connect: connection refused
The vmconfig-operator pod is in an out-of-memory state. To confirm, run:
kubectl describe pod -n tca-system -l "control-plane=vmconfig-operator"
The output shows the following for the vmconfig-operator pod:
Last State: Terminated
Reason: OOMKilled
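The check above can also be scripted. The following is a minimal sketch, not part of the product: was_oom_killed is a hypothetical helper name, and it only looks for the OOMKilled termination reason in text read from standard input.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not a TCA tool): reads
# `kubectl describe pod` output on stdin and reports whether a container
# was terminated because it exceeded its memory limit.
was_oom_killed() {
    if grep -q 'Reason:[[:space:]]*OOMKilled'; then
        echo "OOMKilled"
    else
        echo "not OOMKilled"
    fi
}
```

Against a live management cluster, it could be used as:
kubectl describe pod -n tca-system -l "control-plane=vmconfig-operator" | was_oom_killed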
Affected versions: TCA 1.9.5, 2.0.x
The vmconfig-operator runs in the management cluster and manages node customization for all nodes in the workload clusters. Each time it reconciles, it loads the machine status for all of those nodes into memory, and the Go runtime takes a while to reclaim that memory.
Kubernetes terminates a pod if the amount of memory it uses exceeds its allowed memory limit.
In turn, Telco Cloud Automation fails to apply the nodepolicy to the management cluster and no network function can be instantiated.
Resolved in TCA 2.1
Workaround: on affected versions, run the script that raises the vmconfig-operator memory limit:
./enlarge_vmconfig_mem.sh
Example:
current cluster is cluster: <clusterName>
deployment.apps/vmconfig-operator patched
NAMESPACE    NAME                MemoryLimit
tca-system   vmconfig-operator   2Gi
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is Running
NAME                                 READY   STATUS    RESTARTS   AGE
vmconfig-operator-849dddc645-24qr7   1/1     Running   0          13s
Enlarge vmconfig-operator memory finished
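The contents of enlarge_vmconfig_mem.sh are not shown here. Judging only from the output above, it raises the memory limit on the vmconfig-operator deployment and waits for the new pod to run. A rough manual equivalent using standard kubectl commands might look like the sketch below. The function name raise_vmconfig_mem, the KUBECTL override, and applying the limit to every container in the deployment are all assumptions for illustration, not the script's actual logic.

```shell
#!/bin/sh
# Sketch only; NOT the contents of enlarge_vmconfig_mem.sh.
# Assumptions: the limit is applied to every container in the deployment,
# and KUBECTL may be overridden (for example with "echo") for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

raise_vmconfig_mem() {
    limit="${1:-2Gi}"
    # Set the memory limit on the vmconfig-operator deployment.
    $KUBECTL set resources deployment vmconfig-operator \
        -n tca-system --limits="memory=${limit}" || return 1
    # Wait until the replacement pod has rolled out.
    $KUBECTL rollout status deployment/vmconfig-operator \
        -n tca-system --timeout=120s
}
```

Setting KUBECTL=echo before calling raise_vmconfig_mem 2Gi prints the commands instead of running them, which is a convenient way to review them before touching the management cluster.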