In Telco Cloud Automation (TCA) 1.9.5 and 2.0.x, the task to instantiate a network function fails with the following error:
Internal error occurred: failed calling webhook "defaulter.vmconfig.acm.vmware.com": Post "https://vmconfig-webhook-service.tca-system.svc:443/mutate-acm-vmware-com-v1alpha1-nodepolicy?timeout=5s": dial tcp #.#.#.#:443: connect: connection refused
The vmconfig-operator pod is in an out-of-memory state. To confirm, run:
kubectl describe pod -n tca-system -l "control-plane=vmconfig-operator"
The output shows the following for the vmconfig-operator pod:
Last State: Terminated
Reason: OOMKilled
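The check above can be scripted. The following is a minimal sketch, not part of the product: `was_oom_killed` is a hypothetical helper name, and it assumes the `kubectl describe pod` output contains a `Reason: OOMKilled` line as described above.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl describe pod` output on stdin and
# returns success (0) if any line reports "Reason: OOMKilled".
was_oom_killed() {
  grep -Eq '^[[:space:]]*Reason:[[:space:]]*OOMKilled[[:space:]]*$'
}

# Usage (requires kubectl access to the management cluster):
#   kubectl describe pod -n tca-system -l "control-plane=vmconfig-operator" \
#     | was_oom_killed && echo "vmconfig-operator was OOM killed"
```

The function only inspects text, so it can also be run against saved `describe` output from a support bundle.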
Affected versions: TCA 1.9.5, 2.0.x
vmconfig-operator runs in the management cluster and manages node customization for all nodes in the workload clusters. When it performs a reconcile, it loads the machine status for all nodes into memory, and the Go runtime takes a while to reclaim that memory.
Kubernetes terminates a pod if the amount of memory the pod uses exceeds its allowed memory.
In turn, Telco Cloud Automation fails to apply the nodepolicy to the management cluster and no network function can be instantiated.
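To see how much memory the operator is currently allowed, the limit can be read from the deployment. This is a minimal sketch: `vmconfig_memory_limit` is a hypothetical helper name, and it assumes the deployment is named `vmconfig-operator` in the `tca-system` namespace.

```shell
#!/bin/sh
# Hypothetical helper: prints the memory limit(s) configured on the
# containers of the vmconfig-operator deployment.
# Assumes the deployment name and namespace shown in this article.
vmconfig_memory_limit() {
  kubectl get deployment vmconfig-operator -n tca-system \
    -o jsonpath='{.spec.template.spec.containers[*].resources.limits.memory}'
}

# Usage (requires kubectl access to the management cluster):
#   vmconfig_memory_limit
```

If the deployment has more than one container, the limits are printed space-separated in container order.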
Resolved in TCA 2.1
./enlarge_vmconfig_mem.sh

Example:
current cluster is cluster: <clusterName>
deployment.apps/vmconfig-operator patched
NAMESPACE    NAME                MemoryLimit
tca-system   vmconfig-operator   2Gi
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is not Running, will recheck after 3 sec
vmconfig-operator pod is Running
NAME                                 READY   STATUS    RESTARTS   AGE
vmconfig-operator-849dddc645-24qr7   1/1     Running   0          13s
Enlarge vmconfig-operator memory finished
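The contents of enlarge_vmconfig_mem.sh are not reproduced here. As a rough illustration of the same idea (raise the memory limit, then wait for the pod to come back), the sketch below uses standard kubectl commands. It is not the script itself: `enlarge_vmconfig_mem` is a hypothetical function name, the 2Gi default is taken from the sample output above, and `kubectl set resources` without `-c` applies the limit to every container in the deployment, which is an assumption about what is acceptable in your environment.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the contents of enlarge_vmconfig_mem.sh.
# Raises the memory limit on the vmconfig-operator deployment and waits
# for the rollout to finish. Default limit (2Gi) mirrors the sample output.
enlarge_vmconfig_mem() {
  limit="${1:-2Gi}"
  # Applies to all containers in the deployment because no -c is given.
  kubectl set resources deployment vmconfig-operator -n tca-system \
    --limits="memory=${limit}" &&
  # Blocks until the new pod is ready or the timeout expires.
  kubectl rollout status deployment/vmconfig-operator -n tca-system \
    --timeout=120s
}

# Usage (requires kubectl access to the management cluster):
#   enlarge_vmconfig_mem 2Gi
```

Prefer the provided script where it is available; this sketch is only meant to show what the change amounts to.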