This article is intended to be shared with customers running a large number of VM workloads who are experiencing issues realizing those workloads with VM Operator because the pod is crashing from insufficient memory allocated to it.
Symptoms:
If VirtualMachine resources are taking a long time to be realized at scale (for example, thousands of VMs), the cause may be the VM Operator pod crashing because of insufficient memory. To determine whether this is the case, follow these steps:
1. SSH into the vCenter appliance:
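For example (the vCenter Server address is environment-specific):
   ssh root@<vcenter-server-address>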
2. Follow KB 90194 to SSH into the Supervisor Control Plane VM as root.
3. Check to see if VM Operator has crashed due to an Out of Memory (OOM) error:
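As a sketch, assuming the VM Operator pods run in the vmware-system-vmop namespace (the namespace and pod names can vary between releases, so verify them in your environment):
   # List the VM Operator pods and their restart counts
   kubectl -n vmware-system-vmop get pods
   # Check each container's last termination reason for OOMKilled
   kubectl -n vmware-system-vmop describe pods | grep -A 2 "Last State"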
If OOMKilled appears in the output from the above command, then the pod was terminated due to insufficient memory.
In vSphere 8.0 U2c the VM Operator pod's memory limit is increased to 500Mi.
It is possible that user workloads will still exceed the 500Mi limit. If OOM errors are still encountered on later revisions of vSphere with Tanzu, it may be necessary to increase the limit beyond 500Mi; the appropriate value is environment-dependent. Follow the workaround below to increase the limit accordingly.
Workaround:
The memory limit of the VM Operator pod can be increased. The following steps describe how to increase the limit to 500Mi or higher as needed:
1. SSH into the vCenter appliance:
2. Follow KB 90194 to SSH into the Supervisor Control Plane VM as root.
3. Scale down the VM Operator deployment so no pods are running:
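A minimal sketch, assuming the deployment is named vmop-controller-manager in the vmware-system-vmop namespace (verify both names in your environment). Note the current replica count before scaling down so it can be restored in step 5:
   # Record the current replica count
   kubectl -n vmware-system-vmop get deployment vmop-controller-manager
   # Scale the deployment down to zero replicas
   kubectl -n vmware-system-vmop scale deployment vmop-controller-manager --replicas=0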
4. Increase the memory limit to 500Mi (or greater if required) for the VM Operator pod using the following command:
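One possible approach, using the same assumed namespace and deployment name, is kubectl set resources; editing the deployment with kubectl edit and changing the container's resources.limits.memory value achieves the same result:
   # Raise the container memory limit to 500Mi (use a higher value if required)
   kubectl -n vmware-system-vmop set resources deployment vmop-controller-manager --limits=memory=500Mi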
5. Scale the replicas back to their original value:
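For example, replacing <original-replica-count> with the value recorded in step 3:
   kubectl -n vmware-system-vmop scale deployment vmop-controller-manager --replicas=<original-replica-count>
   # Confirm the pods come back up and stay in the Running state
   kubectl -n vmware-system-vmop get pods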
The VM Operator pods should now run without being OOM-killed. If OOM errors are still encountered, increase the memory limit in increments of 50Mi until the VM Operator pods run normally.