A large VM experiences a hang when a vMotion is initiated.
There are no error messages or crashes seen.
The vmware.log file for the VM may contain entries such as:
YYYY-MM-DDTHH:mm:ss In(05) vmx - GuestRpcSendTimedOut: message to toolbox timed out.
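As a quick way to check a copy of vmware.log for this entry, the following is a minimal sketch in Python. The function name `find_rpc_timeouts` and the command-line handling are illustrative assumptions, not part of any VMware tooling; the script only searches for the `GuestRpcSendTimedOut` string shown above.

```python
import sys


def find_rpc_timeouts(lines):
    """Return the log lines that mention GuestRpcSendTimedOut.

    `lines` is any iterable of strings, for example an open vmware.log file.
    The helper name is hypothetical and used for illustration only.
    """
    marker = "GuestRpcSendTimedOut"
    return [line.rstrip("\n") for line in lines if marker in line]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python find_timeouts.py /path/to/vmware.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for hit in find_rpc_timeouts(log_file):
            print(hit)
```

A match only shows that the symptom was logged. It does not by itself confirm CPU scheduling contention, so it should be read together with the esxtop figures.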
You may see high %CSTP (co-stop) values when reviewing esxtop. See the article "Determining if multiple virtual CPUs are causing performance issues" for further assistance with checking this.
ESXi 7
ESXi 8
This typically occurs with very large VMs on an overcommitted ESXi host. If a VM has too many vCPUs allocated, the host can struggle to schedule the VM's vCPUs alongside the ESXi processes.
For example, an ESXi host with 64 physical cores and hyperthreading enabled allows you to allocate up to 128 vCPUs to one virtual machine. While this VM is able to run with 128 vCPUs, it competes with ESXi processes for resources, leading to high %CSTP values in esxtop.
A vMotion also requires CPU resources to execute, so triggering one adds to this contention and the VM can hang.
Reduce the number of vCPUs allocated to the VM. While hyperthreading allows you to allocate twice as many vCPUs as there are physical cores, we recommend sizing the VM on the assumption that hyperthreading provides roughly a 30% increase in CPU resources. In the above example, this would mean allocating approximately 84 vCPUs to the VM rather than 128 (64 x 1.3 = 83.2, rounded up).
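The sizing arithmetic above can be sketched as a small Python helper. The function name `recommended_vcpus` and its parameters are hypothetical, and rounding up to the next whole vCPU is an assumption chosen to match the 83.2 to 84 example; treat the result as a starting point rather than a fixed limit.

```python
def recommended_vcpus(physical_cores, ht_gain_percent=30):
    """Estimate a vCPU ceiling from the physical core count.

    Assumes hyperthreading adds about `ht_gain_percent` percent of CPU
    capacity (30% in the text above) and rounds up to a whole vCPU.
    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    scaled = physical_cores * (100 + ht_gain_percent)
    return -(-scaled // 100)  # ceiling division


# Example from the text: 64 physical cores with hyperthreading enabled
print(recommended_vcpus(64))  # 64 x 1.3 = 83.2, rounded up to 84
```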