This behavior is a result of the architecture of the VMkernel NUMA scheduler and depends on the number of CPU cores per NUMA node. If a virtual machine has more vCPUs than there are cores in a NUMA node, the virtual machine is not managed by the NUMA scheduler and does not benefit from NUMA locality. Such virtual machines can therefore see higher memory access latencies.
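The eligibility rule described above can be expressed as simple arithmetic: a virtual machine is a candidate for NUMA management only when its vCPU count does not exceed the core count of a single NUMA node. The short Python sketch below is purely illustrative; the host and virtual machine figures are assumed example values, not data queried from an ESX host.

```python
# Illustrative sketch only: the figures below are assumed examples,
# not values read from an ESX host.
cores_per_numa_node = 6   # assumed cores in one NUMA node
vm_vcpus = 8              # assumed vCPU count of the virtual machine

if vm_vcpus <= cores_per_numa_node:
    # The NUMA scheduler can place the VM on a single home node,
    # so it benefits from local memory access.
    print("VM is managed by the NUMA scheduler")
else:
    # The VM is wider than a node: it is not NUMA-managed and may
    # see higher memory access latencies.
    print("VM is not managed by the NUMA scheduler")
```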
This is expected behavior based on the current architecture of the scheduler. However, these points help to limit the impact of the issue:
- Ensure that the virtual machine is configured with a number of vCPUs less than or equal to the number of cores per NUMA node. In ESX 4.0, if the number of vCPUs exceeds the size of a node, the NUMA scheduler does not manage the virtual machine. Sizing virtual machines so that their vCPU count is a whole multiple or divisor of the NUMA node size increases the number of virtual machines that can be powered on while still benefiting from memory locality. For example, with 6 cores per NUMA node on a 48-core server, at least eight 6-vCPU virtual machines can run at 100% CPU utilization without incurring substantial ready times (a simple sizing sketch is provided at the end of this article).
Note: In a DRS cluster, virtual machines need to be sized appropriately for the whole cluster, as they can be migrated between the hosts in the cluster. To simplify sizing, use hosts with the same NUMA characteristics (primarily the number of cores per NUMA node) in the cluster.
- Spread large-vCPU virtual machines across the environment. The impact is reduced when fewer of them run on each host. Monitor the distribution manually or use DRS to ensure that large-vCPU virtual machines remain evenly distributed.
- Size the virtual machines appropriately and do not configure them with more vCPUs than they need. If the application running in the virtual machine does not benefit from multiple vCPUs, do not configure them.
- Disable NUMA. Disabling NUMA resolves the CPU ready time issue, but it should be a last resort because inter-node memory latencies can be high; it is particularly discouraged on servers with many NUMA nodes, where the latency between nodes can be significant. NUMA can be disabled by enabling Node Interleaving in the BIOS of the ESX host. High ready times are not seen in this configuration because the scheduler no longer takes NUMA locality into account, so there are no restrictions on where the virtual machines can run. When NUMA is enabled, the high ready times result from the scheduler trying to keep virtual machines on their local node to avoid the performance hit of remote memory access.
Note: In internal performance evaluations, VMware has observed significant performance degradation when NUMA is disabled, particularly when memory-intensive applications are running in the virtual machines.
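Returning to the sizing guidance in the first recommendation, the following sketch checks whether a proposed vCPU count is a whole multiple or divisor of the NUMA node size and estimates how many such virtual machines fit on a host without oversubscribing physical cores. The numbers are the assumed 48-core, 6-cores-per-node example used above, not values taken from a real host.

```python
# Sizing sketch for the example above: an assumed 48-core host with
# 6 cores per NUMA node (8 nodes). Values are illustrative only.
total_cores = 48
cores_per_node = 6
vm_vcpus = 6

def fits_numa_sizing(vcpus, node_size):
    """True if the vCPU count is a whole divisor or multiple of the node size."""
    return node_size % vcpus == 0 or vcpus % node_size == 0

if fits_numa_sizing(vm_vcpus, cores_per_node):
    # At 100% CPU utilization, this many VMs can run without
    # oversubscribing physical cores.
    max_vms = total_cores // vm_vcpus
    print(f"{vm_vcpus}-vCPU VMs align with the node size; "
          f"up to {max_vms} fit on {total_cores} cores")
else:
    print(f"{vm_vcpus} vCPUs do not align with a {cores_per_node}-core NUMA node")
```

With the example values, the sketch reports eight 6-vCPU virtual machines on the 48-core host, matching the figure given in the first recommendation.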