VMware vSphere ESXi
This can happen when the guest application needs more compute capacity than that offered by a single NUMA node, but fails to distribute the load properly across the additional NUMA nodes.
To confirm whether this is the case, follow the steps below:
5675328 5675328 vmx-vcpu-0:TestVM 255 255 255 255 252 252 699310930 13321
5675333 5675328 vmx-vcpu-1:TestVM 255 255 255 255 253 253 622497743 11732
5675334 5675328 vmx-vcpu-2:TestVM 255 255 255 255 253 253 644270074 12987
5675335 5675328 vmx-vcpu-3:TestVM 255 254 254 255 255 255 599090063 16543
5675336 5675328 vmx-vcpu-4:TestVM 255 255 255 255 255 253 670941131 10552
5675337 5675328 vmx-vcpu-5:TestVM 255 255 249 255 255 250 43496715 20556
5675338 5675328 vmx-vcpu-6:TestVM 252 255 254 255 253 255 546591310 11800
5675339 5675328 vmx-vcpu-7:TestVM 255 255 254 255 255 252 576277152 11267
5675340 5675328 vmx-vcpu-8:TestVM 255 255 254 255 252 255 509256849 17341
5675341 5675328 vmx-vcpu-9:TestVM 255 255 254 255 255 252 484631494 17721
5675342 5675328 vmx-vcpu-10:TestVM 255 254 249 255 253 249 34721997 18293
5675343 5675328 vmx-vcpu-11:TestVM 255 255 254 255 253 255 494079871 20189
5675344 5675328 vmx-vcpu-12:TestVM 255 255 247 255 255 249 40676083 20240
5675345 5675328 vmx-vcpu-13:TestVM 255 255 255 255 255 255 681254271
nodeID | used | idle | entitled | owed | loadAvgPct | nVcpu | freeMem | totalMem |
0 | 2770 | 25230 | 0 | 0 | 0 | 0 | 196318932 | 201161092 |
1 | 27783 | 218 | 27009 | 0 | 96 | 14 | 181838524 | 201326592 |
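The per-NUMA-node statistics above can be gathered in the ESXi host shell with the sched-stats utility (a sketch; the available options can vary by ESXi release):

```
# On the ESXi host shell: show per-NUMA-node scheduler statistics
# (used/idle time, entitlement, load average, vCPU count, free/total memory)
sched-stats -t numa-pnode
```

In this output, node 1 is running 14 vCPUs at a 96% load average while node 0 is nearly idle, which is the imbalance being diagnosed.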
6. With 16 vCPUs, the 8 vCPUs on the first NUMA node are quite busy, while the 8 vCPUs on the second NUMA node are mostly idle:
2121608 2121608 vmx-vcpu-0:TestVM 250 249 240 253 251 243 5435254 12552
2121613 2121608 vmx-vcpu-1:TestVM 251 249 237 253 251 238 5349609 14108
2121614 2121608 vmx-vcpu-2:TestVM 250 245 247 255 250 248 5879843 11768
2121615 2121608 vmx-vcpu-3:TestVM 249 244 237 250 246 238 3528458 13786
2121616 2121608 vmx-vcpu-4:TestVM 236 238 227 241 236 227 3348875 19971
2121617 2121608 vmx-vcpu-5:TestVM 233 236 235 237 238 237 3254107 28640
2121618 2121608 vmx-vcpu-6:TestVM 232 241 225 238 241 226 3583011 15953
2121619 2121608 vmx-vcpu-7:TestVM 245 247 242 245 245 245 4113962 20198
2121620 2121608 vmx-vcpu-8:TestVM 50 43 40 40 35 42 104658 78247
2121621 2121608 vmx-vcpu-9:TestVM 30 37 42 37 34 40 362357 91945
2121622 2121608 vmx-vcpu-10:TestVM 3 5 21 14 12 23 149275 116822
2121623 2121608 vmx-vcpu-11:TestVM 1 5 6 11 5 6 152050 853567
2121624 2121608 vmx-vcpu-12:TestVM 0 1 6 6 4 5 362251 2245403
2121625 2121608 vmx-vcpu-13:TestVM 5 0 3 11 2 5 422925 3537338
2121626 2121608 vmx-vcpu-14:TestVM 2 0 3 7 2 3 302333 2874573
2121627 2121608 vmx-vcpu-15:TestVM 1 1 2 12 4 5 817400 3382363
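The imbalance can be quantified by averaging the usage samples per NUMA node. A minimal sketch, assuming the first numeric column after each vCPU name is the most recent usage sample and that vCPUs 0-7 sit on the first node and vCPUs 8-15 on the second:

```python
# Per-vCPU usage samples copied from the output above
# (first numeric column after the vCPU name; assumed to be percent used).
used_pct = [250, 251, 250, 249, 236, 233, 232, 245,  # vCPUs 0-7, NUMA node 0
            50, 30, 3, 1, 0, 5, 2, 1]                # vCPUs 8-15, NUMA node 1

node0 = used_pct[:8]
node1 = used_pct[8:]

# Average usage per node: node 0 is saturated, node 1 is nearly idle.
print(f"node0 avg usage: {sum(node0) / len(node0):.1f}")
print(f"node1 avg usage: {sum(node1) / len(node1):.1f}")
```

An average near 243 on one node versus about 11 on the other confirms the scheduler is not the problem; the guest simply isn't dispatching work to half of its vCPUs.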
Engage the application vendor to investigate why the workload isn't being evenly distributed.
Workaround:
To work around this, configure the VM to expose a single virtual NUMA node with 16 vCPUs, so that the virtual topology deliberately does not match the underlying hardware.
Because the VM is then unaware that its 16 vCPUs are running on 2 different physical NUMA nodes, it can potentially place a workload on vCPUs that are far apart.
This is a trade-off that needs to happen if the workload cannot be distributed evenly by the OS: the VM is optimized either for compute capacity (with this proposed setting) or for locality (without it).
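A sketch of the corresponding advanced setting, added to the VM's .vmx file or as an advanced configuration parameter in the vSphere UI; numa.vcpu.maxPerVirtualNode controls how many vCPUs are grouped into one virtual NUMA node, and the value 16 assumes the 16-vCPU VM from this example:

```
# .vmx advanced setting: expose all 16 vCPUs as a single virtual NUMA node
numa.vcpu.maxPerVirtualNode = "16"
```

The VM must be powered off before changing advanced settings, and the new topology takes effect at the next power-on.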
Another way of placing all vCPUs on the same NUMA node is to use hyper-threading with NUMA. For more information, see Configure virtual machines to use hyper-threading with NUMA in VMware ESXi.
Note: In this case, where the data shows the workload is compute-intensive, sharing hyper-threads may not be optimal.
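For reference, the per-VM advanced setting behind the hyper-threading approach is numa.vcpu.preferHT; a sketch:

```
# .vmx advanced setting: prefer packing vCPUs onto hyper-threads of one
# NUMA node rather than spreading them across nodes
numa.vcpu.preferHT = "TRUE"
```

This keeps all vCPUs local to a single node at the cost of sharing physical cores, which is why it may not suit a compute-bound workload like the one above.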
Steps: