Symptoms:
You might notice a seemingly random difference in performance when powering on virtual machines with at least one of these options:
- PCI passthrough devices (DirectPath IO or SR-IOV)
- Fault Tolerance (FT) enabled
- Latency Sensitivity set to High
- vGPU enabled
Affected virtual machines will have either 0% or very little NUMA local memory.
In esxtop's memory (m) view and with the fields (f) for NUMA STATS (g) enabled, you can confirm this by monitoring "N%L":
9:43:04am up 5 days 36 min, 538 worlds, 1 VMs, 4 vCPUs; MEM overcommit avg: 0.00, 0.00, 0.00
PMEM /MB: 131026 total: 2160 vmk, 1185 other, 127680 free
VMKMEM/MB: 130640 managed: 1920 minfree, 7048 rsvd, 123592 ursvd, high state
NUMA /MB: 65488 (63845), 65536 (63450)
PSHARE/MB: 27 shared, 27 common: 0 saving
SWAP /MB: 0 curr, 0 rclmtgt: 0.00 r/s, 0.00 w/s
ZIP /MB: 0 zipped, 0 saved
MEMCTL/MB: 0 curr, 0 target, 0 max
GID NAME NHN NMIG NRMEM NLMEM N%L GST_ND0 OVD_ND0 GST_ND1 OVD_ND1
7656943 VMNAME 0 0 1024.00 0.00 0 0.00 12.09 1024.00 1.49
(...)
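To track "N%L" over a longer period instead of watching the interactive view, esxtop's batch mode can log the counters to a CSV file for later review in a spreadsheet or similar tool; the interval, iteration count and output path below are only examples:
# esxtop -b -d 5 -n 12 > /tmp/esxtop-numa.csv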
Alternatively, on the ESXi CLI, use "sched-stats". Here, look for the "currLocal%" and "cummLocal%" columns, the latter being the cumulative percentage of memory locality since power-on:
# sched-stats -t numa-clients
groupName groupID clientID homeNode affinity nWorlds vmmWorlds localMem remoteMem currLocal% cummLocal%
vm.1194167 7656943 0 0 0xf 6 6 0 33554432 0 0
(...)
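To narrow the output down to a single virtual machine, you can filter on its groupID; the ID and the grep pattern below (which also keeps the header line) are only examples:
# sched-stats -t numa-clients | grep -E "groupName|7656943"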
In most cases, you will see either no or very few NUMA migrations for the affected VMs (*Mig columns):
# sched-stats -t numa-migration
groupName groupID clientID balanceMig loadMig localityMig longTermMig monitorMig pageMigRate
vm.1194167 7656943 0 0 0 0 0 0 0
(...)
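If you want to confirm that the migration counters stay at zero over time, a simple shell loop is one possible approach; the groupID and interval are again only examples:
# while true; do sched-stats -t numa-migration | grep 7656943; sleep 10; done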
To map the groupID to the virtual machine's display name, either use esxtop (GID and NAME columns) or "esxcli vm process list" and match "VMX Cartel ID:" to "groupName" without the "vm." prefix.
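For example, the following (just one possible filter) lists the display name and VMX Cartel ID of every running virtual machine, which you can then match against the number after "vm." in the groupName column:
# esxcli vm process list | grep -E "Display Name|VMX Cartel ID"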