Imbalanced CPU usage and increased contention on ESXi hosts with AMD EPYC CPUs
Article ID: 307072
Products
VMware vSphere ESXi
Issue/Introduction
Symptoms: On ESXi hosts with AMD EPYC processors, such as Naples (Zen), you might experience the following symptoms:
Some ranges of physical CPUs (PCPUs) are highly utilized while others remain relatively idle.
Some VMs show increased Ready and Co-Stop time.
Environment
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7
Cause
This issue occurs because the CPU scheduler takes relationships among scheduling contexts into account and places related contexts close together, within a single last-level cache (LLC). These relationships are not limited to virtual CPUs; they can also be established with I/O contexts. In general, this placement optimization minimizes communication overhead between LLCs.
However, on AMD EPYC processors, each physical NUMA node may consist of multiple LLCs. In that case, the scheduler may move related contexts toward a subset of the LLCs while leaving the remaining LLCs in the same NUMA node relatively idle. This concentrated placement can cause non-negligible ready time in certain cases.
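The effect described above can be illustrated with a small toy model. This is a sketch under stated assumptions, not the ESXi scheduler's actual algorithm. The topology (one NUMA node with 2 LLCs of 4 PCPUs each), the function names, and the use of unmet demand as a stand-in for ready time are all hypothetical. The model compares relationship-driven packing, where every related context follows its partner onto one LLC, with a simple least-loaded spread across the LLCs of the same node. The spread is included only as a contrast and is not a documented ESXi policy.

```python
# Toy model only: hypothetical topology, not measured ESXi data.
LLCS = 2            # assumed number of LLCs in one NUMA node
PCPUS_PER_LLC = 4   # assumed number of PCPUs sharing each LLC


def affinity_placement(demands, anchor_llc=0):
    """Pack every related context onto the LLC that hosts their shared partner."""
    load = [0.0] * LLCS
    for d in demands:
        load[anchor_llc] += d
    return load


def balanced_placement(demands):
    """Contrast case: put each context on the least-loaded LLC in the same node."""
    load = [0.0] * LLCS
    for d in demands:
        target = min(range(LLCS), key=lambda i: load[i])
        load[target] += d
    return load


def summarize(load):
    """Return per-LLC utilization (capped at 100%) and total unmet demand.

    Unmet demand (PCPUs' worth of work that cannot run) is a rough proxy
    for ready time in this toy model.
    """
    util = [min(l, PCPUS_PER_LLC) / PCPUS_PER_LLC for l in load]
    waiting = sum(max(0.0, l - PCPUS_PER_LLC) for l in load)
    return util, waiting


if __name__ == "__main__":
    # Six related contexts (for example, vCPUs tied to one I/O context),
    # each wanting a full PCPU.
    demands = [1.0] * 6
    for name, placer in [("affinity", affinity_placement),
                         ("balanced", balanced_placement)]:
        util, waiting = summarize(placer(demands))
        print(name, util, waiting)
```

With these inputs, the affinity placement saturates LLC 0 and leaves LLC 1 idle, so 2 PCPUs' worth of demand has to wait. The balanced spread runs both LLCs at 75% and leaves nothing waiting, even though total capacity in the node is the same.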