CPU ready spikes for latency-sensitive VMs on ESXi 7.0.x when the sidechannel-aware scheduler is enabled


Article ID: 427635


Products

VMware vSphere ESXi

Issue/Introduction

Latency-sensitive virtual machines running on ESXi 7.0.x that have both CPU reservations and CPU limits configured can experience occasional CPU ready spikes after the sidechannel-aware scheduler is enabled.
These spikes are visible in the vSphere Client advanced performance chart for CPU ready and can also be traced with esxtop directly on the ESXi host.
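The CPU ready chart in the vSphere Client reports a summation value in milliseconds per sampling interval, while esxtop shows a percentage. The following Python sketch converts between the two using the commonly documented formula (summation value / (chart interval in seconds * 1000)) * 100. The function name, the interval keys, and the optional per-vCPU division are illustrative assumptions and are not part of any VMware tool.

```python
# Minimal sketch: convert a CPU ready summation value (ms) from a vSphere
# performance chart into a percentage. Names here are illustrative only.

# Chart update intervals in seconds for the standard chart ranges.
CHART_INTERVAL_SECONDS = {
    "realtime": 20,
    "past_day": 300,
    "past_week": 1800,
    "past_month": 7200,
    "past_year": 86400,
}


def cpu_ready_percent(summation_ms, interval="realtime", vcpus=1):
    """Return CPU ready as a percentage of the sampling interval.

    summation_ms: CPU ready summation value read from the chart, in ms.
    interval:     key into CHART_INTERVAL_SECONDS for the chart range used.
    vcpus:        assumption: when the value is the VM-level aggregate,
                  dividing by the vCPU count gives an average per vCPU.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    interval_ms = CHART_INTERVAL_SECONDS[interval] * 1000
    return summation_ms * 100 / interval_ms / vcpus


if __name__ == "__main__":
    # 1000 ms of ready time within one 20-second real-time sample
    print(cpu_ready_percent(1000))            # 5.0
    # the same aggregate value averaged over a 4-vCPU VM
    print(cpu_ready_percent(1000, vcpus=4))   # 1.25
```

Comparing the converted values before and after enabling the sidechannel-aware scheduler can help confirm whether the spikes match the behavior described in this article.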

Environment

VMware vSphere ESXi 7.0.x

Cause

This issue can occur because a CPU limit does not apply only to the configured virtual CPUs of the virtual machine. It also caps the CPU resources that the VM's additional virtual devices require.
This includes the virtual storage controller, virtual network adapters, the virtual GPU, and any other virtual device configured for the specific VM.
The sidechannel-aware scheduler prevents the virtual machine from sharing physical cores with other VMs. Combined with a CPU limit, this can cause occasional scheduling delays, which appear as CPU ready time.

Resolution

The scheduler has been improved in vSphere 8.0. If you are experiencing this issue, upgrade to ESXi 8.0 U3 or later.

If an upgrade is not possible, do not configure a CPU limit on the virtual machine. This prevents the occasional CPU ready spikes.
Alternatively, you can disable the sidechannel-aware scheduler by using the settings outlined in Implementing Hypervisor-Specific Mitigations for Microarchitectural Data Sampling (MDS) Vulnerabilities (CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, and CVE-2019-11091) in vSphere.

Note: If you disable the sidechannel-aware scheduler on an ESXi host whose CPU type is vulnerable to sidechannel attacks (such as the Microarchitectural Data Sampling (MDS) vulnerabilities identified by CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, and CVE-2019-11091, and the L1 Terminal Fault vulnerability CVE-2018-3646, among others), the system is potentially vulnerable to such attacks. Choose this option only in a completely isolated environment.

If neither of these options can be implemented, you can reduce the number of CPU ready spikes by increasing the CPU limit configured on the VM. The increase needed to achieve a meaningful reduction varies with the specific environment, the sizing of the VMs, and the type and volume of the workload running in the VM. For smaller VMs, for example with 4 cores, adding the capacity of 2 additional cores to the CPU limit can be enough. Other VM sizes will need a different amount.
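As a rough illustration of the arithmetic behind raising the limit, the following Python sketch adds the capacity of a number of extra cores to an existing CPU limit. The function name and the 2600 MHz per-core frequency are hypothetical values chosen for the example. Use the actual core frequency of your host, and treat the result as a starting point to validate against observed CPU ready, not as a recommended value.

```python
# Minimal sketch: compute a raised CPU limit (MHz) by adding the capacity
# of extra physical cores. All names and numbers are illustrative only.

def raised_cpu_limit_mhz(current_limit_mhz, core_mhz, extra_cores):
    """Return the current limit plus the capacity of extra_cores cores.

    current_limit_mhz: CPU limit currently configured on the VM, in MHz.
    core_mhz:          frequency of one physical core on the host, in MHz
                       (assumption: taken from the host's CPU details).
    extra_cores:       number of additional cores' worth of capacity to add.
    """
    if current_limit_mhz <= 0 or core_mhz <= 0 or extra_cores < 0:
        raise ValueError("limit and core_mhz must be positive, extra_cores >= 0")
    return current_limit_mhz + extra_cores * core_mhz


if __name__ == "__main__":
    # Hypothetical 4-core VM limited to 4 * 2600 MHz on a 2600 MHz host,
    # raised by the capacity of 2 additional cores as in the example above.
    current = 4 * 2600
    print(raised_cpu_limit_mhz(current, core_mhz=2600, extra_cores=2))  # 15600
```

After applying a new limit, check the CPU ready chart again and adjust the headroom if the spikes persist.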

Additional Information

VMware response to ‘L1 Terminal Fault - VMM’ (L1TF - VMM) Speculative-Execution vulnerability in Intel processors for vSphere: CVE-2018-3646

Implementing Hypervisor-Specific Mitigations for Microarchitectural Data Sampling (MDS) Vulnerabilities (CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, and CVE-2019-11091) in vSphere
