When monitoring vSphere performance, differentiating between VM-level and host-level CPU readiness is critical for identifying whether performance degradation is isolated to a single guest or indicative of broader host contention. This article clarifies the distinction between these two metrics and explains how to interpret short-lived 100% spikes in vCenter performance graphs.
VMware vSphere ESXi
CPU Readiness represents the time a virtual machine was ready to run but could not be scheduled on a physical CPU (pCPU).
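vCenter records ready time as a summation in milliseconds per sample interval, so a percentage has to be derived by dividing by the interval length. The following is a minimal sketch of that conversion, assuming the 20-second real-time interval. The function name `ready_percent` and the optional per-vCPU normalization are illustrative choices, not part of any VMware API.

```python
def ready_percent(ready_ms, interval_s=20, num_vcpus=1):
    """Convert a CPU ready summation (milliseconds) into a percentage.

    ready_ms   -- ready time accumulated during one sample interval
    interval_s -- sample interval in seconds (20 is assumed for real-time charts)
    num_vcpus  -- divide by the vCPU count to get a per-vCPU average
    """
    if interval_s <= 0 or num_vcpus <= 0:
        raise ValueError("interval_s and num_vcpus must be positive")
    return ready_ms * 100 / (interval_s * 1000 * num_vcpus)


# 1,000 ms of ready time in a 20,000 ms window is 5% readiness.
print(ready_percent(1000))
# The same per-vCPU figure for a 4-vCPU VM that accumulated 4,000 ms in total.
print(ready_percent(4000, num_vcpus=4))
```

Longer rollup intervals (for example, historical charts) would need a different `interval_s`, which is why it is left as a parameter rather than hard-coded.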
vsphere.vm.cpu.readiness.average:
Definition: The average readiness across all vCPUs of a specific virtual machine.
Insight: Indicates if a specific VM is suffering from "ready time" delay. High values directly correlate to application-level slowness inside the guest, even if the host's overall performance appears healthy.
vsphere.host.cpu.readiness.average:
Definition: An aggregate metric measured per ESXi host, representing the average contention across all VMs running on that host.
Insight: Indicates overall host overcommitment. While a low host average generally suggests a healthy environment, it does not guarantee that every individual VM is performing optimally.
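To illustrate why a low host average can hide a struggling guest, the sketch below compares a host-level mean with the per-VM values behind it. It assumes the host figure behaves like a simple unweighted mean of per-VM readiness, which may not match how a given monitoring tool aggregates. The VM names, the sample values, and the 5% per-VM threshold are hypothetical.

```python
from statistics import mean


def summarize_host(vm_readiness, vm_threshold=5.0):
    """Return (host average, VMs at or above the per-VM threshold).

    vm_readiness -- mapping of VM name to readiness percentage
    vm_threshold -- assumed per-VM alerting threshold, in percent
    """
    host_avg = mean(vm_readiness.values())
    hot_vms = {name: value for name, value in vm_readiness.items()
               if value >= vm_threshold}
    return host_avg, hot_vms


# Nine quiet VMs and one contended VM (all values are made up).
sample = {f"app-{i:02d}": 1.0 for i in range(1, 10)}
sample["db-01"] = 15.0

host_avg, hot_vms = summarize_host(sample)
print(f"host average: {host_avg:.1f}%")   # 2.4%, looks healthy
print(f"VMs over threshold: {hot_vms}")   # db-01 is still at 15%
```

In this made-up case the host average stays well below the threshold while one guest is clearly affected, which is the situation the VM-level metric is meant to catch.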
When observing transient spikes (e.g., narrow 100% spikes lasting only one data point) in vCenter real-time mode, consider the following analysis:
Data Sampling: In vCenter real-time charts, each data point represents a 20-second sample interval. Taken at face value, a 100% value on the host-level metric means that, averaged over that single 20-second window, the vCPUs on the host spent the whole interval ready to run but waiting for a physical CPU.
Pattern Analysis: If spikes are not sustained and drop back to near zero immediately, they do not typically indicate a permanent capacity issue. This pattern is often consistent with:
Burst Events: Short-lived "CPU storms" caused by runaway processes, snapshot operations, or a wave of vMotion migrations.
Scheduler Anomalies: Brief resource contention spikes as the ESXi CPU scheduler manages high-demand workloads.
Collection Anomalies: Occasional data collection gaps where a missed sample may cause an inflated rollup value in the UI.
Recommendation: Monitor vsphere.vm.cpu.readiness.average for the most critical workloads. If host-level spikes become sustained (lasting multiple minutes), evaluate host density or move high-demand VMs to less congested hosts.
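The distinction between a one-sample spike and sustained contention can be sketched as a check on the longest run of consecutive high samples. This is an illustrative sketch only: the 5% threshold and the 120-second cutoff for "sustained" are assumptions chosen to match "multiple minutes" loosely, and missing samples (represented as `None`) are treated as breaks in a run.

```python
def longest_run_seconds(samples, threshold=5.0, interval_s=20):
    """Length, in seconds, of the longest run of samples at or above threshold."""
    longest = current = 0
    for value in samples:
        if value is not None and value >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a low or missing sample ends the run
    return longest * interval_s


def classify(samples, threshold=5.0, sustained_s=120, interval_s=20):
    """Label a readiness series as 'quiet', 'transient', or 'sustained'."""
    run = longest_run_seconds(samples, threshold, interval_s)
    if run == 0:
        return "quiet"
    return "sustained" if run >= sustained_s else "transient"


# A single 100% data point surrounded by near-zero values.
print(classify([0.5, 100.0, 0.4, 0.3]))   # transient
# Seven consecutive 20-second samples above the threshold (140 seconds).
print(classify([6.0] * 7))                # sustained
```

A check like this only labels the pattern; it does not identify the cause, so a "transient" result still calls for correlating the timestamp with events such as snapshots or vMotion activity.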