Understanding vsphere.host.cpu.readiness.average vs. vsphere.vm.cpu.readiness.average

Article ID: 432752

Products

VMware vSphere ESXi

Issue/Introduction

When monitoring vSphere performance, distinguishing between VM-level and host-level CPU readiness is critical for determining whether performance degradation is isolated to a single guest or indicates broader host contention. This article clarifies the difference between these two metrics and explains how to interpret short-lived 100% spikes in vCenter performance graphs.

Environment

VMware vSphere ESXi

Resolution

CPU readiness represents the percentage of time a virtual machine was ready to run but could not be scheduled on a physical CPU (pCPU).
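Because readiness is a share of the sample interval, the arithmetic behind a single data point can be sketched as follows. This is a minimal sketch, assuming the raw ready time for one sample is available in milliseconds and is summed across the VM's vCPUs (vCenter exposes this raw value as the cpu.ready.summation counter); the function name readiness_percent is hypothetical.

```python
# Minimal sketch: turn a raw ready-time sample into a readiness percentage.
# Assumption: ready_ms is the ready time for one sample in milliseconds,
# summed across the VM's vCPUs (as vCenter's cpu.ready.summation reports it).

def readiness_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Share of the sample interval (per vCPU) spent ready but not scheduled."""
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    interval_ms = interval_s * 1000.0
    # Multiply first so round numbers stay exact in floating point.
    return ready_ms * 100.0 / (interval_ms * vcpus)


print(readiness_percent(1000))           # 1 s of a 20 s window -> 5.0
print(readiness_percent(20000))          # the whole window -> 100.0
print(readiness_percent(4000, vcpus=4))  # 4 s spread over 4 vCPUs -> 5.0
```

The 20-second default matches the real-time chart interval; historical rollups use longer intervals, so interval_s must match the chart being read.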

  • vsphere.vm.cpu.readiness.average:

    • Definition: The average readiness across all vCPUs of a specific virtual machine.

    • Insight: Indicates whether a specific VM is experiencing ready-time delays. Sustained high values typically show up as application-level slowness inside the guest, even when the host's overall performance appears healthy.

  • vsphere.host.cpu.readiness.average:

    • Definition: An aggregate metric measured per ESXi host, representing the average contention across all VMs running on that host.

    • Insight: Indicates overall host overcommitment. While a low host average generally suggests a healthy environment, it does not guarantee that every individual VM is performing optimally.

When observing transient spikes (e.g., narrow 100% spikes lasting only one data point) in vCenter real-time mode, consider the following analysis:

  1. Data Sampling: In vCenter real-time charts, each data point represents a 20-second sample interval. Taken at face value, a 100% value means that for that entire 20-second window the measured vCPUs were ready to run but were never scheduled on a physical CPU.

  2. Pattern Analysis: If spikes are not sustained and drop back to near zero immediately, they typically do not indicate a permanent capacity issue. This pattern is often consistent with:

    • Burst Events: Short-lived "CPU storms" caused by runaway processes, snapshot operations, or a wave of vMotion migrations.

    • Scheduler Anomalies: Brief resource contention spikes as the ESXi CPU scheduler manages high-demand workloads.

    • Collection Anomalies: Occasional data collection gaps where a missed sample may cause an inflated rollup value in the UI.
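The "one data point, then back to near zero" pattern described above can be detected mechanically in a series of 20-second samples. This is a minimal sketch; the cutoffs SPIKE_PCT and BASELINE_PCT and the function name transient_spikes are hypothetical and should be tuned for the environment.

```python
# Sketch: find isolated single-sample spikes in 20-second readiness samples (%).
# Hypothetical cutoffs: a sample at or above SPIKE_PCT is a spike, and a
# neighbor at or below BASELINE_PCT counts as "near zero".
SPIKE_PCT = 50.0
BASELINE_PCT = 2.0


def transient_spikes(samples: list[float]) -> list[int]:
    """Return indices of spikes whose immediate neighbors are both near zero."""
    hits = []
    for i, value in enumerate(samples):
        if value < SPIKE_PCT:
            continue
        # A missing neighbor at either end of the series is treated as zero.
        before = samples[i - 1] if i > 0 else 0.0
        after = samples[i + 1] if i + 1 < len(samples) else 0.0
        if before <= BASELINE_PCT and after <= BASELINE_PCT:
            hits.append(i)
    return hits


series = [0.3, 0.2, 100.0, 0.4, 0.1, 0.2, 60.0, 55.0, 0.3]
print(transient_spikes(series))  # [2]
```

Index 2 is reported as transient; the back-to-back values at indices 6 and 7 are not, because each has an elevated neighbor.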

Recommendation: Monitor vsphere.vm.cpu.readiness.average for the most critical workloads. If host-level spikes become sustained (lasting multiple minutes), evaluate host density or move high-demand VMs to less congested hosts.
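A watch on critical workloads can separate sustained readiness from one-off spikes by measuring the longest run of elevated samples. This is a sketch under assumptions: samples arrive at the 20-second real-time interval, and the 10% threshold, the 3-minute "sustained" cutoff, and the VM names and data are hypothetical examples.

```python
SAMPLE_SECONDS = 20  # real-time chart interval


def longest_run_minutes(samples: list[float], threshold: float) -> float:
    """Length, in minutes, of the longest consecutive run above threshold."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest * SAMPLE_SECONDS / 60


def needs_review(samples: list[float], threshold: float = 10.0,
                 sustained_minutes: float = 3.0) -> bool:
    # Hypothetical policy: flag only runs lasting at least sustained_minutes.
    return longest_run_minutes(samples, threshold) >= sustained_minutes


# Hypothetical readiness series (%) for two critical VMs.
critical = {
    "app-db-01": [1.0] * 5 + [14.0] * 12 + [1.0] * 3,  # 4 minutes elevated
    "erp-01": [0.5, 40.0, 0.5] + [0.5] * 10,            # one 20-second spike
}

print([name for name, s in critical.items() if needs_review(s)])  # ['app-db-01']
```

Only the VM with a multi-minute run is flagged; the single spike on erp-01 is left alone, in line with the guidance above.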