Understanding the CPU ready values in the vSphere Client advanced performance charts
Article ID: 387750
Products
VMware vSphere ESXi, VMware vCenter Server
Issue/Introduction
The default representation of CPU Ready values in the vSphere Client performance charts can be complex to interpret and often requires manual conversion for accurate analysis. This article explains the data presented in the charts and how to correctly interpret it.
Environment
VMware vCenter Server 6.x, VMware vCenter Server 7.x, VMware vCenter Server 8.x
Resolution
What are the virtual machine runtime states?
A powered on virtual machine spends its normal run time in 4 states:
Running (%RUN) - The virtual machine is receiving the CPU resources it requires and is currently placed in a physical CPU cycle.
Ready (%RDY) - The virtual machine requires CPU resources, but none are currently available, so it has to wait for another CPU cycle.
Co-Stop (%CSTP) - The virtual machine requires CPU resources, but there are not enough threads free in the current CPU cycle to place all of the virtual cores at once.
Waiting (%WAIT) - This is not actually a specific state, but rather a combination of others, mainly %IDLE (the time the virtual machine doesn't require CPU resources) and %VMWAIT (the time the virtual machine spent waiting on other, non-CPU, kernel resources to become available - usually storage or network).
These 4 states together make up 100% of the virtual machine's time, as long as it is powered on: %RUN + %RDY + %CSTP + %WAIT = 100%
What is so special about CPU ready?
Unlike many other performance counters, CPU ready is not a momentary value but always an accumulation over time. As mentioned above, it shows how much time the virtual CPUs spent waiting to be scheduled. In esxtop, the %RDY counter shows a percentage that is calculated over the time since the last sample was taken.
In the vSphere Client performance charts, this value is represented as an absolute number in milliseconds. This standalone figure lacks context and must be evaluated relative to the interval between data points on the chart.
About intervals
None of the charts in the vSphere Client display continuous data. Every chart has a fixed distance between actual data points, known as the statistics interval.
These intervals differ depending on the time range the chart is displaying.
When the chart is set to "Real-Time", the data points will be 20 seconds apart.
When looking at "Last Day", they will be once every 5 minutes, as configured in the vCenter Statistics settings.
When looking at data older than 24 hours but less than 7 days old, the data points will be 30 minutes apart (again as configured in the vCenter Statistics settings).
When looking at data between 7 days and 1 month old, they will be 2 hours apart.
When looking at data between 1 month and 1 year old, they will be 1 day apart.
These values can be modified, but always keep in mind that changing them to lower values will introduce a lot of additional strain to the performance of the vCenter database and thus on vCenter itself, and will increase the space requirements for the SEAT database massively. Lowering the interval durations is therefore not recommended.
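As a quick reference, the default intervals listed above can be captured in a small lookup table. This is only a sketch: the dictionary keys and the function name `interval_ms` are labels chosen for illustration and are not vSphere API identifiers, and the values assume the vCenter Statistics settings have not been changed.

```python
# Default statistics intervals in seconds, taken from the list above.
# The keys are illustrative labels (assumption), not vSphere API names.
CHART_INTERVAL_SECONDS = {
    "real-time": 20,             # data points 20 seconds apart
    "last day": 5 * 60,          # 5 minutes
    "last week": 30 * 60,        # 30 minutes
    "last month": 2 * 60 * 60,   # 2 hours
    "last year": 24 * 60 * 60,   # 1 day
}


def interval_ms(chart_range: str) -> int:
    """Return the default interval of a chart range in milliseconds."""
    return CHART_INTERVAL_SECONDS[chart_range.lower()] * 1000
```

For example, `interval_ms("Real-Time")` yields 20000 and `interval_ms("Last Day")` yields 300000.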
Why does this matter?
Due to the differing intervals, and because CPU ready is accumulated over time, the values for this counter cannot be directly compared across different charts.
For example, a CPU ready value of 500 milliseconds may appear in the “Real-Time” view, while the “Last Day” chart could show a value of 7500 milliseconds for the same metric.
Does this mean that the virtual CPUs spent more time in ready state yesterday than they do today?
No, it does not. The "Real-Time" chart uses an interval of 20 seconds, whereas the "Last Day" chart uses an interval of 5 minutes (300 seconds). To obtain comparable values, the absolute numbers must be converted to percentages of the respective interval.
For the value in "Real-Time" this is:
500ms * 100 / 20,000ms = 2.5%
For "Last Day" it is:
7500ms * 100 / 300,000ms = 2.5%
Both charts therefore show the same situation: in both cases the virtual CPUs spent a 40th of their overall time in ready state, and nothing changed in the meantime. These are theoretical values, and in practice they will most likely be higher, but the point should be clear.
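The conversion used in the two calculations above can be written as a small helper. This is a minimal sketch; the function name `ready_percent` and its parameters are chosen for illustration and are not part of any VMware tool or API.

```python
def ready_percent(ready_ms: float, interval_seconds: float) -> float:
    """Convert a CPU ready summation (ms) into a percentage of the interval.

    ready_ms:         value read from the performance chart, in milliseconds
    interval_seconds: distance between data points of that chart, in seconds
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    interval_ms = interval_seconds * 1000
    return ready_ms * 100 / interval_ms


# The two examples from the text: both describe the same situation.
real_time = ready_percent(500, 20)    # "Real-Time", 20 second interval
last_day = ready_percent(7500, 300)   # "Last Day", 5 minute interval
print(real_time, last_day)            # 2.5 2.5
```

Because the result is relative to the interval, values taken from charts with different time ranges become directly comparable.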
What else is important to know?
One other point, which matters for practical observations, is that both esxtop and the performance charts show %RDY and CPU ready as the sum across all virtual cores of a virtual machine.
From a mathematical perspective, this produces accurate results when compared against the total number of available threads on the physical CPUs of the hypervisor host. However, for performance analysis, it can be somewhat misleading because it does not represent the actual amount of time the virtual machine spent in a ready state. Since ESXi, like other hypervisors, can schedule all virtual CPUs of a virtual machine only within the same physical CPU cycle, the value displayed in the chart must be divided by the number of virtual CPUs assigned to the virtual machine to determine how long the virtual machine was waiting to be scheduled.
Going by the example above with the chart showing 500ms in "Real-Time" mode, let us assume that the virtual machine has 4 virtual cores.
The math in this case would be:
(500ms * 100 / 20,000ms) / 4 = 0.625%
The virtual machine in this example spent 0.625%, or a 160th of its run time in ready state.
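The same calculation, including the division by the number of virtual CPUs, can be sketched as follows. The function name `ready_percent_per_vcpu` and its parameters are illustrative assumptions and not part of any VMware tool; the sketch assumes the chart value is the sum over all virtual cores, as described above.

```python
def ready_percent_per_vcpu(ready_ms: float, interval_seconds: float,
                           num_vcpus: int) -> float:
    """Convert a summed CPU ready value (ms) into a per-vCPU percentage.

    ready_ms:         chart value, summed over all virtual cores, in ms
    interval_seconds: distance between data points of the chart, in seconds
    num_vcpus:        number of virtual CPUs assigned to the virtual machine
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if num_vcpus < 1:
        raise ValueError("num_vcpus must be at least 1")
    summed_percent = ready_ms * 100 / (interval_seconds * 1000)
    return summed_percent / num_vcpus


# Example from the text: 500 ms in "Real-Time" (20 s interval), 4 vCPUs.
print(ready_percent_per_vcpu(500, 20, 4))  # 0.625
```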
Additional Information
CPU ready or %RDY is not in itself an indicator of existing performance issues. A virtual machine can spend a lot of time in ready state without ever running into slowness, bottlenecks, or other performance problems. CPU ready should therefore not be evaluated on its own, but used as one of several tools when troubleshooting actual performance issues.