Understanding the CPU ready values in the vSphere Client advanced performance charts

Products

VMware vSphere ESXi VMware vCenter Server

Issue/Introduction

The way CPU ready values are displayed in the vSphere Client performance charts can be difficult to understand. This article explains what the charts actually show and how the values should be interpreted.

Environment

VMware vCenter Server 6.x

VMware vCenter Server 7.0.x

VMware vCenter Server 8.0.x

Resolution

What are the VM runtime states?

A powered-on virtual machine spends its run time in four states:

Running (%RUN) - The VM is receiving the CPU resources it requires and is currently scheduled on a physical CPU.
Ready (%RDY) - The VM requires CPU resources, but none are currently available, so it has to wait to be scheduled.
Co-Stop (%CSTP) - The VM requires CPU resources, but there are not enough physical threads free at the moment to schedule all of its virtual CPUs at once.
Waiting (%WAIT) - This is not actually a specific state, but rather a combination of others, mainly %IDLE (the time the VM does not require CPU resources) and %VMWAIT (the time the VM spent waiting on other, non-CPU kernel resources to become available, usually storage or network).

These four states together make up 100 percent of the VM's time while it is powered on:

%RUN + %RDY + %CSTP + %WAIT = 100 %
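Under the simplified model above, %WAIT can be treated as whatever remains of the sample period once running, ready and co-stop time are accounted for. The following is a minimal sketch of that relationship, assuming per-vCPU values (a single vCPU) and using hypothetical sample numbers rather than real esxtop output; the function name wait_percent is illustrative only.

```python
# Minimal sketch of %RUN + %RDY + %CSTP + %WAIT = 100 % for a single vCPU.
# Assumption: values are per-vCPU percentages of one sample period.
# The sample values below are hypothetical, not measured esxtop data.

def wait_percent(run: float, rdy: float, cstp: float) -> float:
    """Return %WAIT as the remainder of the sample period (per vCPU)."""
    remainder = 100.0 - (run + rdy + cstp)
    if remainder < 0:
        # Under the simplified model, the other three states cannot exceed 100 %.
        raise ValueError("%RUN + %RDY + %CSTP cannot exceed 100 percent per vCPU")
    return remainder


if __name__ == "__main__":
    run, rdy, cstp = 35.0, 2.5, 0.5  # hypothetical per-vCPU values
    wait = wait_percent(run, rdy, cstp)
    print(f"%WAIT = {wait:.1f}")    # %WAIT = 62.0
    print(run + rdy + cstp + wait)  # 100.0
```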

What is so special about CPU ready?

Unlike many other performance counters, CPU ready is not a momentary value but always an accumulation over time. As mentioned above, it shows how much time the virtual CPUs spent waiting to be scheduled. In esxtop, the %RDY counter shows a percentage that is calculated based on the time elapsed since the last sample was taken.

In the vSphere Client performance charts, however, CPU ready is shown as an absolute number in milliseconds. This value means nothing on its own and can only be evaluated against the interval between the data points in the chart.

About intervals

None of the charts in the vSphere Client show continuous data. No matter which chart you are looking at, there is always a gap between the actual data points: the statistics interval.

These intervals differ depending on the time range the chart is displaying.

When the chart is set to "Real-Time", the data points are 20 seconds apart.
When looking at "Last Day", they are 5 minutes apart, as configured in the vCenter Statistics settings.
For data older than 24 hours but younger than 7 days, the data points are 30 minutes apart (again as configured in the vCenter Statistics settings).
For data between 7 days and 1 month old, the data points are 2 hours apart.
For data between 1 month and 1 year old, the data points are 1 day apart.
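The default intervals listed above can be captured in a small lookup table, which is handy when converting chart values by hand. This is a sketch assuming the vCenter Statistics settings have not been changed; the range labels and the helper interval_ms are illustrative names, not part of any vSphere or pyVmomi API.

```python
# Default statistics intervals behind each chart range, as described above.
# Assumption: vCenter Statistics settings are left at their defaults.
# The range labels are illustrative, not an official API.
DEFAULT_INTERVAL_SECONDS = {
    "Real-Time": 20,              # real-time data
    "Last Day": 5 * 60,           # 300 s
    "Last Week": 30 * 60,         # 1,800 s
    "Last Month": 2 * 60 * 60,    # 7,200 s
    "Last Year": 24 * 60 * 60,    # 86,400 s
}


def interval_ms(chart_range: str) -> int:
    """Return the default length of one data point interval in milliseconds."""
    return DEFAULT_INTERVAL_SECONDS[chart_range] * 1000


if __name__ == "__main__":
    for name in DEFAULT_INTERVAL_SECONDS:
        print(f"{name:<10} {interval_ms(name):>12,} ms")
```

If the statistics intervals have been customized, the table would need to be adjusted to match.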

You can change these values, but keep in mind that lowering them puts significant additional load on the vCenter database, and thus on vCenter itself, and greatly increases the space requirements of the SEAT database. Lowering the interval durations is therefore not recommended.

Why does this matter?

Because the intervals differ, and because CPU ready is summed up over each interval, you cannot directly compare the values of this counter between different charts.

For example, say you look at the "Real-Time" chart and see a CPU ready value of 500 milliseconds. Then you look at the "Last Day" chart, and suddenly the value is 7500 milliseconds.

Does this mean that the virtual CPUs spent more time in ready state yesterday than they do today?

No, it does not: in "Real-Time" the interval is 20 seconds, whereas for "Last Day" an interval of 5 minutes (300 seconds) needs to be taken into account. To get comparable values, you need to convert both to percentages of their interval instead.

For the value in "Real-Time" this is:

500 ms * 100 / 20,000 ms = 2.5 percent

For "Last Day" it is:

7500 ms * 100 / 300,000 ms = 2.5 percent

Both charts show the same situation: in both cases the virtual CPUs spent one 40th of their overall run time in ready state, so nothing changed in the meantime. These are theoretical values, and real-world values will most likely be higher, but the point should be clear.
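The conversion used in both calculations above can be written as a single helper that takes the CPU ready summation in milliseconds and the chart interval in seconds. This is a minimal sketch; the function name ready_percent is hypothetical, and the inputs are the example values from this article, not real measurements.

```python
def ready_percent(ready_ms: float, interval_seconds: float) -> float:
    """Convert a CPU ready summation (ms) into a percentage of one interval."""
    interval_ms = interval_seconds * 1000
    return ready_ms * 100 / interval_ms


if __name__ == "__main__":
    real_time = ready_percent(500, 20)    # "Real-Time" chart, 20 s interval
    last_day = ready_percent(7500, 300)   # "Last Day" chart, 300 s interval
    print(real_time, last_day)            # 2.5 2.5
```

Both inputs yield the same percentage, which is exactly why the raw millisecond values should not be compared across charts.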

What else is important to know?

One thing not mentioned above, but which matters for practical observations, is that both esxtop and the performance charts show %RDY and CPU ready, respectively, as a sum across all virtual CPUs of a VM.

Mathematically this gives good results when compared to the overall number of available threads on the physical CPUs of the hypervisor host. For performance analysis it can be a bit misleading, though, as it does not reflect the time the virtual machine itself spent ready. To find out how much of its time the VM was waiting to be scheduled, divide the value shown in the chart by the number of virtual CPUs the VM has; this gives the average ready time per virtual CPU.

Using the example above, with the chart showing 500 ms in "Real-Time" mode, assume that the virtual machine has 4 virtual CPUs.

The math in this case would be:

(500 ms * 100 / 20,000 ms) / 4 = 0.625 percent

The virtual machine in this example spent 0.625 percent, or one 160th, of its run time in ready state.
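The per-vCPU calculation above can be sketched as a function that first converts the summed chart value into a percentage of the interval and then divides by the number of virtual CPUs. The function name ready_percent_per_vcpu and its parameters are hypothetical; the inputs are the example figures from this article.

```python
def ready_percent_per_vcpu(ready_ms: float, interval_seconds: float, num_vcpus: int) -> float:
    """Average CPU ready percentage per vCPU for one chart interval.

    ready_ms is the value summed across all vCPUs, as shown in the chart.
    """
    if num_vcpus < 1:
        raise ValueError("num_vcpus must be at least 1")
    summed_percent = ready_ms * 100 / (interval_seconds * 1000)
    return summed_percent / num_vcpus


if __name__ == "__main__":
    pct = ready_percent_per_vcpu(500, 20, 4)
    print(pct)               # 0.625
    print(round(100 / pct))  # 160, i.e. one 160th of the time
```

The same function gives 0.625 percent for the "Last Day" example (7500 ms over 300 seconds with 4 vCPUs), since the interval normalization is applied before dividing by the vCPU count.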

Additional Information

Keep in mind that CPU ready (%RDY) on its own is not an indicator of existing performance issues. A virtual machine can spend a lot of time in ready state without ever running into slowness, bottlenecks, or other performance problems. CPU ready should therefore not be evaluated in isolation, but used as one of several tools when troubleshooting actual performance issues.