Using ESXTop and Interpreting ESXTop Statistics

Article ID: 382249


Products

VMware vSphere ESXi

Issue/Introduction

Esxtop allows monitoring and collection of data for all system resources: CPU, memory, disk, and network. When used interactively, this data can be viewed on different types of screens: one each for CPU, memory, network, and disk adapter statistics. In addition to the disk adapter statistics available in earlier versions, ESX 3.5 added disk statistics at the device and VM level, and ESX 4.0 added an interrupt statistics screen. In batch mode, the data can be redirected to a file for offline use.

 

This article provides the last released version of the "ESXTop Bible" PDF, which explains each metric in detail.

Environment

ESXi 8.0.x

ESXi 7.0.x

Cause

Esxtop uses worlds and groups as the entities to show CPU usage. A world is an ESX Server VMkernel schedulable entity, similar to a process or thread in other operating systems. A group contains multiple worlds. Take a VM as an example: a powered-on VM has a corresponding group, which contains multiple worlds. In ESX, there is one vcpu (hypervisor) world corresponding to each VCPU of the VM. The guest activities are represented mostly by the vcpu worlds. (In ESX 3.5, esxtop shows a vmm world and a vcpu world for each VCPU, and the guest activities are represented mostly by the vmm worlds.) Besides the vcpu worlds, there are other assisting worlds, such as an MKS world and a VMX world. The MKS world assists mouse/keyboard/screen virtualization. The VMX world assists the vcpu worlds (the hypervisor).

Resolution

ESXTop 

To run esxtop:

  • Open an SSH session to the host
  • Run esxtop on the host. 
  • After this, you can use the following options to view the data:
    • c: Switch to the CPU resource utilization screen.
    • m: Switch to the Memory resource utilization screen.
    • d: Switch to the Disk Adapter resource utilization screen.
    • u: Switch to the Disk Device resource utilization screen.
    • v: Switch to the Disk VM (virtual machine) resource utilization screen.
    • n: Switch to the Network resource utilization screen.
    • p: Display power management information.
    • f: Open the field selection panel to add or remove metrics from the current view.
    • o: Open the order selection panel to change the display order of statistics.
    • h: Display the help screen.
    • q: Quit esxtop. 

If you would like to collect esxtop batch data for analysis:

  • Open an SSH session to the host
  • Run the following:
    • esxtop -b -a -d <interval> -n <number of samples> > /vmfs/volumes/<datastore>/esxtopOut.csv
  • The batch mode options are (see the example after this list):
    • -b: batch mode
    • -a: collect all metrics
    • -d: sampling interval in seconds
    • -n: total number of samples
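
For example, the following is a minimal sketch of a typical collection run; the datastore name (datastore1) and the sampling parameters are placeholders to adjust for your environment. This collects all counters every 10 seconds for 30 minutes (180 samples):

  esxtop -b -a -d 10 -n 180 > /vmfs/volumes/datastore1/esxtopOut.csv

With -a enabled the output CSV can grow large, so check that the target datastore has enough free space before starting a long run.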

CPU

Esxtop uses worlds and groups as the entities to show CPU usage. A world is an ESX Server VMkernel schedulable entity, similar to a process or thread in other operating systems. A group contains multiple worlds. 

Global Statistics

  • up time
    • The elapsed time since the server has been powered on.
  • number of worlds
    • The total number of worlds on ESX Server.
  • CPU load average
    • The arithmetic mean of CPU loads over 1 minute, 5 minutes, and 15 minutes, based on 6-second samples. CPU load accounts for the run time and ready time of all the groups on the host.
  • PCPU(%)
    • The percentage CPU utilization per physical CPU.
  • used total
    • Sum( PCPU(%) ) / number of PCPUs (see the worked example after this list)
  • LCPU(%)
    • The percentage CPU utilization per logical CPU. The CPU used percentages for the logical CPUs belonging to a package add up to 100%. This line is displayed only if hyper-threading is present and enabled.
  • CCPU(%)
    • Percentages of total CPU time as reported by the ESX Service Console: us is percentage user time, sy is percentage system time, id is percentage idle time, and wa is percentage wait time. cs/sec is the number of context switches per second recorded by the ESX Service Console.
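
As an illustration of the used total formula (the numbers below are hypothetical, not from a real host), a host with 4 PCPUs running at 50%, 60%, 70%, and 80% would report:

  # used total = Sum( PCPU(%) ) / number of PCPUs
  awk 'BEGIN { print (50 + 60 + 70 + 80) / 4 }'    # 65 (percent)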

World Statistics

A group statistic is the sum of the world statistics for all the worlds contained in that group, so this section focuses on worlds. You may apply the descriptions to groups as well, unless stated otherwise.

  • %USED
    • The percentage physical CPU time accounted to the world. If a system service runs on behalf of this world, the time spent by that service (i.e. %SYS) should be charged to this world. If not, the time spent (i.e. %OVRLP) should not be charged against this world. See notes on %SYS and %OVRLP.
    • %USED = %RUN + %SYS - %OVRLP (see the worked example after this list)
  • %SYS
    • The percentage of time spent by system services on behalf of the world. The possible system services are interrupt handlers, bottom halves, and system worlds.
  • %OVRLP
    • The percentage of time spent by system services on behalf of other worlds. In more detail, let's use an example: while World 'W1' is running, a system service 'S' interrupts 'W1' and services World 'W2'. The time spent by 'S', denoted 't', is included in the run time of 'W1'; %OVRLP of 'W1' shows this time. This time 't' is also accounted to %SYS of 'W2'.
  • %RUN
    • The percentage of total scheduled time for the world to run.
  • %RDY
    • The percentage of time the world was ready to run. A world in a run queue is waiting for the CPU scheduler to let it run on a PCPU. %RDY accounts for the percentage of this time, so it is always smaller than 100%.
  • %MLMTD
    • The percentage of time the world was ready to run but deliberately wasn't scheduled because that would violate the CPU limit settings.
    • Note that %MLMTD is included in %RDY
  • %CSTP
    • The percentage of time the world spent in the ready, co-descheduled state. This state is only meaningful for SMP VMs. Roughly speaking, the ESX CPU scheduler deliberately puts a VCPU in this state if that VCPU advances much farther than the other VCPUs.
  • %WAIT
    • The percentage of time the world spent in the wait state. This is the total wait time; that is, time the world spent waiting for some VMkernel resource. This wait time includes I/O wait time and idle time, among other things. Idle time is presented as %IDLE.
  • %IDLE
    • The percentage of time the VCPU world is in the idle loop. Note that %IDLE is included in %WAIT. Also note that %IDLE only makes sense for VCPU worlds; other worlds do not have idle loops, so %IDLE is zero for them.
  • %SWPWT
    • The percentage of time the world is waiting for the ESX VMkernel to swap memory. The %SWPWT (swap wait) time is included in the %WAIT time.
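
To illustrate how these counters relate, here is a minimal sketch using hypothetical values for a single vcpu world:

  # %USED = %RUN + %SYS - %OVRLP
  # Suppose esxtop reports %RUN=75.0, %SYS=10.0 and %OVRLP=5.0 for the world:
  awk 'BEGIN { print 75.0 + 10.0 - 5.0 }'    # %USED = 80
  # Similarly, %MLMTD is included in %RDY, and %IDLE and %SWPWT are included in %WAIT.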

 

Memory

Global Statistics

  • MEM overcommit avg
    • Average memory overcommit level over 1-minute, 5-minute, and 15-minute windows (exponentially weighted moving averages). Memory overcommit is the ratio of total requested memory to managed memory, minus 1 (see the worked example after this list). VMkernel computes the total requested memory as the sum of the following components: (a) the VM configured memory (or the memory limit setting, if set), (b) the user world memory, and (c) the reserved overhead memory.
  • PMEM (MB)
    • The machine memory statistics for the host.
  • VMKMEM (MB)
    • The machine memory statistics for VMKernel.
  • COSMEM (MB)
    • The memory statistics reported by the ESX Service Console.
  • NUMA (MB)
    • The ESX NUMA statistics. For each NUMA node there are two statistics: (1) the total amount of machine memory managed by ESX; (2) the amount of machine memory currently free.
    • Note that the ESX NUMA scheduler optimizes the use of NUMA features to improve guest performance. Please refer to the Resource Management Guide for details.
  • PSHARE (MB)
    • The ESX page-sharing statistics
  • SWAP (MB)
    • The ESX swap usage statistics.
  • MEMCTL (MB)
    • The memory balloon statistics.
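
For example (the sizes are hypothetical), if the VMs and user worlds on a host request a total of 96 GB of memory while VMkernel manages 64 GB of machine memory, the overcommit level is:

  # overcommit = total requested memory / managed memory - 1
  awk 'BEGIN { print 96 / 64 - 1 }'    # 0.5, i.e. 50% overcommitted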

Group Statistics

Esxtop shows the groups that use memory managed by VMKernel memory scheduler. These groups can be used for VMs or purely for user worlds running directly on VMKernel. 

Tip: use the 'V' command to show only the VM groups.

  • MEMSZ (MB)
    • For a VM, it is the amount of configured guest physical memory.
  • GRANT (MB)
    • For a VM, it is the amount of guest physical memory granted to the group, i.e., mapped to machine memory. The overhead memory, OVHD, is not included in GRANT. The shared memory, SHRD, is part of GRANT. The consumed machine memory for the VM, not including the overhead memory, can be estimated as GRANT - SHRDSVD. Please refer to SHRDSVD.
    • For a user world, it is the amount of virtual memory that is backed by machine memory.
  • SZTGT (MB)
    • The amount of machine memory to be allocated (TGT is short for target). Note that SZTGT includes the overhead memory for a VM. This is an internal counter computed by the ESX memory scheduler; usually there is no need to worry about it. Roughly speaking, the SZTGT of each VM is computed based on resource usage, available memory, and the limit/reservation/shares settings. The computed SZTGT is compared against the current memory consumption plus overhead memory for a VM to determine the swap and balloon targets, so that VMkernel may balloon or swap an appropriate amount of memory to meet its memory demand. Please refer to the Resource Management Guide for details.
  • TCHD (MB)
    • The amount of guest physical memory recently used by the VM, as estimated by VMkernel statistical sampling. VMkernel estimates active memory usage for a VM by sampling a random subset of the VM's memory resident in machine memory to detect the number of memory reads and writes. VMkernel then scales this number by the size of the VM's configured memory and averages it with previous samples. Over time, this average will approximate the amount of active memory for the VM.
      • Note that ballooned memory is considered inactive, so, it is excluded from TCHD.
    • Because sampling and averaging take time, TCHD won't be exact, but becomes more accurate over time. The VMkernel memory scheduler charges the VM the sum of (1) the TCHD memory and (2) the idle memory tax. This charged memory is one of the factors the memory scheduler uses to compute SZTGT.
  • %ACTV
    • Percentage of active guest physical memory (current value). TCHD is actually computed from a few parameters derived from statistical sampling; the exact equation is out of the scope of this document.
  • %ACTVS
    • Percentage of active guest physical memory, slow moving average. See above.
  • %ACTVF
    • Percentage of active guest physical memory, fast moving average. See above.
  • %ACTVN
    • Percentage of active guest physical memory in the near future. This is an estimated value. See above.
  • MCTL?
    • Whether the memory balloon driver is installed. If not, install VMware Tools, which contains the balloon driver.
  • MCTLSZ (MB)
    • The amount of guest physical memory reclaimed by the balloon driver; this can be called the balloon size. A large MCTLSZ means a lot of this VM's guest physical memory has been reclaimed to reduce host memory pressure. This usually is not a problem, because the balloon driver tends to reclaim guest physical memory that causes few performance problems.
  • MCTLTGT (MB)
    • The amount of guest physical memory to be kept in balloon driver. (TGT is short for target.) This is an internal counter, which is computed by ESX memory scheduler. Usually, there is no need to worry about this.
  • MCTLMAX (MB)
    • The maximum amount of guest physical memory reclaimable by the balloon driver. This value can be set via the vmx option sched.mem.maxmemctl (see the example after this list). If not set, it is determined by the guest operating system type. MCTLTGT will never be larger than MCTLMAX. If the VM suffers from ballooning, sched.mem.maxmemctl can be set to a smaller value to reduce this possibility. Remember that doing so may result in host swapping during resource contention.
  • SWCUR (MB)
    • Current swap usage. For a VM, it is the current amount of guest physical memory swapped out to the backing store. Note that this is VMkernel swapping, not guest OS swapping.
    • It is the sum of swap slots used in the vswp file or system swap, and migration swap. Migration swap is used for a VMotioned VM to hold swapped out memory on the destination host, in case the destination host is under memory pressure.
  •  SWTGT (MB)
    • The expected swap usage. (TGT is short for target.) This is an internal counter, which is computed by ESX memory scheduler. Usually, there is no need to worry about this.
  • SWR/s (MB)
    • Rate at which memory is being swapped in from disk. Note that this statistic refers to VMkernel swapping, not guest swapping.
    • When a VM is requesting machine memory to back its guest physical memory that was swapped out to disk, VMKernel reads in the page. Note that the swap-in operation is synchronous.
  • SWW/s (MB)
    • Rate at which memory is being swapped out to disk. Note that this statistic refers to VMkernel swapping, not guest swapping.
  • SHRD (MB)
    • Amount of guest physical memory that is shared. The VMkernel page sharing module scans for guest physical pages with the same content and backs them with the same machine page. SHRD accounts for the total guest physical pages that are shared by the page sharing module.
  • ZERO (MB)
    • Amount of guest physical zero memory that is shared. This is an internal counter. A zero page is simply a memory page that is all zeros. If a zero guest physical page is detected by the VMkernel page sharing module, the page is backed by the same machine page on each NUMA node. Note that ZERO is included in SHRD.
  • SHRDSVD (MB)
    • Estimated amount of machine memory saved due to page sharing. Because a machine page is shared by multiple guest physical pages, only 1/ref of a page is charged as consumed machine memory for each of the guest physical pages, where ref is the number of references. The saved machine memory is therefore 1 - 1/ref of a page per reference. SHRDSVD estimates the total saved machine memory for the VM (see the worked example after this list).
    • The consumed machine memory by the VM can be estimated as GRANT - SHRDSVD.
  • COWH (MB)
    • Amount of guest physical hint pages for page sharing. This is an internal counter.
  • OVHDUW (MB)
    • Amount of overhead memory reserved for the vmx user world of a VM group. This is an internal counter. OVHDUW is part of OVHDMAX.
  • OVHD (MB)
    • Amount of overhead memory currently consumed by a VM. OVHD includes the overhead memory consumed by the monitor, the VMkernel and the vmx user world.
  • OVHDMAX (MB)
    • Amount of reserved overhead memory for the entire VM.
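
As a sketch of the page-sharing arithmetic (the numbers are hypothetical): if a machine page is shared by ref = 4 guest physical pages, each guest page is charged 1/4 of a page and 3/4 of a page is saved. Scaled up to 1024 MB of shared guest memory with an average reference count of 4:

  # saved machine memory ~= shared memory * (1 - 1/ref)
  awk 'BEGIN { print 1024 * (1 - 1/4) }'    # 768 MB saved (SHRDSVD)
  # Consumed machine memory for the VM can then be estimated as GRANT - SHRDSVD.
  # To cap ballooning (see MCTLMAX above), a vmx option such as the following
  # can be set; the 1024 MB value is purely illustrative:
  #   sched.mem.maxmemctl = "1024"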

Disk and Network Statistics 

I/O Throughput Statistics

  • CMDS/s
    • Number of commands issued per second.
  • READS/s
    • Number of read commands issued per second.
  • WRITES/s
    • Number of write commands issued per second.
  • MBREAD/s
    • Megabytes read per second.
  • MBWRTN/s
    • Megabytes written per second.

 Latency Statistics

This group of counters reports latency values measured at three different points in the ESX storage stack: the guest, the ESX kernel, and the device. These are reported under the labels GAVG, KAVG, and DAVG, respectively. Note that GAVG is the sum of the DAVG and KAVG counters (see the worked example after this list).

 

Note that esxtop shows the latency statistics for different objects, such as adapters, devices, paths, and worlds. They may not perfectly match each other, since their latencies are measured at different layers of the ESX storage stack. To correlate them, you need to be very familiar with the storage layers in the ESX kernel, which is out of our scope. Latency values are reported for all IOs, read IOs, and write IOs. All values are averages over the measurement interval.

  • GAVG
    • This is the round-trip latency that the guest sees for all IO requests sent to the virtual storage device.
  • KAVG
    • These counters track the latencies caused by the ESX kernel's command processing. The KAVG value should be very small in comparison to the DAVG value, ideally close to zero. When there is a lot of queuing in ESX, however, KAVG can be as high as, or even higher than, DAVG. If this happens, check the queue statistics, which are discussed next.
  • DAVG
    • This is the latency seen at the device driver level. It includes the roundtrip time between the HBA and the storage. 
    • DAVG is a good indicator of performance of the backend storage. If IO latencies are suspected to be causing performance problems, DAVG should be examined. Compare IO latencies with corresponding data from the storage array. If they are close, check the array for misconfiguration or faults. If not, compare DAVG with corresponding data from points in between the array and the ESX Server, e.g., FC switches. If this intermediate data also matches DAVG values, it is likely that the storage is under-configured for the application. Adding disk spindles or changing the RAID level may help in such cases.
  • QAVG
    • The average queue latency. QAVG is part of KAVG. Response time is the sum of the time spent in queues in the storage stack and the service time spent by each resource in servicing the request. The largest component of the service time is the time spent in retrieving data from physical storage. If QAVG is high, another line of investigation is to examine the queue depths at each level in the storage stack.
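
As a quick sanity check of the latency relationship (the values are hypothetical): with a device latency (DAVG) of 5.2 ms and 0.1 ms spent in the ESX kernel (KAVG), the guest-observed latency is roughly:

  # GAVG = DAVG + KAVG
  awk 'BEGIN { print 5.2 + 0.1 }'    # GAVG ~= 5.3 ms
  # If KAVG rivals or exceeds DAVG, there is queuing in ESX: check the queue statistics.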

 Queue Statistics

  • AQLEN
    • The storage adapter queue depth. This is the maximum number of ESX Server VMKernel active commands that the adapter driver is configured to support.
  • LQLEN
    • The LUN queue depth. This is the maximum number of ESX Server VMKernel active commands that the LUN is allowed to have. 
  • WQLEN
    • The World queue depth. This is the maximum number of ESX Server VMKernel active commands that the World is allowed to have. Note that this is a per LUN maximum for the World.
  • ACTV
    • The number of commands in the ESX Server VMKernel that are currently active. This statistic is only applicable to worlds and LUNs.
  • QUED
    • The number of commands in the VMKernel that are currently queued. This statistic is only applicable to worlds and LUNs.
    • Queued commands are commands waiting for an open slot in the queue. A large number of queued commands may be an indication that the storage system is overloaded. A sustained high value for the QUED counter signals a storage bottleneck, which may be alleviated by increasing the queue depth. Check that LOAD < 1 after increasing the queue depth; this should also be accompanied by improved performance in terms of increased cmd/s. Note that there are queues in different storage layers, so you might want to check the QUED statistics for devices and worlds as well.
  • %USD
    • The percentage of queue depth used by ESX Server VMKernel active commands. This statistic is only applicable to worlds and LUNs.
      • %USD = ACTV / QLEN * 100% (see the worked example after this list)
    • %USD is a measure of how many of the available command queue slots are in use. Sustained high values indicate the potential for queueing; you may need to adjust the queue depths for the system's HBAs if QUED is also found to be consistently > 1 at the same time. Queue sizes can be adjusted in a few places in the IO path and can be used to alleviate performance problems related to latency. For detailed information on this topic, please refer to the VMware whitepaper entitled Scalable Storage Performance.
  • LOAD
    • The ratio of the sum of VMkernel active commands and VMkernel queued commands to the queue depth. This statistic is only applicable to worlds and LUNs. The sum of the active and queued commands gives the total number of outstanding commands issued by that virtual machine, and the LOAD counter value is the ratio of this sum to the queue depth. If LOAD > 1, check the value of the QUED counter.
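
To make the queue formulas concrete (the values are hypothetical): with a LUN queue depth (LQLEN) of 32, 30 active commands, and 8 queued commands:

  # %USD = ACTV / QLEN * 100%
  awk 'BEGIN { print 30 / 32 * 100 }'     # %USD = 93.75
  # LOAD = (ACTV + QUED) / QLEN
  awk 'BEGIN { print (30 + 8) / 32 }'     # LOAD = 1.1875 > 1, so check QUED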

 Error Statistics

  • ABRTS/s
    • The number of commands aborted per second. This can indicate that the storage system is unable to meet the demands of the guest operating system. Abort commands are issued by the guest when the storage system has not responded within an acceptable amount of time, e.g., 60 seconds on some Windows operating systems. Also, resets issued by a guest OS on its virtual SCSI adapter are translated to aborts of all the commands outstanding on that virtual SCSI adapter.
  • RESETS/s
    • The number of commands reset per second.

PAE and Split Statistics

  • PAECMD/s
    • The number of PAE commands per second. A high value may point to hardware misconfiguration. When the guest allocates a buffer, the vmkernel assigns some machine memory, which might come from a "highmem" region. If a driver is not PAE-aware, this counter is updated when accesses to that memory region result in copies by the vmkernel into a lower memory location before the request is issued to the adapter. If you do not populate the DIMMs with low memory first, you may artificially cause "highmem" memory accesses.
  • PAECP/s
    • The number of PAE copies per second.
  • SPLTCMD/s
    • The number of split commands per second. Commands can be split when they reach the vmkernel, which might impact the latency perceived by the guest. The guest may be issuing commands with large block sizes, which have to be broken down by the vmkernel. Splitting can also occur when IOs fall across partition boundaries, but those are easily differentiated from splits caused by the IO size.
  • SPLTCP/s
    • The number of split copies per second.

Port Statistics 

  • SPEED (Mbps)
    • The link speed in Megabits per second. This information is only valid for a physical NIC.
  • FDUPLX
    • 'Y' implies the corresponding link is operating at full duplex. 'N' implies it is not. This information is only valid for a physical NIC.
  • UP
    • 'Y' implies the corresponding link is up. 'N' implies it is not. This information is only valid for a physical NIC.
  • PKTTX/s
    • The number of packets transmitted per second.
  • PKTRX/s
    • The number of packets received per second.
  • MbTX/s (Mbps)
    • Megabits transmitted per second.
  • MbRX/s (Mbps)
    • Megabits received per second.
  • %DRPTX
    • The percentage of transmit packets dropped.
  • ACTN/s
    • Number of actions per second. The actions here are VMkernel actions. This is an internal counter; we won't discuss it further here.

Additional Information

The attachment was previously made public via VMware Communities. Download the attached PDF for more detailed information.

Using esxtop to identify storage performance issues for ESXi

Collecting esxtop batch data for ESXi performance troubleshooting

Attachments

ESXTop-Bible.DOC-9279.pdf