Using esxtop to identify storage performance issues for ESXi (multiple versions)
Article ID: 344099
Products
VMware vSphere ESXi
Issue/Introduction
This article provides information about esxtop and latency statistics that can be used when troubleshooting performance issues with SAN-connected storage (Fibre Channel or iSCSI).
Note: If the number of lines displayed exceeds the window size, you can press 2 to select and navigate between the lines you wish to remove. Press 4 on a highlighted line to remove it.
Analyzing esxtop columns
Refer to this table for relevant columns and descriptions of these values:
Column | Description
CMDS/s | The total number of commands per second. This includes IOPS (Input/Output Operations Per Second) and other SCSI commands, such as SCSI reservations, locks, vendor string requests, and unit attention commands, being sent to or coming from the device or virtual machine being monitored. In most cases, CMDS/s = IOPS unless there are many metadata operations (such as SCSI reservations).
DAVG/cmd | The average response time, in milliseconds, per command sent to the device.
KAVG/cmd | The average time, in milliseconds, that a command spends in the VMkernel.
GAVG/cmd | The response time as perceived by the guest operating system. It is calculated with the formula: GAVG = DAVG + KAVG
The DAVG/cmd, KAVG/cmd, and GAVG/cmd columns cover both reads and writes, whereas xAVG/rd covers reads only and xAVG/wr covers writes only. The combined values are the best way to monitor performance, but a high read or write response time on its own may indicate that the read or write cache is disabled on the array. All arrays perform differently; however, DAVG/cmd, KAVG/cmd, and GAVG/cmd should not exceed 10 milliseconds (ms) for sustained periods of time.
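As a rough illustration of the guidance above, the following Python sketch adds DAVG and KAVG to obtain GAVG and reports runs of consecutive samples that stay above the 10 ms guideline. The `Sample` class, the `sustained_breaches` function, and the choice of three consecutive samples as the meaning of "sustained" are assumptions made for this example. They are not part of esxtop, and the values must be collected from esxtop separately.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Guideline from this article: latency should not stay above 10 ms.
LATENCY_LIMIT_MS = 10.0


@dataclass
class Sample:
    """One observation of the esxtop latency columns (hypothetical container)."""
    davg_ms: float  # DAVG/cmd: average device response time per command
    kavg_ms: float  # KAVG/cmd: time the command spends in the VMkernel

    @property
    def gavg_ms(self) -> float:
        # GAVG/cmd as perceived by the guest: GAVG = DAVG + KAVG
        return self.davg_ms + self.kavg_ms


def sustained_breaches(
    samples: Iterable[Sample],
    limit_ms: float = LATENCY_LIMIT_MS,
    min_run: int = 3,  # assumption: this many consecutive samples counts as "sustained"
) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of at least min_run
    consecutive samples whose GAVG exceeds limit_ms."""
    items = list(samples)
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current over-limit run began, if any
    for i, sample in enumerate(items):
        if sample.gavg_ms > limit_ms:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and len(items) - start >= min_run:
        runs.append((start, len(items) - start))
    return runs
```

A single spike above the limit is ignored by design; only runs of `min_run` or more samples are reported. Adjust `min_run` to match the sampling interval in use.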
If you experience high latency, investigate the current performance metrics and running configuration of the switches and the SAN targets. Check for errors or log entries that may suggest a delay in operations being sent, received, and acknowledged. This includes the array's ability to process I/O given its spindle count, and its ability to handle the load presented to it.
If the response time exceeds 5000 ms (5 seconds), ESXi times out the command and aborts the operation. These events are logged; abort messages and other SCSI errors can be reviewed in the following log:
ESXi 6.x and later - /var/log/vmkernel.log
The type of storage logging you may see in this log depends on the configuration of the host. The relevant logging options are listed under Host > Configuration > Advanced Settings > SCSI > SCSI.Log* or SCSI.Print*.
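When reviewing a copy of the log for aborts, a simple first-pass filter can help narrow down the lines to read. The sketch below is one possible approach in Python, not a VMware tool. The function names and the case-insensitive match on the word "abort" are assumptions made for this example; the exact message text varies with the host's logging configuration, so the pattern may need adjusting.

```python
import re
from typing import Iterable, List, Pattern

# Assumption: a case-insensitive match on "abort" is a reasonable first filter.
# This is not an official message format; adjust it to the messages you see.
ABORT_PATTERN = re.compile(r"abort", re.IGNORECASE)


def find_abort_lines(
    lines: Iterable[str],
    pattern: Pattern[str] = ABORT_PATTERN,
) -> List[str]:
    """Return the lines that match the pattern, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_log(path: str = "/var/log/vmkernel.log") -> List[str]:
    """Scan a log file (default: the path named in this article) for matches."""
    # errors="replace" avoids failing on bytes that are not valid text.
    with open(path, errors="replace") as handle:
        return find_abort_lines(handle)
```

Calling `scan_log()` on the host, or `scan_log("path/to/copy")` on a copied log, returns the matching lines for manual review. The result is only a starting point and does not replace reading the surrounding entries.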