Users report poor storage performance despite minimal load on the devices. The users expect the device latency to be less than 3ms.
Storage connectivity is via software iSCSI.
Run 'esxtop' and press "u" to view storage device statistics, then confirm that the reported device latency values are within the expected thresholds.
Ensure the MTU size for both the VMkernel ports and the virtual switches used for iSCSI is consistent (either 1500 or 9000).
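Use the following command to check for packet loss. Because the -d option sets the don't-fragment bit, the value passed to -s must be the ICMP payload size, that is, the configured MTU minus 28 bytes of IP and ICMP headers (1472 for an MTU of 1500, 8972 for an MTU of 9000):
vmkping -I <vmk#> -s <packet_size> -d <iscsi_target_IP>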
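The relationship between the configured MTU and the size passed to vmkping can be sketched in a few lines of Python. This is only an illustrative helper, not part of any VMware tooling: the function names, the interface name vmk1, and the target address 192.0.2.10 are hypothetical placeholders, and the 28-byte overhead assumes IPv4 (20-byte IP header plus 8-byte ICMP header).

```python
# Header bytes that are not counted in vmkping's -s payload size:
# 20-byte IPv4 header + 8-byte ICMP header (IPv4 assumed).
IPV4_ICMP_OVERHEAD = 28


def vmkping_payload_size(mtu: int) -> int:
    """Return the largest ICMP payload that fits in one unfragmented frame."""
    if mtu <= IPV4_ICMP_OVERHEAD:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP payload")
    return mtu - IPV4_ICMP_OVERHEAD


def build_vmkping_command(vmk: str, mtu: int, target_ip: str) -> str:
    """Build the vmkping command line for a don't-fragment test at the given MTU."""
    return f"vmkping -I {vmk} -s {vmkping_payload_size(mtu)} -d {target_ip}"


if __name__ == "__main__":
    # vmk1 and 192.0.2.10 are placeholders; substitute the iSCSI VMkernel
    # port and the iSCSI target address of the environment being tested.
    for mtu in (1500, 9000):
        print(build_vmkping_command("vmk1", mtu, "192.0.2.10"))
```

If the ping succeeds at the standard size but fails at the jumbo-frame size, some hop between the VMkernel port and the iSCSI target is likely not configured for the larger MTU.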
Although latency under 5ms is generally within acceptable performance limits as per VMware standards, the vmkernel.log indicates command retries caused by transient storage conditions. The host returns H:0xc (Host Status 0xC), which signifies that the ESXi host is requeuing I/O commands due to temporary storage unavailability or delays.
This condition can degrade perceived performance despite low latency metrics, as the I/O operations are delayed and retried multiple times.
Reviewing the vmkernel.log reveals multiple entries confirming command retries due to transient storage conditions.
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu4:2097302)ScsiDeviceIO: 4580: Cmd(0x45bb406a9f80) 0x8a, CmdSN 0x3ad from world 93812680 to dev "naa.#################" failed H:0xc D:0x0 P:0x0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu20:2097318)ScsiDeviceIO: 4580: Cmd(0x45bb4073a000) 0x8a, CmdSN 0x312 from world 93812680 to dev "naa.#################" failed H:0xc D:0x0 P:0x0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu82:2097374)ScsiDeviceIO: 4580: Cmd(0x45bafc42ee00) 0x8a, CmdSN 0x3de from world 93812680 to dev "naa.#################" failed H:0xc D:0x0 P:0x0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu82:2097374)ScsiDeviceIO: 4580: Cmd(0x45bafc490280) 0x8a, CmdSN 0x3bc from world 93812680 to dev "naa.#################" failed H:0xc D:0x0 P:0x0
The H:0xc status is returned due to a transient error. When this status is returned, the I/O command is requeued and issued again.
These logs confirm the I/O commands to the specified device were retried due to transient conditions, causing performance degradation despite healthy latency metrics.
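To gauge how often commands are being requeued, the H:0xc entries can be counted per device. The following Python sketch is one possible way to do that, assuming the log lines follow the ScsiDeviceIO format shown in this article; the regular expression and the function name count_host_retries are illustrative and not part of any VMware utility. The sample input is the redacted log excerpt from this article.

```python
import re
from collections import Counter

# Captures the device name and the host/device/plugin status triple from a
# ScsiDeviceIO failure line in the format shown in this article.
FAILURE_RE = re.compile(
    r'to dev "(?P<dev>[^"]+)" failed '
    r"H:(?P<host>0x[0-9a-fA-F]+) "
    r"D:(?P<device>0x[0-9a-fA-F]+) "
    r"P:(?P<plugin>0x[0-9a-fA-F]+)"
)

# Host status 0xc: the command is requeued and issued again.
HOST_STATUS_RETRY = 0xC


def count_host_retries(lines):
    """Count failure lines with host status 0xc, grouped by device name."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match and int(match.group("host"), 16) == HOST_STATUS_RETRY:
            counts[match.group("dev")] += 1
    return counts


if __name__ == "__main__":
    # Redacted sample entries copied from this article.
    sample = [
        'YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu4:2097302)ScsiDeviceIO: 4580: '
        'Cmd(0x45bb406a9f80) 0x8a, CmdSN 0x3ad from world 93812680 to dev '
        '"naa.#################" failed H:0xc D:0x0 P:0x0',
        'YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu20:2097318)ScsiDeviceIO: 4580: '
        'Cmd(0x45bb4073a000) 0x8a, CmdSN 0x312 from world 93812680 to dev '
        '"naa.#################" failed H:0xc D:0x0 P:0x0',
    ]
    for dev, n in count_host_retries(sample).items():
        print(f"{dev}: {n} command(s) requeued with H:0xc")
```

In practice the function could be given the lines of a copy of vmkernel.log (for example, `count_host_retries(open("vmkernel.log"))`) to see which devices are affected and how frequently.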
To address the transient storage condition and improve overall storage performance, perform the following steps: