Users experiencing poor storage performance despite minimal load.

Article ID: 392647

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • Users report poor storage performance despite minimal load on the devices. The users expect the device latency to be less than 3ms.

  • Storage connectivity is via software iSCSI.

  • The esxtop metrics show that device latency is less than 5ms. To review the esxtop metrics, follow the steps below:
  • SSH to the ESXi host, run the command 'esxtop', and press "u" to view storage device statistics. Confirm the following latency thresholds:
     • DAVG (Device Average Latency) < 30ms
     • KAVG (Kernel Average Latency) < 2ms
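
    The threshold check above can be sketched as a small Python helper. This is a minimal illustration, not a VMware tool: the function name check_latency and the sample readings are hypothetical, and the values are assumed to be copied by hand from the DAVG/cmd and KAVG/cmd columns of esxtop.

```python
# Thresholds quoted in this article, in milliseconds.
DAVG_LIMIT_MS = 30.0
KAVG_LIMIT_MS = 2.0


def check_latency(davg_ms, kavg_ms):
    """Return a list of findings; an empty list means both values pass."""
    findings = []
    if davg_ms >= DAVG_LIMIT_MS:
        findings.append(f"DAVG {davg_ms} ms is not below {DAVG_LIMIT_MS} ms")
    if kavg_ms >= KAVG_LIMIT_MS:
        findings.append(f"KAVG {kavg_ms} ms is not below {KAVG_LIMIT_MS} ms")
    return findings


# Hypothetical readings taken from the esxtop disk device view.
print(check_latency(4.2, 0.1))   # both values within the thresholds
print(check_latency(4.2, 3.5))   # KAVG above its threshold
```

    An empty result only says that the averages are within limits. As the Cause section explains, command retries can still affect perceived performance.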

    Validation Steps:
  • Ensure the MTU size is the same on the VMkernel ports and the virtual switches used for iSCSI (either 1500 or 9000).

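  • Use the following command to check for packet loss and MTU mismatches. The -d option sets the don't-fragment bit and -s sets the ICMP payload size, so specify the MTU minus 28 bytes of IP and ICMP headers (1472 for an MTU of 1500, 8972 for an MTU of 9000):
    vmkping -I <vmk#> -s <MTU - 28> -d <iscsi_target_IP>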

  • Verify that the physical network adapter's driver and firmware versions are compatible with each other and with the installed ESXi release.
    Refer to the VMware article: “Determining Network/Storage firmware and driver version in ESXi” for validation steps.
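
    For the vmkping check, the value passed to -s is the ICMP payload size rather than the MTU itself, so the 20-byte IPv4 header and the 8-byte ICMP header have to be subtracted. The sketch below only builds the command string for a given MTU. The helper name vmkping_command, the interface vmk1, and the target address 192.0.2.10 are placeholders chosen for illustration.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def vmkping_command(vmk, mtu, target_ip):
    """Build a don't-fragment vmkping command sized to fill one frame of the given MTU."""
    payload = mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP payload")
    return f"vmkping -I {vmk} -s {payload} -d {target_ip}"


# Placeholder interface and target; substitute the iSCSI VMkernel port and target IP.
print(vmkping_command("vmk1", 9000, "192.0.2.10"))  # vmkping -I vmk1 -s 8972 -d 192.0.2.10
print(vmkping_command("vmk1", 1500, "192.0.2.10"))  # vmkping -I vmk1 -s 1472 -d 192.0.2.10
```

    If the command built for an MTU of 9000 fails while the one for 1500 succeeds, a device in the path is likely not configured for jumbo frames.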

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

Latency under 5ms is generally within acceptable performance limits as per VMware standards. However, the vmkernel.log shows command retries caused by transient storage conditions. The commands fail with H:0xc (Host Status 0xC), which signifies that the ESXi host is requeuing I/O commands due to temporary storage unavailability or delays.

This condition can degrade perceived performance despite low latency metrics, because the affected I/O operations are delayed and may be retried multiple times.
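
In vmkernel.log, a failed command is reported with a status triplet of the form "H:0xc D:0x0 P:0x0", where H is the host status, D the device status, and P the plugin status. The following is a minimal sketch, assuming that format, of how the triplet could be split and the host status 0xC recognized. The names parse_status and is_host_retry are hypothetical.

```python
import re

# Matches the status triplet as it appears in ScsiDeviceIO log entries.
STATUS_RE = re.compile(
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+)"
)

HOST_STATUS_RETRY = 0xC  # host status that this article associates with requeued I/O


def parse_status(text):
    """Return the host, device, and plugin status as integers, or None if absent."""
    match = STATUS_RE.search(text)
    if match is None:
        return None
    host, device, plugin = (int(value, 16) for value in match.groups())
    return {"host": host, "device": device, "plugin": plugin}


def is_host_retry(status):
    """True when the parsed triplet carries host status 0xC."""
    return status is not None and status["host"] == HOST_STATUS_RETRY


status = parse_status("failed H:0xc D:0x0 P:0x0")
print(status)                 # {'host': 12, 'device': 0, 'plugin': 0}
print(is_host_retry(status))  # True
```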

Cause Validation

Reviewing the vmkernel.log reveals multiple entries confirming command retries due to transient storage conditions.

 

YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu4:2097302)ScsiDeviceIO: 4580: Cmd(0x45bb406a9f80) 0x8a, CmdSN 0x3ad from world 93812680 to dev "naa.#################" failed H:0xc D:0x0 P:0x0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu20:2097318)ScsiDeviceIO: 4580: Cmd(0x45bb4073a000) 0x8a, CmdSN 0x312 from world 93812680 to dev "naa.#################" failed H:0xc D:0x0 P:0x0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu82:2097374)ScsiDeviceIO: 4580: Cmd(0x45bafc42ee00) 0x8a, CmdSN 0x3de from world 93812680 to dev "naa.#################" failed H:0xc D:0x0 P:0x0
YYYY-MM-DDTHH:MM:SS In(182) vmkernel: cpu82:2097374)ScsiDeviceIO: 4580: Cmd(0x45bafc490280) 0x8a, CmdSN 0x3bc from world 93812680 to dev "naa.#################" failed H:0xc D:0x0 P:0x0

Note: The H:0xc status is returned due to a transient error. When this status is returned, the I/O command is requeued and issued again.

These log entries confirm that I/O commands to the specified device were retried due to transient conditions, causing performance degradation despite healthy latency metrics.
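
To judge how widespread the retries are, the matching entries can be counted per device. The sketch below assumes the entry format shown in this article. The function name count_host_retries is hypothetical, and the sample lines are shortened stand-ins with made-up device names, not captured log output.

```python
import re
from collections import Counter

# Captures the device name from ScsiDeviceIO entries that failed with H:0xc D:0x0 P:0x0.
RETRY_RE = re.compile(
    r'ScsiDeviceIO: .*? to dev "(?P<dev>[^"]+)" failed H:0xc D:0x0 P:0x0'
)


def count_host_retries(lines):
    """Count H:0xc entries per device in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts


# Hypothetical sample modeled on the entries above (timestamps and prefixes omitted).
SAMPLE = [
    'cpu4:2097302)ScsiDeviceIO: 4580: Cmd(0x45bb406a9f80) 0x8a, CmdSN 0x3ad '
    'from world 93812680 to dev "naa.example-a" failed H:0xc D:0x0 P:0x0',
    'cpu20:2097318)ScsiDeviceIO: 4580: Cmd(0x45bb4073a000) 0x8a, CmdSN 0x312 '
    'from world 93812680 to dev "naa.example-a" failed H:0xc D:0x0 P:0x0',
    'cpu82:2097374)ScsiDeviceIO: 4580: Cmd(0x45bafc42ee00) 0x8a, CmdSN 0x3de '
    'from world 93812680 to dev "naa.example-b" failed H:0xc D:0x0 P:0x0',
    "an unrelated log line that should be ignored",
]

for device, count in count_host_retries(SAMPLE).most_common():
    print(device, count)
```

To run this against a real log, pass an open file object for a copy of vmkernel.log to count_host_retries instead of SAMPLE. A device with a much higher count than the others is a reasonable starting point for the storage-side review described under Resolution.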

Resolution

To address the transient storage condition and improve overall storage performance, perform the following steps:

  • Review storage array logs and health status to check for intermittent availability or performance degradation.
  • Work with the storage vendor to identify any underlying issues such as firmware bugs, controller failovers, or congestion.
  • Ensure there are no intermittent connectivity issues between the ESXi host and the storage array.
  • Check for interface flapping, CRC errors, or packet drops on physical switches and network adapters.
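
The last check can be partly scripted on the host side. On ESXi, per-adapter counters can be printed with "esxcli network nic stats get -n <vmnic#>". The sketch below flags nonzero error and drop counters in text of the form "Name: value". The function name nonzero_error_counters is hypothetical, and the sample text is an illustrative stand-in: its counter names and values are assumptions, not captured output.

```python
def nonzero_error_counters(stats_text):
    """Return {counter name: value} for nonzero counters that mention errors or drops."""
    flagged = {}
    for line in stats_text.splitlines():
        # Split on the last colon so the counter name is kept whole.
        name, sep, value = line.strip().rpartition(":")
        value = value.strip()
        if not sep or not value.isdigit():
            continue  # header lines and non-numeric fields are skipped
        lowered = name.lower()
        if ("error" in lowered or "dropped" in lowered) and int(value) > 0:
            flagged[name.strip()] = int(value)
    return flagged


# Illustrative text only; real counter names and values may differ.
SAMPLE_STATS = """\
NIC statistics for vmnic2
   Packets received: 1200345
   Receive packets dropped: 0
   Receive CRC errors: 17
   Total transmit errors: 0
"""

print(nonzero_error_counters(SAMPLE_STATS))  # {'Receive CRC errors': 17}
```

Counters are cumulative, so comparing two readings taken some minutes apart shows whether errors are still increasing. The physical switch ports should be checked separately with the switch vendor's tooling.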