VM reports a spike in response time
search cancel

VM reports a spike in response time

book

Article ID: 392010

calendar_today

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms : 

  • VM reports a spike in response time.

Environment

VMware vSphere ESXi 6.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

A spike in VM response time is because of the high DAVG values reported on storage device. This could be because of sudden spike in the I/O volume.

Cause validation:

  • The /var/run/log/vmkernel.log file indicates write aborts for storage device
    YYYY-MM-DDTHH:MM.SSSZ cpu4:2567223)PVSCSI: 2698: scsi2:48: SCSI ABORT ctx=0x1fd
    YYYY-MM-DDTHH:MM.SSSZ cpu4:2567223)qlnativefc: vmhba1(3a:0.1): qlnativefcTaskMgmt:2303:Task Mgmt abort on serial num 1fd
    YYYY-MM-DDTHH:MM.SSSZ cpu4:2567223)qlnativefc: vmhba1(3a:0.1): qlnativefcEhAbort:2745:SCSI command timeout counter incremented to 4
    YYYY-MM-DDTHH:MM.SSSZ cpu4:2567223)qlnativefc: vmhba1(3a:0.1): qlnativefcEhAbort:2750:qlnativefcEhAbort: aborting sp 0x45ea59c1dd00 handle 302 from RISC. serialNumber=1fd, Command timeout=47822 sec.
    YYYY-MM-DDTHH:MM.SSSZ cpu4:2567223)qlnativefc: vmhba1(3a:0.1): qlnativefcEhAbort:2763:qlnativefcEhAbort: abortCommand mbx success.
    YYYY-MM-DDTHH:MM.SSSZ cpu91:2098572)qlnativefc: vmhba1(3a:0.1): qlnativefcStatusEntry:2077:C0:T23:L102 - FCP command status: 0x5-0x0 (0x8) portid=3e0e21 oxid=0x63 cdb=2a0000 len=14336 rspInfo=0x0 resid=0x0 fwResid=0x0 host status = 0x8 device status $
    YYYY-MM-DDTHH:MM.SSSZ cpu91:2098572)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0x2a (0x45ea6f60a108, 2567217) to dev "naa.################################" on path "vmhba1:C0:T23:L102" Failed:
    YYYY-MM-DDTHH:MM.SSSZ cpu91:2098572)NMP: nmp_ThrottleLogForDevice:3875: H:0x8 D:0x0 P:0x0 . Act:EVAL. cmdId.initiator=0x430e9e635c40 CmdSN 0x1fd
    YYYY-MM-DDTHH:MM.SSSZ cpu91:2098572)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" state in doubt; requested fast path state update...
    YYYY-MM-DDTHH:MM.SSSZ cpu91:2098572)ScsiDeviceIO: 4115: Cmd(0x45ea6f60a108) 0x2a, CmdSN 0x1fd from world 2567217 to dev "naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" failed H:0x8 D:0x0 P:0x0

  • Review the ESXTOP data, validate if there is a spike in DAVG values.


    From the above graph it is confirmed that the there is a spike in DAVG value. Refer Using esxtop to identify storage performance issues for ESXi (multiple versions)

    DAVG : This is the average response time in milliseconds per command being sent to the device.

    SCSI sense codes:
    H:0x8 - This status is returned when the HBA driver has aborted the I/O. It can also occur if the HBA does a reset of the target.
    0x2a - Write

Resolution

Please involve storage vendor to further investigate on the sudden spike in DAVG values and the write aborts.