High VM latency and corrupted application data due to FC frame drops and high "Invalid Tx Word Count"

Article ID: 413619


Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

  • VMs running on Fibre Channel datastores show poor performance (high latency) on one or more ESXi hosts.

  • Application data on the VMs may be corrupted. 

Environment

VMware vSphere ESXi (all versions). 

Cause

These symptoms can occur when there are problems in the Fibre Channel fabric.


Evidence of such issues may include:

  • Dropped frames, with repeated logging in /var/log/vmkernel.log similar to:

    qlnativefc: vmhba1(de:0.1): qlnativefcStatusEntry:1924:(3:8) Dropped frame(s) detected (37440 of 36864 bytes)

  • High "Invalid Tx Word Count" on one or more HBAs, e.g.:


# esxcli storage san fc stats get
FcStat:
   Adapter: vmhba0
   Tx Frames: 856383494
   Rx Frames: 1547852900
   ...
   Invalid Tx Word Count: 21804
   ...

  • Aborted I/O due to slow I/O or command timeouts, with logging in /var/log/vmkernel.log similar to:

VSCSI: 3285: handle 9366752458711311(GID:8463)(vscsi1:2):processing reset for handle ... state 1381192707
qlnativefc: vmhba1(de:0.1): qlnativefcTaskMgmt:2325:Task Mgmt virt reset
qlnativefc: vmhba1(de:0.1): qlnativefcEhVirtualReset:3239:C0:T0:L5: VIRTUAL RESET ISSUED.
qlnativefc: vmhba1(de:0.1): qlnativefcEhVirtualReset:3265:Command aborted on target=0x32x, lun=0x05 - SCSI command timeout counter incremented to 6081
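
To gauge how often the log messages above occur and on which adapter, a copy of vmkernel.log can be scanned with a short script. The following is a minimal sketch, not a supported tool. The function name and regular expressions are hypothetical and are derived only from the example lines in this article, so other driver versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns based only on the example log lines in this article (assumption:
# other qlnativefc driver versions may format these messages differently).
ADAPTER_RE = re.compile(r"qlnativefc: (vmhba\d+)\(")
DROPPED_RE = re.compile(r"Dropped frame\(s\) detected")
TIMEOUT_RE = re.compile(r"SCSI command timeout counter incremented to (\d+)")


def summarize_fc_log(lines):
    """Return (dropped-frame message count per adapter,
    highest SCSI command timeout counter seen per adapter)."""
    dropped = Counter()
    max_timeout = {}
    for line in lines:
        adapter = ADAPTER_RE.search(line)
        if not adapter:
            continue  # not a qlnativefc line for a vmhba
        hba = adapter.group(1)
        if DROPPED_RE.search(line):
            dropped[hba] += 1
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            max_timeout[hba] = max(max_timeout.get(hba, 0), int(timeout.group(1)))
    return dict(dropped), max_timeout


# Sample input copied from the log excerpts in this article.
SAMPLE = [
    "qlnativefc: vmhba1(de:0.1): qlnativefcStatusEntry:1924:(3:8) "
    "Dropped frame(s) detected (37440 of 36864 bytes)",
    "qlnativefc: vmhba1(de:0.1): qlnativefcTaskMgmt:2325:Task Mgmt virt reset",
    "qlnativefc: vmhba1(de:0.1): qlnativefcEhVirtualReset:3265:Command aborted on "
    "target=0x32x, lun=0x05 - SCSI command timeout counter incremented to 6081",
]

if __name__ == "__main__":
    print(summarize_fc_log(SAMPLE))
```

Against a real log, the same function could be given an open file object, for example `summarize_fc_log(open("/var/log/vmkernel.log"))`.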

Resolution

Investigate the Fibre Channel fabric for faults (for example HBAs, SFPs, cabling and switch ports), engaging the fabric vendor's support as required.
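
When gathering data for the fabric investigation, it can help to know whether "Invalid Tx Word Count" is still increasing on a given HBA or is only a historical total. The following is a minimal sketch that compares two saved outputs of `esxcli storage san fc stats get`. It assumes the "Key: Value" layout shown in the example earlier in this article and that the counter is cumulative. The function names and the second sample's value are hypothetical.

```python
def parse_fc_stats(text):
    """Parse 'Key: Value' lines into one dict per adapter.

    Assumes each adapter section starts with an 'Adapter:' line, as in the
    example output in this article.
    """
    adapters = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # skip lines without a colon, such as '...'
        key, value = key.strip(), value.strip()
        if key == "Adapter":
            current = {"Adapter": value}
            adapters.append(current)
        elif current is not None and value.isdigit():
            current[key] = int(value)
    return adapters


def invalid_tx_word_delta(before, after):
    """Return the per-adapter increase in 'Invalid Tx Word Count'."""
    counter = "Invalid Tx Word Count"
    first = {a["Adapter"]: a.get(counter, 0) for a in parse_fc_stats(before)}
    return {
        a["Adapter"]: a.get(counter, 0) - first.get(a["Adapter"], 0)
        for a in parse_fc_stats(after)
    }


# First sample: values from the example output in this article.
BEFORE = """FcStat:
   Adapter: vmhba0
   Tx Frames: 856383494
   Rx Frames: 1547852900
   Invalid Tx Word Count: 21804
"""

# Second sample: hypothetical later reading, invented for illustration only.
AFTER = """FcStat:
   Adapter: vmhba0
   Invalid Tx Word Count: 21950
"""

if __name__ == "__main__":
    print(invalid_tx_word_delta(BEFORE, AFTER))
```

A counter that keeps growing between samples suggests the problem is ongoing on that adapter's link. A value that stays flat may reflect an earlier event.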