High latency observed on Fibre Channel datastores due to dropped frames.

Article ID: 434602

Products

VMware vSphere ESXi 8.0

Issue/Introduction

  • Multiple Virtual Machines (VMs) in a cluster or on a single host report disk I/O errors and high latency.
  • High I/O wait times and performance degradation are observed due to dropped frames.
  • The issue may be isolated to a single ESXi host or specific HBA paths while other hosts in the same cluster remain unaffected.

Environment

VMware vSphere ESXi 8
VMware vSphere ESX 9

Cause

Dropped frames cause high VM latency.

Frame drops can cause significant disruption in a Fibre Channel environment: the affected commands are dropped or aborted and must be re-issued.

Dropped frames are a critical issue in Fibre Channel fabrics, and they are almost always caused by a hardware problem somewhere along the connection path.

Common causes of dropped frames include:

  • Faulty or Damaged Fibre Optic Cables
  • Dirty or Poorly Seated Connectors
  • Faulty HBA
  • Transceiver Malfunction (SFP/SFP+)
  • Faulty port on the switch or on the storage array controller

In VMware ESXi environments, dropped frames on the storage fabric are recorded in the host's /var/log/vmkernel.log file.

Because different HBAs use different drivers, the exact format of these warning messages varies with the storage driver in use.

Below are examples of how dropped frame events are reported across common generic and vendor-specific drivers.

I/O Device Management (IODM)
The generic IODM layer will log a warning when it observes multiple frame drop events within a short timeframe.
####-##-##T##:##:##.###Z Wa(180) vmkwarning: cpu21:2098060)WARNING: iodm: vmk_IodmEvent:191: vmhba#: FRAME DROP event has been observed 20 times in the last one minute. This suggests a problem with Fibre Channel link/switch!.
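
To check which adapters have triggered this IODM warning, and how high the reported per-minute count has climbed, the warnings can be summarized per adapter. The shell function below is a minimal sketch, not a VMware tool: the name iodm_frame_drop_summary is hypothetical, and it assumes the message text matches the example above with vmhba# replaced by a real adapter name.

```shell
# iodm_frame_drop_summary: hypothetical helper, not part of ESXi.
# Reads vmkernel.log text on stdin and, for each adapter named in an IODM
# "FRAME DROP event has been observed N times" warning, prints how many such
# warnings were logged and the largest N reported.
# ASSUMPTION: messages follow the layout of the example log line above.
iodm_frame_drop_summary() {
    awk '/vmk_IodmEvent/ && /FRAME DROP event has been observed/ {
        hba = ""; n = 0
        for (i = 1; i <= NF; i++) {
            # The adapter appears as its own field, e.g. "vmhba2:"
            if ($i ~ /^vmhba[0-9]+:$/) { hba = $i; sub(/:$/, "", hba) }
            # The per-minute count follows the word "observed"
            if ($i == "observed" && i < NF) n = $(i + 1) + 0
        }
        if (hba == "") next
        warnings[hba]++
        if (n > peak[hba]) peak[hba] = n
    }
    END {
        for (h in warnings)
            printf "%s warnings=%d peak_per_minute=%d\n", h, warnings[h], peak[h]
    }' | LC_ALL=C sort
}
```

For example: iodm_frame_drop_summary < /var/log/vmkernel.log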

QLogic Native Fibre Channel Driver (qlnativefc)
####-##-##T##:##:##.###Z In(182) vmkernel: cpu128:2098242)qlnativefc: vmhba#(28:0.0): qlnativefcStatusEntry:1927:(#:#) Dropped frame(s) detected (266240 of 262144 bytes).

####-##-##T##:##:##.###Z In(182) vmkernel: cpu128:2098242)qlnativefc: vmhba#(28:0.0): qlnativefcStatusEntry:2076:C0:T#:L# - FCP command status: 0x15-0x202 (0x7) portid=###### oxid=###### cdb=2a0075 len=262144 rspInfo=0x0 resid=0x0 fwResid=0x41000 host status = 0x7 device $
####-##-##T##:##:##.###Z In(182) vmkernel: cpu128:2098242)qlnativefc: vmhba#(28:0.0): qlnativefcStatusEntry:1927:(#:#) Dropped frame(s) detected (266240 of 262144 bytes).

Emulex LightPulse Fibre Channel Driver (lpfc)
####-##-##T##:##:##.###Z In(182) vmkernel: cpu3:2097907)lpfc: lpfc_rportStats:4871: 1:(0) Compression log for fcp target 0, path is ok, FRAME: drops=345, under=0, over=0
####-##-##T##:##:##.###Z In(182) vmkernel: cpu5:2097907)lpfc: lpfc_rportStats:4871: 1:(0) Compression log for fcp target 0, path is ok, FRAME: drops=349, under=0, over=0
####-##-##T##:##:##.###Z In(182) vmkernel: cpu26:2097907)lpfc: lpfc_rportStats:4871: 1:(0) Compression log for fcp target 0, path is ok, FRAME: drops=350, under=0, over=0
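
The lpfc drops= value in these lines rises from 345 to 350 across the three samples, which suggests (although this article does not state it) that it is a running total per target. Under that assumption, the minimal sketch below reports the first and last value seen for each target, which helps show whether drops are still accumulating. The function name lpfc_drop_trend is hypothetical, and the token before "Compression log" (1:(0) in the examples) is kept in the key as-is so that entries with different prefixes are not merged.

```shell
# lpfc_drop_trend: hypothetical helper, not part of ESXi or the lpfc driver.
# Reads vmkernel.log text on stdin and, for every lpfc_rportStats line that
# carries a drops= counter, reports the first and last value seen per key
# "<token before Compression> target <n>".
# ASSUMPTION: drops= is a running total, so last - first is the number of
# drops logged during the period covered by the input.
lpfc_drop_trend() {
    awk '/lpfc_rportStats/ && /drops=/ {
        key = ""; tgt = ""; drops = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "Compression" && i > 1) key = $(i - 1)
            if ($i == "target" && i < NF) { tgt = $(i + 1); sub(/,$/, "", tgt) }
            if ($i ~ /^drops=/) { drops = $i; sub(/^drops=/, "", drops); sub(/,$/, "", drops) }
        }
        if (tgt == "" || drops == "") next
        key = key " target " tgt
        if (!(key in first)) first[key] = drops + 0
        last[key] = drops + 0
        n[key]++
    }
    END {
        for (k in first)
            printf "%s: drops %d -> %d (+%d over %d samples)\n", k, first[k], last[k], last[k] - first[k], n[k]
    }' | LC_ALL=C sort
}
```

For the three example lines this prints a single summary line for 1:(0) target 0, showing the counter moving from 345 to 350.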

Dropped frame events are also recorded in the host's FC event log, which can be displayed with:

localcli storage san fc events get
FC Event Log
------------
####-##-## ##:##:##.### [vmhba1] LINK UP
####-##-## ##:##:##.### [vmhba2] Dropped frames (57344 of 805 bytes) on C0:T0:L9 cmd:0x28
####-##-## ##:##:##.### [vmhba2] Dropped frames (106496 of 805 bytes) on C0:T1:L9 cmd:0x28
####-##-## ##:##:##.### [vmhba2] Dropped frames (86016 of 805 bytes) on C0:T0:L9 cmd:0x28
####-##-## ##:##:##.### [vmhba2] Dropped frames (102400 of 805 bytes) on C0:T1:L9 cmd:0x28
####-##-## ##:##:##.### [vmhba2] Dropped frames (96256 of 805 bytes) on C0:T0:L9 cmd:0x28

####-##-## ##:##:##.### [vmhba2] Dropped frames (35392 of 805 bytes) on C0:T0:L3 cmd:0x28
####-##-## ##:##:##.### [vmhba2] Dropped frames (91648 of 805 bytes) on C0:T1:L3 cmd:0x28
####-##-## ##:##:##.### [vmhba2] Dropped frames (29440 of 805 bytes) on C0:T0:L3 cmd:0x28
####-##-## ##:##:##.### [vmhba2] Dropped frames (19008 of 805 bytes) on C0:T1:L3 cmd:0x28
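
To see whether the dropped frames follow one target port, one LUN, or the whole adapter, the events can be tallied per adapter and C#:T#:L# path. In the sample above, every event is on vmhba2 and the events are spread across both targets (T0 and T1) and both LUNs (L3 and L9). The function below is a hypothetical sketch (summarize_fc_drops is not a VMware tool) that assumes the event line layout shown above.

```shell
# summarize_fc_drops: hypothetical helper, not part of ESXi.
# Reads the output of "localcli storage san fc events get" on stdin and
# counts "Dropped frames" events per adapter and per C#:T#:L# path.
# ASSUMPTION: event lines look like the sample output above.
summarize_fc_drops() {
    awk '/Dropped frames/ {
        hba = ""; path = ""
        for (i = 1; i <= NF; i++) {
            # The adapter is printed in brackets, e.g. "[vmhba2]"
            if ($i ~ /^\[vmhba[0-9]+\]$/) hba = substr($i, 2, length($i) - 2)
            # The path follows the word "on", e.g. "on C0:T0:L9"
            if ($i == "on" && i < NF) path = $(i + 1)
        }
        if (hba != "") count[hba " " path]++
    }
    END { for (k in count) print count[k], k }' | LC_ALL=C sort -k2
}
```

For example: localcli storage san fc events get | summarize_fc_drops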

# To isolate which adapter is reporting dropped frames
grep Dropped /var/log/vmkernel.log | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^vmhba/) {print $i; break}}' | sort | uniq -c

        352 vmhba# (##:#.#):
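
Because the goal is to confirm that drops stop once the environment has been stabilized, it can also help to see when they occurred. The sketch below (hypothetical name dropped_frames_by_hour) counts the same Dropped lines per hour and per adapter. It assumes each line begins with an ISO 8601 timestamp like those in the examples above, so that the first 13 characters identify the hour.

```shell
# dropped_frames_by_hour: hypothetical helper, not part of ESXi.
# Reads vmkernel.log text on stdin and counts lines containing "Dropped"
# per hour and per adapter, using the first field that starts with vmhba<N>.
# ASSUMPTION: each line starts with a timestamp such as
# 2024-05-01T10:15:42.123Z, so its first 13 characters identify the hour.
dropped_frames_by_hour() {
    awk '/Dropped/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^vmhba[0-9]+/) {
                hba = $i
                sub(/[^0-9a-z].*$/, "", hba)   # "vmhba2(28:0.0):" -> "vmhba2"
                count[substr($1, 1, 13) " " hba]++
                break
            }
        }
    }
    END { for (k in count) print k, count[k] }' | LC_ALL=C sort
}
```

For example: dropped_frames_by_hour < /var/log/vmkernel.log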

Resolution

Immediate Stabilization & Troubleshooting Steps:

  • Isolate the Host (If dropped frames are isolated to a single host):
    • If a single host is experiencing dropped frames and there are sufficient compute and memory resources in the environment, place the host into maintenance mode. This will stabilize VM workloads while an investigation is carried out.

  • Disable Affected HBA Paths (If dropped frames are isolated to a single HBA):

If redundancy is available and the issue is isolated to a single Host Bus Adapter (HBA), you may disable the paths that use it. First confirm that every device on that HBA still has an active path through another adapter; disabling the last active path to a device makes its datastore inaccessible.

    • To disable all paths for a specific HBA (e.g., vmhba#). The trailing colon in the grep pattern keeps, for example, vmhba1 from also matching vmhba10:

      localcli storage core path list | grep "Runtime Name: vmhba#:" | awk '{print $3}' | while read -r line; do localcli storage core path set --path "$line" --state off; done

    • To disable a single path manually:

      localcli storage core path set -p <vmhba#:C#:T#:L#> --state off
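
Before running the loop above, it can help to preview exactly which paths it would disable and to check the redundancy requirement. The sketch below (hypothetical name plan_hba_disable, not a VMware tool) reads localcli storage core path list output, prints the --state off commands for the chosen adapter without running them, and warns about any device that has no active path through another adapter. It assumes the usual layout of that output: one block per path, separated by blank lines, with indented Runtime Name:, Device: and State: fields.

```shell
# plan_hba_disable: hypothetical helper, not part of ESXi.
# Reads "localcli storage core path list" output on stdin and, for the
# adapter named in $1, prints (does NOT run) the commands that would
# disable its paths, then warns about devices left without another
# active path. ASSUMPTION: blocks are separated by blank lines and
# contain indented "Runtime Name:", "Device:" and "State:" fields.
plan_hba_disable() {
    awk -v hba="$1" '
    function flush() {
        if (rn == "") return
        split(rn, part, ":")            # vmhba2:C0:T0:L9 -> part[1] = vmhba2
        if (part[1] == hba) {
            print "localcli storage core path set --path " rn " --state off"
            affected[dev] = 1
        } else if (state == "active") {
            other[dev]++
        }
        rn = ""; dev = ""; state = ""
    }
    /^[^ \t]/ || /^[ \t]*$/ { flush(); next }   # new path block or separator
    /^[ \t]*Runtime Name:/ { rn = $3 }
    /^[ \t]*Device:/       { dev = $2 }
    /^[ \t]*State:/        { state = $2 }
    END {
        flush()
        for (d in affected)
            if (!(d in other))
                print "WARNING: " d " has no active path outside " hba
    }'
}
```

For example: localcli storage core path list | plan_hba_disable vmhba2. Review the output and run the printed commands only if no WARNING lines appear. Matching on the adapter field of the runtime name, rather than on a substring, avoids the vmhba1/vmhba10 ambiguity.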


  • Investigate the Root Cause:

Once the environment has been stabilized, thoroughly review the HBA, the fabric, and the backend storage array to identify the source of the dropped frames.

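After the hardware issue is remedied, re-enable any paths that were disabled (for example, localcli storage core path set -p <vmhba#:C#:T#:L#> --state active) and reconfigure the connection to the datastore so that it has redundant paths, mitigating future single points of failure.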