VM hangs on the host. No actions can be performed such as vMotion, power off etc. FRAME DROP event has been observed in vmkernel logs

Article ID: 371539

Updated On:

Products

VMware vSphere ESXi
VMware vSphere ESXi 8.0

Issue/Introduction

  • You experience datastore latency.
  • VMs are randomly impacted with performance degradations or outages.
  • The ESXi host may go into a "Not Responding" state.
  • You are unable to perform vMotion or Storage vMotion of VMs.
  • You observe failed SCSI commands, latency warnings, and/or "state in doubt" messages similar to these in the vmkernel.log on the ESXi host:


[from /var/log/vmkernel.log on ESXi host]


 [YYYY-MM-DDTHH:MM:SS] cpu6:2098268)ScsiDeviceIO: 4124: Cmd(0x45b9816c0e48) 0x28, CmdSN 0x22ac from world 2100337 to dev "naa.624#############ad2" failed H:0x7 D:0x28 P:0x0

[YYYY-MM-DDTHH:MM:SS] cpu0:2098288)WARNING: ScsiDeviceIO: 1513: Device naa.624a93###########0001 performance has deteriorated. I/O latency increased from average value of 914 microseconds to 18407 microseconds.

[YYYY-MM-DDTHH:MM:SS] cpu52:2099716)WARNING: NMP: nmp_DeviceRequestFastDeviceProbe:237: NMP device "naa.6000###############2d" state in doubt; requested fast path state update...
[YYYY-MM-DDTHH:MM:SS] cpu52:2099716)ScsiDeviceIO: 4115: Cmd(0x45baf4f44648) 0x28, CmdSN 0xbd from world 11753952 to dev "naa.6000###############2d" failed H:0x2 D:0x0 P:0x0

  • You may also see "flaky path" messages in the vmkernel.log on the ESXi host:

[from /var/log/vmkernel.log on the ESXi host] 
 
2025-11-04T06:49:33.664Z Wa(180) vmkwarning: cpu35:2098423)WARNING: NMP: nmpHandleLinkEvent:3998: Marking path vmhba1:C0:T1:L13 flaky on link event 2 with timeoutMS = 20000 flakyMarkTC = 47954528501286772, reEvalFlakyPathTime = 20000

  • There may be additional messages, which vary by the specific HBA model in use on the host:
    • For QLogic-based (qlnativefc driver) or Emulex-based (lpfc driver) adapters, you may see "Dropped frame" messages similar to these in the vmkernel.log on the ESXi host:


      [from /var/log/vmkernel.log on ESXi host]

       [YYYY-MM-DDTHH:MM:SS] In (182) vmkernel: cpu54:2098257) qlnativefc: vmhba2 (12:0.0): qlnativefcStatusEntry:1919: (7:41) Dropped frame (s) detected (106496 of 131072 bytes). 
       [YYYY-MM-DDTHH:MM:SS] In (182) vmkernel: cpu54:2098257) qlnativefc: vmhba2 (12:0.0): qlnativefcStatusEntry: 2067:C0:T7:L41 FCP command status: 0x15-0x0 (0x2) portid=bc0142 oxid=0x4ba c 80000 len-131072 rspInfo=0x0   resid=0x0 fwResid=0x1a000 host status = 0x2 device  

      [YYYY-MM-DDTHH:MM:SS] In (182) vmkernel: cpu54:2098257) qlnativefc: vmhba2 (12:0.0): qlnativefcStatusEntry:1919: (7:41) Dropped frame (s) detected (116736 of 131072 bytes).


    • For Cisco VIC CNA adapters (nfnic driver), you may see FCPIO_DATA_CNT_MISMATCH messages similar to these in the vmkernel.log on the ESXi host:


      [from /var/log/vmkernel.log on ESXi host]

[YYYY-MM-DDTHH:MM:SS] Wa(180) vmkernel: cpu35:2098423)nfnic: <1>: fnic_fcpio_icmnd_cmpl_handler: : sc: 0x##############3 tag: 0x## hdr status: FCPIO_DATA_CNT_MISMATCH IO failure! Refer KB340039

 

and/or these "Dropping frame" messages: 


[from /var/log/vmkernel.log on ESXi host]

[YYYY-MM-DDTHH:MM:SS] Wa(180) vmkernel: cpu35:2098423)nfnic <1>: INFO: fnic_fdis_recv_frame: 4506: Received unknown FCoE frame of len: 52. Dropping Frame
[YYYY-MM-DDTHH:MM:SS] Wa(180) vmkernel: cpu35:2098423)nfnic <1>: INFO: fnic_fdis_validate_and_get_frame_type: 4166: Received FPIN with some invalid frame bits S_ID: 0xfffffd FCTL: 0x38 R_CTRL: 0x22 type: 0x1. Dropping frame.
 
  • On Windows Server VMs, the following event might be reported in the Windows event logs:

Windows Event ID 129 ("Reset to device, \Device\RaidPortX, was issued")
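
When a vmkernel.log is large, it can help to count how often each of the symptom messages listed in this section occurs before reading individual entries. The following Python sketch is one hedged way to do that. The regular expressions are assumptions derived only from the sample lines in this article, so other driver versions may word their messages differently. The names `SIGNATURES`, `summarize`, and `parse_scsi_failure` are hypothetical helpers, not part of any VMware tool.

```python
import re
from collections import Counter

# Assumption: these patterns are inferred from the sample log lines in this
# article only. Adjust them if your driver version words messages differently.
SIGNATURES = {
    "scsi_cmd_failed": re.compile(r"ScsiDeviceIO: .*failed H:0x"),
    "latency_deteriorated": re.compile(r"performance has deteriorated"),
    "state_in_doubt": re.compile(r"state in doubt"),
    "flaky_path": re.compile(r"Marking path \S+ flaky"),
    "dropped_frame": re.compile(r"dropp(?:ed|ing) frame", re.IGNORECASE),
    "data_cnt_mismatch": re.compile(r"FCPIO_DATA_CNT_MISMATCH"),
}

# Captures the device name and the H (host), D (device), P (plugin) status
# codes from a failed SCSI command line.
SCSI_FAILURE = re.compile(
    r'to dev "(?P<device>[^"]+)" failed '
    r"H:(?P<host>0x[0-9a-fA-F]+) "
    r"D:(?P<dev_status>0x[0-9a-fA-F]+) "
    r"P:(?P<plugin>0x[0-9a-fA-F]+)"
)


def summarize(lines):
    """Count how many lines match each symptom signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def parse_scsi_failure(line):
    """Return device and numeric H/D/P codes, or None if the line does not match."""
    match = SCSI_FAILURE.search(line)
    if match is None:
        return None
    return {
        "device": match.group("device"),
        "host": int(match.group("host"), 16),
        "dev_status": int(match.group("dev_status"), 16),
        "plugin": int(match.group("plugin"), 16),
    }
```

To use it, open a copy of the log (for example with `open(path, errors="replace")`) and pass the file object to `summarize`. A high `dropped_frame` or `data_cnt_mismatch` count alongside `scsi_cmd_failed` entries is consistent with the fabric-level cause described in this article, but the counts alone do not identify the faulty component.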

 

Environment

  • VMware vSphere ESXi 7.x
  • VMware vSphere ESXi 8.x

Cause

  • Dropped frames, error frames, and non-zero Link Failure, Loss of Signal, Invalid CRC, and Invalid Tx Word counts all indicate physical layer connectivity issues on the Fibre Channel fabric (fiber optic cable, SFP, or switch).

  • You can confirm the specific errors by running the "esxcli storage san fc stats get" command in an SSH session to the ESXi host:
[root@esxhostname:~] esxcli storage san fc stats get
   Adapter: vmhba3
   Tx Frames: 1289812770
   Rx Frames: 1041269154
   Lip Count: 0
   Error Frames: 275
   Dumped Frames: 1
   Link Failure Count: 465
   Loss of Signal Count: 0
   PrimSeq Protocol Err Count: 1
   Invalid Tx Word Count: 4294967295
   Invalid CRC Count: 13
   Input Requests: 72557000
   Output Requests: 75218036
   Control Requests: 80570

 NOTE:

    • Invalid Tx Word Count is a physical layer error. It means the light signal traveling over the fiber cable between the HBA and the fabric switch is being interrupted, and the HBA cannot decode the signal correctly.
    • Loss of Signal Count indicates the number of times the HBA port stopped receiving a light signal altogether.
    • Link Failure Count is a hardware-level counter that tracks the number of times the Fibre Channel (FC) link has completely lost connectivity and had to re-initialize.
    • Invalid CRC indicates that the calculated checksum of the data in the frame does not match the stored value, meaning the frame data has been corrupted in transit.
  • You may also verify the dropped frame events with the "esxcli storage san fc events get" command:
[root@esxhostname:~] esxcli storage san fc events get 

2025-12-17 17:15:00.374 [vmhba3] Dropped frames (10240 of 10 bytes) on C0:T1:L3 cmd:0x28
2025-12-17 17:15:00.388 [vmhba3] Dropped frames (512 of 10 bytes) on C0:T1:L3 cmd:0x28
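
The counters reported by "esxcli storage san fc stats get" can also be checked programmatically, which is convenient when several hosts or adapters are involved. The following Python sketch parses the text output shown above and lists the non-zero error counters per adapter. It assumes the output keeps the "Name: value" layout of the sample in this article. `parse_fc_stats`, `nonzero_errors`, and `counter_deltas` are hypothetical helper names. A value of 4294967295 is the maximum of an unsigned 32-bit integer, so the sketch flags it as possibly saturated or unreported rather than treating it as a literal count. That interpretation is an assumption, not documented behavior.

```python
# Counters treated as error indicators, taken from the sample output above.
ERROR_COUNTERS = (
    "Error Frames",
    "Dumped Frames",
    "Link Failure Count",
    "Loss of Signal Count",
    "PrimSeq Protocol Err Count",
    "Invalid Tx Word Count",
    "Invalid CRC Count",
)
UINT32_MAX = 0xFFFFFFFF  # 4294967295


def parse_fc_stats(text):
    """Parse 'Name: value' lines into {adapter: {counter: int}}."""
    adapters = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Adapter":
            current = adapters.setdefault(value, {})
        elif current is not None and value.isdigit():
            current[key] = int(value)
    return adapters


def nonzero_errors(adapters):
    """Return (adapter, counter, value, note) for every non-zero error counter."""
    findings = []
    for adapter, counters in adapters.items():
        for name in ERROR_COUNTERS:
            value = counters.get(name, 0)
            if value:
                # Assumption: the 32-bit maximum may mean saturated/unreported.
                note = "possibly saturated or unreported" if value == UINT32_MAX else ""
                findings.append((adapter, name, value, note))
    return findings


def counter_deltas(before, after):
    """Return {(adapter, counter): increase} between two parsed snapshots."""
    deltas = {}
    for adapter, counters in after.items():
        previous = before.get(adapter, {})
        for name in ERROR_COUNTERS:
            change = counters.get(name, 0) - previous.get(name, 0)
            if change > 0:
                deltas[(adapter, name)] = change
    return deltas
```

Because these counters are cumulative, a single snapshot shows that errors occurred at some point but not when. Capturing the command output twice, a few minutes apart, and passing both parsed results to `counter_deltas` indicates whether errors are still incrementing on a given adapter.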

Resolution

  • Engage your storage, fabric, and hardware team(s) and vendors to investigate and mitigate the physical layer issues in the environment.