ESXi PSOD due to #PF Exception 14 in world 2098030:qfle3_stats_ IP


Article ID: 408106


Products

VMware vSphere ESXi

Issue/Introduction

A PSOD occurs because of a race condition between qfle3 driver threads.

The PSOD backtrace contains stack entries similar to the following:

YYYY-MM-DDTHH:MM:SS cpu22:2098030)@BlueScreen: #PF Exception 14 in world 2098030:qfle3_stats_ IP 0x42000a6cb1d4 addr 0x4520a30d4cc4
PTEs:0x100074023;0x30800e9063;0x16955b063;0x0;
YYYY-MM-DDTHH:MM:SS cpu22:2098030)Code start: 0x420009800000 VMK uptime: 68:10:18:29.394
YYYY-MM-DDTHH:MM:SS cpu22:2098030)0x4538d619be30:[0x42000a6cb1d4]qfle3_stats_update@(qfle3)#<None>+0x30 stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu22:2098030)0x4538d619bef0:[0x42000a6cc989]qfle3_stats_handle@(qfle3)#<None>+0x42 stack: 0x2c0f15b39ec420
YYYY-MM-DDTHH:MM:SS cpu22:2098030)0x4538d619bf10:[0x42000a65894e]qfle3_stats_update_func@(qfle3)#<None>+0xdb stack: 0x4538d619f000
YYYY-MM-DDTHH:MM:SS cpu22:2098030)0x4538d619bf40:[0x4200098da3ff]HelperQueueFunc@vmkernel#nover+0x2d8 stack: 0x4538d61a0b48
YYYY-MM-DDTHH:MM:SS cpu22:2098030)0x4538d619bfe0:[0x420009bb4d55]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu22:2098030)0x4538d619c000:[0x4200098c4ddf]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
YYYY-MM-DDTHH:MM:SS cpu22:2098030)base fs=0x0 gs=0x420045800000 Kgs=0x0
YYYY-MM-DDTHH:MM:SS cpu22:2098030)CPU model name: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, FMS: 06/4f/1, uCodeRev: b000040
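
To check whether a captured backtrace matches this signature, a minimal Python sketch follows. The function name and regular expressions are hypothetical helpers derived only from the sample lines above; other ESXi or driver builds may format these lines differently.

```python
import re

# Patterns derived from the sample backtrace in this article (assumption:
# other builds may format these lines differently).
PF_RE = re.compile(r"#PF Exception 14 in world \d+:(?P<world>\S+)")
FRAME_RE = re.compile(r"\[0x[0-9a-f]+\](?P<func>\w+)@\((?P<module>\w+)\)")


def matches_signature(backtrace: str) -> bool:
    """Return True if the text shows a #PF Exception 14 in a qfle3_stats_
    world together with a qfle3_stats_update frame from the qfle3 module."""
    pf = PF_RE.search(backtrace)
    if pf is None or not pf.group("world").startswith("qfle3_stats_"):
        return False
    frames = {(m.group("func"), m.group("module"))
              for m in FRAME_RE.finditer(backtrace)}
    return ("qfle3_stats_update", "qfle3") in frames


# Two lines copied from the sample backtrace above.
sample = """\
YYYY-MM-DDTHH:MM:SS cpu22:2098030)@BlueScreen: #PF Exception 14 in world 2098030:qfle3_stats_ IP 0x42000a6cb1d4 addr 0x4520a30d4cc4
YYYY-MM-DDTHH:MM:SS cpu22:2098030)0x4538d619be30:[0x42000a6cb1d4]qfle3_stats_update@(qfle3)#<None>+0x30 stack: 0x0
"""

print(matches_signature(sample))
```

A match only indicates that the backtrace resembles the one described here; it does not confirm the root cause.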

Log entries from /var/core/vmkernel-zdump.1 show DMAE failures and parity errors on the vmnic:

YYYY-MM-DDTHH:MM:SS cpu19:2097987)WARNING: qfle3: qfle3_write_dmae:4358: [vmnicX] DMAE returned failure -1
YYYY-MM-DDTHH:MM:SS cpu19:2097987)WARNING: qfle3: qfle3_issue_dmae_with_comp:4207: [vmnicX] DMAE timeout: comp 0!
YYYY-MM-DDTHH:MM:SS cpu19:2097987)WARNING: qfle3: qfle3_write_dmae:4358: [vmnicX] DMAE returned failure -1
YYYY-MM-DDTHH:MM:SS cpu31:2098061)Deactive_dev Entering
YYYY-MM-DDTHH:MM:SS cpu31:2098061)marking link down
YYYY-MM-DDTHH:MM:SS cpu16:2098033)WARNING: qfle3: qfle3_parity_attn:16339: [vmnicX] Parity errors detected in blocks:
YYYY-MM-DDTHH:MM:SS cpu16:2098033)WARNING: qfle3: _print_next_block:15970: USEMI
YYYY-MM-DDTHH:MM:SS cpu16:2098033)WARNING: qfle3: _print_parity:15964:  [0x00000000]
YYYY-MM-DDTHH:MM:SS cpu16:2098033)WARNING: qfle3: _print_parity:15964:  [0x00000004]
YYYY-MM-DDTHH:MM:SS cpu16:2098033)qfle3: qfle3_parity_attn:16362: [vmnicX]
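
The DMAE and parity warnings can be tallied per driver function to see which ones dominate a log. The sketch below is illustrative only: `count_qfle3_warnings` is a hypothetical helper, and its pattern assumes the `WARNING: qfle3: <function>:<line>:` layout seen in the sample lines.

```python
import re
from collections import Counter

# Assumes the "WARNING: qfle3: <function>:<line>:" layout from the sample log.
WARN_RE = re.compile(r"WARNING: qfle3: (?P<func>\w+):\d+:")


def count_qfle3_warnings(log_text: str) -> Counter:
    """Count qfle3 WARNING lines, keyed by the reporting driver function."""
    return Counter(m.group("func") for m in WARN_RE.finditer(log_text))


# Lines copied from the sample log above.
sample = """\
YYYY-MM-DDTHH:MM:SS cpu19:2097987)WARNING: qfle3: qfle3_write_dmae:4358: [vmnicX] DMAE returned failure -1
YYYY-MM-DDTHH:MM:SS cpu19:2097987)WARNING: qfle3: qfle3_issue_dmae_with_comp:4207: [vmnicX] DMAE timeout: comp 0!
YYYY-MM-DDTHH:MM:SS cpu19:2097987)WARNING: qfle3: qfle3_write_dmae:4358: [vmnicX] DMAE returned failure -1
YYYY-MM-DDTHH:MM:SS cpu16:2098033)WARNING: qfle3: qfle3_parity_attn:16339: [vmnicX] Parity errors detected in blocks:
"""

counts = count_qfle3_warnings(sample)
for func, n in counts.most_common():
    print(f"{func}: {n}")
```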

/var/log/vobd.log on the ESXi host shows vmnic link-down events:

YYYY-MM-DDTHH:MM:SS: [netCorrelator] 5912225769206us: [vob.net.dvport.uplink.transition.down] Uplink: vmnicX is down. Affected dvPort: XXX/50 2f 82 25 14 bd d3 a3-0f ef 72 81 7b 26 fc 61. 1 uplinks up. Failed criteria: 128
YYYY-MM-DDTHH:MM:SS: [netCorrelator] 5912225769219us: [vob.net.dvport.uplink.transition.down] Uplink: vmnicX is down. Affected dvPort: XXX/50 2f 82 25 14 bd d3 a3-0f ef 72 81 7b 26 fc 61. 1 uplinks up. Failed criteria: 128
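
The uplink, remaining uplink count, and failed criteria value can be pulled out of such entries with a short script. This is a sketch under the assumption that vobd.log entries follow the layout of the two sample lines; `parse_uplink_down` is a hypothetical helper name.

```python
import re

# Assumes the vob.net.dvport.uplink.transition.down layout from the sample.
DOWN_RE = re.compile(
    r"\[vob\.net\.dvport\.uplink\.transition\.down\] "
    r"Uplink: (?P<nic>\S+) is down\."
    r".*?(?P<up>\d+) uplinks up\. Failed criteria: (?P<criteria>\d+)"
)


def parse_uplink_down(log_text: str) -> list[tuple[str, int, int]]:
    """Return (uplink, uplinks still up, failed criteria) per down event."""
    return [
        (m.group("nic"), int(m.group("up")), int(m.group("criteria")))
        for m in DOWN_RE.finditer(log_text)
    ]


# One line copied from the sample vobd.log output above.
sample = (
    "YYYY-MM-DDTHH:MM:SS: [netCorrelator] 5912225769206us: "
    "[vob.net.dvport.uplink.transition.down] Uplink: vmnicX is down. "
    "Affected dvPort: XXX/50 2f 82 25 14 bd d3 a3-0f ef 72 81 7b 26 fc 61. "
    "1 uplinks up. Failed criteria: 128\n"
)

for nic, up, criteria in parse_uplink_down(sample):
    print(f"{nic}: {up} uplink(s) still up, failed criteria {criteria}")
```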

Environment

VMware vSphere ESXi 7.0

Cause

Based on the memory dump analysis, the Purple Screen of Death (PSOD) was triggered when a thread attempted to access adapter->sp->stats_comp, which had already been deallocated and unmapped. This invalid memory access resulted in the crash. The underlying cause appears to be a race condition within the qfle3 driver threads.

Parity errors reported for a vmnic point to an adapter in a bad state or running outdated firmware.

DMAE errors and parity errors were detected on the vmnic, which implies a problem with the device hardware.

Resolution

  • The PSOD was caused by a race condition within the qfle3 driver threads, together with a faulty vmnic.
  • Engage your hardware vendor for further investigation.
  • Note: Failed criteria 128 means the driver reported a link-down state. This can be caused by unplugging the network cable or administratively shutting down the physical switch port. If the link outage was not intended, the likely cause is an issue with the driver, firmware, SFP+ module, cable, or physical switch port. Refer to KB: https://knowledge.broadcom.com/external/article?legacyId=2014553

Additional Information

To determine the version information for a physical network interface card in vSphere ESXi: https://knowledge.broadcom.com/external/article/323110/determining-networkstorage-firmware-and.html

Ensure that your hardware is listed in the VMware Compatibility Guides.