PSOD (purple screen of death) Can Occur When Using QFLE3 Driver below version 1.1.13 or 1.4.13

Article ID: 318010

Products

VMware vSphere ESXi

Issue/Introduction

An ESXi host crashes with a purple diagnostic screen (PSOD) showing one of the backtraces below:

  • The PSOD backtrace will be similar to the following:
2020-05-07T08:30:26.104Z cpu59:5261210)WARNING: Heartbeat: 760: PCPU 44 didn't have a heartbeat for 21 seconds; *may* be locked up.
2020-05-07T08:30:26.104Z cpu44:2097436)ALERT: NMI: 696: NMI IPI: RIPOFF(base):RBP:CS [0xc7490(0x418001000000):0x4302ee371a80:0xfc8] (Src 0x1, CPU44)
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b788:[0x4180010c748f]SafeMemAccess_CmpXchg4ExceptionPossible@vmkernel#nover+0xe stack: 0x4302ee371d40
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b790:[0x418001157a50]FastSlabCreateObj@vmkernel#nover+0x88 stack: 0x100000001
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b810:[0x418001158013]FastSlabReplenishCPU@vmkernel#nover+0x6e stack: 0x41804b005fd0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b850:[0x418001156425]FastSlabAllocSlow@vmkernel#nover+0x7e stack: 0x0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b870:[0x4180011564de]FastSlab_AllocWithTimeout@vmkernel#nover+0x83 stack: 0x451b88e1b9b8
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b8c0:[0x41800103c959]vmk_PageSlabAlloc@vmkernel#nover+0x22 stack: 0x451b00000800
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b8d0:[0x4180011d30aa]PktPageAlloc_AllocPages@vmkernel#nover+0x37 stack: 0x451b88e1b950
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b950:[0x41800125f56b]vmk_PktAllocPage@vmkernel#nover+0x10 stack: 0x4310177ed010
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b960:[0x418001b3b295]qfle3_page_alloc_and_map@(qfle3)#<None>+0x22 stack: 0xeb4bbd3
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b9b0:[0x418001b51235]qfle3_alloc_rx_sge_mbuf@(qfle3)#<None>+0x2e stack: 0x3f
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1b9f0:[0x418001b517d4]qfle3_alloc_fp_buffers@(qfle3)#<None>+0x2f5 stack: 0x2d35302d30323032
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1ba60:[0x418001b3d000]qfle3_rq_create@(qfle3)#<None>+0x3a9 stack: 0x0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bae0:[0x418001af4a77]qfle3_cmd_create_q@(qfle3)#<None>+0x15c stack: 0x0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bb30:[0x418001b2b65e]qfle3_sm_q_cmd@(qfle3)#<None>+0x147 stack: 0x10
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bbb0:[0x418001b3c9c2]qfle3_rq_alloc@(qfle3)#<None>+0x2d7 stack: 0x4307036b2780
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bc40:[0x4180012dd8bd]UplinkNetq_AllocHwQueueWithAttr@vmkernel#nover+0x92 stack: 0x17
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bc90:[0x418001217435]NetqueueBalActivatePendingRxQueues@vmkernel#nover+0x156 stack: 0x79e28088
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bd50:[0x418001218075]NetqueueBalRxQueueCommitChanges@vmkernel#nover+0x36 stack: 0x0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bd90:[0x41800121b677]UplinkNetqueueBal_BalanceCB@vmkernel#nover+0x19fc stack: 0x430779e7f1d0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bf00:[0x4180012d8309]UplinkAsyncProcessCallsHelperCB@vmkernel#nover+0x116 stack: 0x43090803f7b0
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bf30:[0x4180010eaf7a]HelperQueueFunc@vmkernel#nover+0x157 stack: 0x43090803f0b8
2020-05-07T08:30:26.104Z cpu44:2097436)0x451b88e1bfe0:[0x41800130f9f2]CpuSched_StartWorld@vmkernel#nover+0x77 stack: 0x0
2020-05-07T08:30:30.214Z cpu11:2626161)VMotion: 5367: 4078885979155334064 S: Another pre-copy iteration needed with 377085 pages left to send (prev2 8388608, prev 8388608, pages dirtied by pass through device 0, network bandwidth ~1028.528 MB/s, 5663% t$
  • In addition to the PSOD backtrace above, the following messages from the qfle3 driver might be present in /var/run/log/vmkernel.log:
2020-05-07T05:19:56.921Z cpu58:2097436)WARNING: qfle3: ecore_state_wait:315: timeout waiting for state 10
2020-05-07T05:19:56.921Z cpu58:2097436)WARNING: qfle3: qfle3_remove_queue_filter:2370: [vmnic5] RX 3 queue state not changed for fid: 0
2020-05-07T05:19:56.922Z cpu58:2097436)WARNING: qfle3: ecore_queue_chk_transition:5969: Blocking transition since pending was 400
2020-05-07T05:19:56.922Z cpu58:2097436)WARNING: qfle3: ecore_queue_state_change:4855: check transition returned an error. rc -2


or a PSOD backtrace similar to the following:

cpu5:2097316)0x451a4521b9e8:[0x41802f52538d]vmk_Memset@vmkernel#nover+0x9
cpu5:2097316)0x451a4521b9f0:[0x41802fe42262]qfle3_alloc_fp_buffers@(qfle3)#<None>+0x7f
cpu5:2097316)0x451a4521ba60:[0x41802fe2db9c]qfle3_rq_create@(qfle3)#<None>+0x3a9
cpu5:2097316)0x451a4521bae0:[0x41802fde4d37]qfle3_cmd_create_q@(qfle3)#<None>+0x15c
cpu5:2097316)0x451a4521bb30:[0x41802fe1c6c2]qfle3_sm_q_cmd@(qfle3)#<None>+0x147
cpu5:2097316)0x451a4521bbb0:[0x41802fe2d55e]qfle3_rq_alloc@(qfle3)#<None>+0x2d7
cpu5:2097316)0x451a4521bc40:[0x41802f6de61d]UplinkNetq_AllocHwQueueWithAttr@vmkernel#nover+0x92
cpu5:2097316)0x451a4521bc90:[0x41802f617ee5]NetqueueBalActivatePendingRxQueues@vmkernel#nover+0x156
cpu5:2097316)0x451a4521bd50:[0x41802f618b25]NetqueueBalRxQueueCommitChanges@vmkernel#nover+0x36
cpu5:2097316)0x451a4521bd90:[0x41802f61c127]UplinkNetqueueBal_BalanceCB@vmkernel#nover+0x19fc
cpu5:2097316)0x451a4521bf00:[0x41802f6d9069]UplinkAsyncProcessCallsHelperCB@vmkernel#nover+0x116
cpu5:2097316)0x451a4521bf30:[0x41802f4eb06a]HelperQueueFunc@vmkernel#nover+0x157
cpu5:2097316)0x451a4521bfe0:[0x41802f7107da]CpuSched_StartWorld@vmkernel#nover+0x77

  • In the ESXi /var/log/vmkernel.log, you will see entries similar to the following:

2020-10-21T12:06:29.190Z cpu12:2097316)WARNING: qfle3: qfle3_rq_create:376: [vmnic1] RQ seems to have already been created
2020-10-21T12:06:34.190Z cpu2:2097316)qfle3: qfle3_queue_alloc_with_attr:642: [vmnic1] Feature RSS requested.
2020-10-21T12:06:34.190Z cpu2:2097316)qfle3: qfle3_rq_alloc:327: [vmnic1] Rxq 2 is leading RSS with 4 RSS queues.
2020-10-21T12:06:34.190Z cpu2:2097316)WARNING: qfle3: qfle3_rq_create:376: [vmnic1] RQ seems to have already been created
2020-10-21T12:06:39.190Z cpu10:2097316)qfle3: qfle3_queue_alloc_with_attr:642: [vmnic1] Feature RSS requested.
2020-10-21T12:06:39.190Z cpu10:2097316)qfle3: qfle3_rq_alloc:327: [vmnic1] Rxq 2 is leading RSS with 4 RSS queues.
2020-10-21T12:06:39.190Z cpu10:2097316)WARNING: qfle3: qfle3_rq_create:376: [vmnic1] RQ seems to have already been created
2020-10-21T12:06:44.190Z cpu18:2097316)qfle3: qfle3_queue_alloc_with_attr:642: [vmnic1] Feature RSS requested.
2020-10-21T12:06:44.190Z cpu18:2097316)qfle3: qfle3_rq_alloc:327: [vmnic1] Rxq 2 is leading RSS with 4 RSS queues.
2020-10-21T12:06:44.190Z cpu18:2097316)WARNING: qfle3: qfle3_rq_create:376: [vmnic1] RQ seems to have already been created
2020-10-21T12:06:49.191Z cpu8:2097316)qfle3: qfle3_queue_alloc_with_attr:642: [vmnic1] Feature RSS requested.
2020-10-21T12:06:49.191Z cpu8:2097316)qfle3: qfle3_rq_alloc:327: [vmnic1] Rxq 2 is leading RSS with 4 RSS queues.
2020-10-21T12:06:49.191Z cpu8:2097316)WARNING: qfle3: qfle3_rq_create:376: [vmnic1] RQ seems to have already been created
2020-10-21T12:06:54.191Z cpu12:2097316)qfle3: qfle3_queue_alloc_with_attr:642: [vmnic1] Feature RSS requested.
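
To confirm whether a host is exposed, you can check which NICs are claimed by the qfle3 driver, which driver version is installed, and whether the signatures above appear in the vmkernel log. The commands below are a minimal sketch using standard ESXi shell tools; vmnic1 is an example NIC name and may differ on your host:

# List NICs and the driver each one is using
esxcli network nic list

# Show driver details, including the driver version, for a specific NIC (vmnic1 is an example)
esxcli network nic get -n vmnic1

# Show the installed qfle3 VIB and its version
esxcli software vib list | grep -i qfle3

# Search the live vmkernel log for the qfle3 signatures shown above
grep -E 'qfle3_rq_create|ecore_state_wait|qfle3_remove_queue_filter' /var/run/log/vmkernel.log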

Environment

 ESXi 6.7
 ESXi 7.0
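
As a quick check, the following standard command reports the host's ESXi version and build number:

esxcli system version get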

Resolution

QLogic has released updated qfle3 drivers for ESXi 6.7 and 7.0 to address these issues:

  • ESXi 6.7: Version 1.1.13.0 or later
  • ESXi 7.0: Version 1.4.13.0 or later
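
A minimal sketch of applying the updated driver from the ESXi shell, assuming the QLogic offline bundle has already been downloaded from the vendor; the datastore path and bundle file name below are placeholders:

# Enter maintenance mode first (migrate or power off running VMs beforehand)
esxcli system maintenanceMode set --enable true

# Install the updated driver from the offline bundle (placeholder path and file name)
esxcli software vib update -d /vmfs/volumes/datastore1/qfle3-driver-offline-bundle.zip

# Reboot the host so the new driver takes effect
reboot

After the reboot, re-run esxcli software vib list | grep -i qfle3 to confirm the updated qfle3 version is installed.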