ESXi 7.0 host experiences a PSOD when SRIOV and RoCE functions are both enabled in the inbox qedentv driver
Article ID: 324280
Products
VMware vSphere ESXi
Issue/Introduction
Symptoms:
On ESXi 7.0, you experience these symptoms:
An ESXi host fails with a purple diagnostic screen (PSOD). The backtrace contains entries similar to one of the following:

[1]
2020-02-20T06:14:03.542Z cpu39:1000368420)@BlueScreen: Failed at vmkdrivers/native/Proprietary/Network/qedentv/ecore/ecore_dev.c:4308 -- VMK_ASSERT(!(1))
2020-02-20T06:14:03.558Z cpu39:1000368420)Code start: 0x420025000000 VMK uptime: 0:03:23:14.344
2020-02-20T06:14:03.575Z cpu39:1000368420)0x451a59c9ad60:[0x420025162cab]PanicvPanicInt@vmkernel#nover+0x2b3 stack: 0x420025162cab
2020-02-20T06:14:03.598Z cpu39:1000368420)0x451a59c9ae10:[0x4200251635a2]Panic_vPanic@vmkernel#nover+0x23 stack: 0x4309d78020ed
2020-02-20T06:14:03.621Z cpu39:1000368420)0x451a59c9ae30:[0x4200251862c0]vmk_PanicWithModuleID@vmkernel#nover+0x41 stack: 0x451a59c9ae90
2020-02-20T06:14:03.646Z cpu39:1000368420)0x451a59c9ae90:[0x42002648fda3]ecore_pglueb_set_pfid_enable@(qedentv)#<None>+0xd8 stack: 0x4309d7686348
2020-02-20T06:14:03.673Z cpu39:1000368420)0x451a59c9aec0:[0x4200264a69b6]ecore_recovery_prolog@(qedentv)#<None>+0x2b stack: 0x4309d77f39a0
2020-02-20T06:14:03.698Z cpu39:1000368420)0x451a59c9aee0:[0x420026475f45]qedentv_recovery_handler@(qedentv)#<None>+0x11e stack: 0x4309d77de5c0
2020-02-20T06:14:03.723Z cpu39:1000368420)0x451a59c9af00:[0x42002512fa1c]HelperQueueFunc@vmkernel#nover+0x7d9 stack: 0x0
2020-02-20T06:14:03.744Z cpu39:1000368420)0x451a59c9afd0:[0x420025500110]CpuSched_StartWorld@vmkernel#nover+0xf9 stack: 0x0
2020-02-20T06:14:03.765Z cpu39:1000368420)0x451a59c9b000:[0x420025115007]Debug_IsInitialized@vmkernel#nover+0x18 stack: 0x0

[2]
2020-02-18T00:51:34.588Z cpu38:1001648108)@BlueScreen: Failed at vmkdrivers/native/Proprietary/Network/qedentv/ecore/ecore_spq.c:206 -- VMK_ASSERT(!(1))
2020-02-18T00:51:34.617Z cpu38:1001648108)Code start: 0x420019000000 VMK uptime: 0:22:58:07.790
2020-02-18T00:51:34.654Z cpu38:1001648108)0x451a6851aab0:[0x420019162cab]PanicvPanicInt@vmkernel#nover+0x2b3 stack: 0x420019162cab
2020-02-18T00:51:34.697Z cpu38:1001648108)0x451a6851ab60:[0x4200191635a2]Panic_vPanic@vmkernel#nover+0x23 stack: 0x4501ac15ec48
2020-02-18T00:51:34.742Z cpu38:1001648108)0x451a6851ab80:[0x4200191862c0]vmk_PanicWithModuleID@vmkernel#nover+0x41 stack: 0x451a6851abe0
2020-02-18T00:51:34.788Z cpu38:1001648108)0x451a6851abe0:[0x42001a2a5022]ecore_spq_post@(qedentv)#<None>+0x627 stack: 0x6920646f726d6152
2020-02-18T00:51:34.834Z cpu38:1001648108)0x451a6851ad30:[0x42001a2cdae8]ecore_eth_txq_start_ramrod@(qedentv)#<None>+0xbd stack: 0x2
2020-02-18T00:51:34.882Z cpu38:1001648108)0x451a6851ad80:[0x42001a2e4319]ecore_iov_process_mbx_req@(qedentv)#<None>+0x1d7e stack: 0x4311201d3202
2020-02-18T00:51:34.931Z cpu38:1001648108)0x451a6851aeb0:[0x42001a2884c3]qedentv_handle_vf_msg@(qedentv)#<None>+0x1ec stack: 0x42001a2887c4
2020-02-18T00:51:34.978Z cpu38:1001648108)0x451a6851af20:[0x42001a2887de]qedentv_sriov_task@(qedentv)#<None>+0x1ff stack: 0x4200191c32c8
2020-02-18T00:51:35.023Z cpu38:1001648108)0x451a6851af70:[0x420019191048]vmkWorldFunc@vmkernel#nover+0x6d stack: 0x420019191044
2020-02-18T00:51:35.064Z cpu38:1001648108)0x451a6851afd0:[0x420019500110]CpuSched_StartWorld@vmkernel#nover+0xf9 stack: 0x0
2020-02-18T00:51:35.105Z cpu38:1001648108)0x451a6851b000:[0x420019115007]Debug_IsInitialized@vmkernel#nover+0x18 stack: 0x0

[3]
2020-02-19T06:38:04.710Z cpu28:1001439035)@BlueScreen: Failed at vmkdrivers/native/Proprietary/Network/qedentv/ecore/ecore_int.c:439 -- VMK_ASSERT(!(1))
2020-02-19T06:38:04.738Z cpu28:1001439035)Code start: 0x420036400000 VMK uptime: 0:02:03:23.802
2020-02-19T06:38:04.776Z cpu28:1001439035)0x451a77a99e50:[0x420036562cab]PanicvPanicInt@vmkernel#nover+0x2b3 stack: 0x420036562cab
2020-02-19T06:38:04.819Z cpu28:1001439035)0x451a77a99f00:[0x4200365635a2]Panic_vPanic@vmkernel#nover+0x23 stack: 0x43120f6d4c6d
2020-02-19T06:38:04.863Z cpu28:1001439035)0x451a77a99f20:[0x4200365862c0]vmk_PanicWithModuleID@vmkernel#nover+0x41 stack: 0x451a77a99f80
2020-02-19T06:38:04.910Z cpu28:1001439035)0x451a77a99f80:[0x4200376b9539]ecore_fw_assertion@(qedentv)#<None>+0xbe stack: 0x451a77a9a090
2020-02-19T06:38:04.958Z cpu28:1001439035)0x451a77a9a090:[0x4200376ba73b]ecore_int_deassertion@(qedentv)#<None>+0x454 stack: 0x5a30333600000001
2020-02-19T06:38:05.003Z cpu28:1001439035)0x451a77a9a280:[0x4200376bb1fb]ecore_int_sp_dpc@(qedentv)#<None>+0x4e0 stack: 0x1
2020-02-19T06:38:05.043Z cpu28:1001439035)0x451a77a9a2f0:[0x420036534e21]IntrCookieBH@vmkernel#nover+0x336 stack: 0x3e8
2020-02-19T06:38:05.080Z cpu28:1001439035)0x451a77a9a3a0:[0x420036507a98]BH_Check@vmkernel#nover+0x349 stack: 0x7
2020-02-19T06:38:05.123Z cpu28:1001439035)0x451a77a9a440:[0x420036a4055a]UserMem_HandleMapFault@vmkernel#nover+0x1d03 stack: 0x431baea02010
2020-02-19T06:38:05.170Z cpu28:1001439035)0x451a77a9ae60:[0x420036b055f7]User_ArchExceptionHandleFault@vmkernel#nover+0x1b0 stack: 0x0
2020-02-19T06:38:05.213Z cpu28:1001439035)0x451a77a9aec0:[0x420036a190e6]User_Exception@vmkernel#nover+0x183 stack: 0x3c0000000
2020-02-19T06:38:05.252Z cpu28:1001439035)0x451a77a9af20:[0x4200365d99e5]Int14_PF@vmkernel#nover+0x406 stack: 0x0
2020-02-19T06:38:05.290Z cpu28:1001439035)0x451a77a9af40:[0x4200365d1076]gate_entry@vmkernel#nover+0x77 stack: 0x80010031

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
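As a rough aid for triage, the three example backtraces share one signature: a @BlueScreen assertion raised from a source file under qedentv/ecore/. The sketch below scans saved log text for that signature. The function name find_qedentv_psods and the regular expression are illustrative assumptions derived only from the excerpts above; a match suggests, but does not confirm, this issue.

```python
import re

# Assumed signature, taken from the example excerpts: a BlueScreen assertion
# whose source path contains qedentv/ecore/<file>.c:<line>.
QEDENTV_PSOD = re.compile(
    r"@BlueScreen: Failed at \S*qedentv/ecore/(ecore_\w+\.c):(\d+)"
)


def find_qedentv_psods(log_text):
    """Return a (source_file, line_number) tuple for each matching PSOD entry."""
    return [(m.group(1), int(m.group(2))) for m in QEDENTV_PSOD.finditer(log_text)]
```

Passing the first line of excerpt [1] to find_qedentv_psods yields a single entry naming ecore_dev.c and line 4308; text without a qedentv assertion yields an empty list.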
Environment
VMware vSphere ESXi 7.0.0
Cause
This issue occurs because hardware parity attentions (for example, TCM or XCM) are sometimes raised in an SRIOV+RoCE configuration, which triggers engine reset recovery. In this driver version, the VF driver is not aware of the engine reset, so during engine reset recovery with VFs present the driver can enter a bad state and cause a PSOD.
Resolution
To resolve this issue, ensure that the RoCE function is disabled when the SRIOV function is in use.
Note: The RoCE function is disabled in qedentv by default. VMware recommends that you do not set "enable_roce=1" in the qedentv module parameters when the SRIOV function is in use.
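The check described above can be sketched as a small helper that inspects a module parameter string. This is a minimal illustration, not a VMware tool: the function name roce_enabled is hypothetical, and it assumes that the parameter string has already been obtained from the host and that it is a space-separated list of key=value pairs. An absent enable_roce is treated as disabled, in line with the default stated above.

```python
def roce_enabled(param_string):
    """Return True if the given qedentv parameter string sets enable_roce=1.

    Assumption: param_string is a space-separated list of key=value pairs.
    """
    params = dict(
        item.split("=", 1) for item in param_string.split() if "=" in item
    )
    # RoCE is disabled by default, so a missing key counts as "0".
    return params.get("enable_roce", "0").strip('"') == "1"
```

If roce_enabled returns True on a host where SRIOV is in use, the configuration matches the one this article recommends against.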