Resolving the qedf driver PSOD on ESXi 7.0 by updating the driver
Symptoms:
The ESXi host may fail with a purple diagnostic screen (PSOD) and a backtrace similar to the following:
[YYYY-MM-DDTHH:MM:SS] cpu25:2098138)@BlueScreen: #PF Exception 14 in world 2098138:ql_fcoe_dela IP 0x42002b40159c addr 0x128
PTEs:0x14f2fa023;0x14f2fb023;0x14f2fc023;0x0;
[YYYY-MM-DDTHH:MM:SS] cpu25:2098138)Code start: 0x42002a400000 VMK uptime: 82:21:43:01.789
[YYYY-MM-DDTHH:MM:SS] cpu25:2098138)0x4538d989bf28:[0x42002b40159c]CommandPumpOnPassiveLevel@(qedf)#<None>+0x0 stack: 0x43127a373000
[YYYY-MM-DDTHH:MM:SS] cpu25:2098138)0x4538d989bf30:[0x42002b3e684a]SendFCoEVlanSolicitation@(qedf)#<None>+0x353 stack: 0x43127a373018
[YYYY-MM-DDTHH:MM:SS] cpu25:2098138)0x4538d989bf50:[0x42002b3e7013]FipVlanTimeoutWork@(qedf)#<None>+0x15c stack: 0x43127a373018
[YYYY-MM-DDTHH:MM:SS] cpu25:2098138)0x4538d989bf70:[0x42002b3ff711]ql_fcoe_do_singlethread_work@(qedf)#<None>+0x76 stack: 0x43127a373000
[YYYY-MM-DDTHH:MM:SS] cpu25:2098138)0x4538d989bf90:[0x42002a51e224]vmkWorldFunc@vmkernel#nover+0x49 stack: 0x42002a51e220
[YYYY-MM-DDTHH:MM:SS] cpu25:2098138)0x4538d989bfe0:[0x42002a7b3b09]CpuSched_StartWorld@vmkernel#nover+0x86 stack: 0x0
[YYYY-MM-DDTHH:MM:SS] cpu25:2098138)0x4538d989c000:[0x42002a4c4d7f]Debug_IsInitialized@vmkernel#nover+0xc stack: 0x0
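When comparing a captured PSOD against the backtrace above, it can help to extract the stack frames programmatically. The following is a minimal sketch, not a supported tool: the function names `extract_frames` and `matches_qedf_signature` are hypothetical, and the regular expression assumes frames are formatted exactly as in the sample above.

```python
import re

# Matches frames such as "[0x42002b3e684a]SendFCoEVlanSolicitation@(qedf)#"
# and "[0x42002a51e224]vmkWorldFunc@vmkernel#" (module with or without parentheses).
FRAME_RE = re.compile(r"\[0x[0-9a-f]+\](\w+)@\(?(\w+)\)?#")

# Two frames copied from the backtrace shown in this article.
SAMPLE = (
    "0x4538d989bf28:[0x42002b40159c]CommandPumpOnPassiveLevel@(qedf)#<None>+0x0 stack: 0x43127a373000\n"
    "0x4538d989bf30:[0x42002b3e684a]SendFCoEVlanSolicitation@(qedf)#<None>+0x353 stack: 0x43127a373018\n"
)


def extract_frames(backtrace):
    """Return (symbol, module) pairs in the order they appear in the backtrace."""
    return FRAME_RE.findall(backtrace)


def matches_qedf_signature(backtrace):
    """Return True if the backtrace contains SendFCoEVlanSolicitation in the qedf module."""
    return ("SendFCoEVlanSolicitation", "qedf") in extract_frames(backtrace)


if __name__ == "__main__":
    for symbol, module in extract_frames(SAMPLE):
        print(module, symbol)
    print("qedf signature found:", matches_qedf_signature(SAMPLE))
```

A match on this one frame is only an indication; the full backtrace should still be compared with the one shown in this article.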
Environment:
VMware vSphere ESXi 7.0
Running the command "localcli storage core adapter list" may show adapters that use the qedf driver.
The vmkernel.log file may show repeated link state changes similar to the following:
[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1926:Info: ST(LINK): LINK_DOWN->LINK_UP
[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1897:Info: ST(LINK): LINK_UP->LINK_DOWN
[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1926:Info: ST(LINK): LINK_DOWN->LINK_UP
[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1897:Info: ST(LINK): LINK_UP->LINK_DOWN
[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1926:Info: ST(LINK): LINK_DOWN->LINK_UP
[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1897:Info: ST(LINK): LINK_UP->LINK_DOWN
[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1926:Info: ST(LINK): LINK_DOWN->LINK_UP
[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1897:Info: ST(LINK): LINK_UP->LINK_DOWN
[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1926:Info: ST(LINK): LINK_DOWN->LINK_UP
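To gauge how often the link is changing state, the qedf link messages can be counted per adapter. The sketch below is illustrative only: the function names and the flapping threshold are hypothetical, and the pattern assumes the message format shown in the log lines above.

```python
import re
from collections import Counter

# Matches messages such as
# "qedf:vmhba0:qedfc_link_update_handler:1926:Info: ST(LINK): LINK_DOWN->LINK_UP"
LINK_RE = re.compile(
    r"qedf:(vmhba\d+):qedfc_link_update_handler:\d+:Info: ST\(LINK\): (\w+)->(\w+)"
)

# Three lines copied from the vmkernel.log excerpt in this article.
SAMPLE_LINES = [
    "[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1926:Info: ST(LINK): LINK_DOWN->LINK_UP",
    "[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1897:Info: ST(LINK): LINK_UP->LINK_DOWN",
    "[YYYY-MM-DDTHH:MM:SS] cpu26:2097871)qedf:vmhba0:qedfc_link_update_handler:1926:Info: ST(LINK): LINK_DOWN->LINK_UP",
]


def count_link_transitions(lines):
    """Count qedf link state transitions per adapter name (for example vmhba0)."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def is_flapping(counts, adapter, threshold=4):
    """Hypothetical heuristic: treat `threshold` or more transitions as link flapping."""
    return counts[adapter] >= threshold


if __name__ == "__main__":
    counts = count_link_transitions(SAMPLE_LINES)
    for adapter, total in counts.items():
        print(adapter, total, "flapping" if is_flapping(counts, adapter) else "ok")
```

In practice the lines would be read from a copy of vmkernel.log rather than from an inline list, and a suitable threshold depends on the time window being examined.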
Resolution:
Update the qedf driver to version 2.74.1.0-1OEM.
Please refer to the following knowledge base article to download the driver for the 45000/41000 Series Adapters:
Finding IO Drivers in the Broadcom Support Portal
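After the update, the installed driver version can be compared with the fixed version. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the version string is assumed to start with a dotted numeric part followed by a "-" suffix (as in 2.74.1.0-1OEM), and versions later than 2.74.1.0 are assumed to retain the fix, which this article does not state.

```python
# Fixed driver version named in this article.
FIXED_VERSION = (2, 74, 1, 0)


def parse_driver_version(version):
    """Convert a string such as '2.74.1.0-1OEM' into the tuple (2, 74, 1, 0)."""
    base = version.split("-", 1)[0]  # drop the "-1OEM..." suffix
    return tuple(int(part) for part in base.split("."))


def has_fix(installed):
    """Return True if the installed version is at or above the fixed version.

    Assumption: releases after 2.74.1.0 also contain the fix.
    """
    return parse_driver_version(installed) >= FIXED_VERSION


if __name__ == "__main__":
    # "2.12.8.0-1OEM" is a made-up older version used only for illustration.
    for version in ("2.74.1.0-1OEM", "2.12.8.0-1OEM"):
        print(version, "contains fix" if has_fix(version) else "needs update")
```

The installed version string itself has to be read from the host; the sketch only covers the comparison.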
The release notes for this driver confirm that the issue has been fixed:
QLogic qedf VMware ESX Native Driver for ESXi 7.0/8.0
Copyright (c) 2015-2019 Cavium Inc.
Copyright (c) 2019-2020 Marvell Semiconductor, Inc.
All rights reserved
Version: 2.74.1.0
===========================
Enhancements:
-------------
- Update to qed-8.74.0.0 with storm fw 8.72.1.0
Fixes:
------
* [FJT-9121] : PSOD due to race condition between SendFCoEVlanSolicitation and
LogoutAllFabrics.
Resolution : Add mechanism of sync between SendFCoEVlanSolicitation and
LogoutAllFabrics.
Scope : 45000/41000 Series Adapters
Cause:
PSOD due to a race condition between SendFCoEVlanSolicitation and LogoutAllFabrics.
Impact/Risks:
The host goes into a PSOD state.