In /var/run/log/logEFI.log on the ESX host:
YYYY-MM-DDTHH:MM:SS In(14) LogEFI[####]: #PF Exception 14 in world #####:lpfc_path_cl IP 0#### addr 0####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI[####]: PTEs:0####;0####;0x0;
YYYY-MM-DDTHH:MM:SS In(14) LogEFI[####]: Module(s) involved in panic: [lpfc 900.14.4.390.20-36vmw.901.0.24957456 (External)]
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)cr0=0#### cr2=0#### cr3=0#### cr4=0####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)FMS=#### uCode=0####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)frame=0#### ip=0#### err=0x0 rflags=0####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)rax=0x0 rbx=0#### rcx=0x0
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)rdx=0#### rbp=0#### rsi=0####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)rdi=0xffffffffffffffff r8=0x0 r9=0x0
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)r10=0x0 r11=0x0 r12=0####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)r13=0x1 r14=0#### r15=0####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI[#####]: *PCPU#:####/lpfc_path_claim-#-#
YYYY-MM-DDTHH:MM:SS In(14) LogEFI[#####]: PCPU #: SVVSUVVUVVVUVVSVVVUVVVVVSVVVVSVV
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)Code start: 0#### VMK uptime: ##:##:##:##
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)####:[0x####]lpfc_path_claim_handler@(lpfc)#<None>+0#### stack: 0####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)####:[0x####]lpfc_pathclaim_event@(lpfc)#<None>+0#### stack: 0x####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)####:[0x####]vmkWorldFunc@vmkernel#nover+0#### stack: 0####
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)####:[0x####]CpuSched_StartWorld@vmkernel#nover+0#### stack: 0x0
YYYY-MM-DDTHH:MM:SS In(14) LogEFI: cpu#:####)####:[0x####]Debug_IsInitialized@vmkernel#nover+0#### stack: 0x0
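If you have many hosts to check, a small script can search a copied logEFI.log for this panic signature. The Python sketch below assumes that three strings from the excerpt above are enough to recognize this crash: a #PF Exception 14 in an lpfc_path_cl world, lpfc named as the module involved in the panic, and lpfc_path_claim_handler in the backtrace. The function name has_lpfc_path_claim_panic is hypothetical, and other driver or ESX builds may log slightly different text, so the patterns may need adjusting.

```python
import re
import sys

# Patterns copied from the logEFI.log excerpt above. Treating these three
# strings as the crash signature is an assumption; other builds may log
# slightly different text.
PF_RE = re.compile(r"#PF Exception 14 in world \S+:lpfc_path_cl")
MODULE_RE = re.compile(r"Module\(s\) involved in panic: \[lpfc ")
FRAME_RE = re.compile(r"lpfc_path_claim_handler@\(lpfc\)")


def has_lpfc_path_claim_panic(text: str) -> bool:
    """Return True only if all three signature patterns occur in the text."""
    return all(pattern.search(text) for pattern in (PF_RE, MODULE_RE, FRAME_RE))


# Run as a script with the path to a copied logEFI.log as the only argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        if has_lpfc_path_claim_panic(f.read()):
            print("lpfc path-claim panic signature found")
        else:
            print("signature not found")
```

Requiring all three patterns, rather than any one of them, reduces false positives from unrelated #PF panics or from lpfc messages that are not part of a crash.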
VMware vSphere ESX 9.0.x
The crash is caused by a race condition in the Emulex lpfc driver that is triggered by rapid changes in a storage target's Destination ID (DID). When a target WWPN moves to a new DID and then reverts to its original DID (common during some array upgrades), the driver encounters a use-after-free memory error while processing the overlapping path-claim events, which results in the page-fault panic in lpfc_path_claim_handler shown in the logEFI.log excerpt.
To determine whether your environment is affected by this race condition, look for a "DID flip-flop" sequence in the logs.
Target DID Change Sequence
Using WWPN ##:##:##:## as an example, look for the following sequence in vmkernel.log (/var/run/log/vmkernel.log):
The target leaves its original DID (0x100): an RSCN is received for DID 0x100 and a devloss timer starts for the WWPN at NPort 0x100.
YYYY-MM-DDTHH:MM:SS cpu##:#### lpfc: lpfc_els_rcv_rscn:###: vmhba# RSCN received event x0 : Address format x00 : DID 0x100
YYYY-MM-DDTHH:MM:SS cpu##:#### WARNING: lpfc : vmhba# lpfc_start_devloss:####: Start 10 sec devloss tmo WWPN ##:##:##:## NPort 0x100

The same WWPN then appears at a new DID (0x200), leaves it, and returns to the original DID (0x100):
YYYY-MM-DDTHH:MM:SS cpu##:#### lpfc: lpfc_els_rcv_rscn:####: vmhba# RSCN received event x0 : Address format x00 : DID 0x200
YYYY-MM-DDTHH:MM:SS cpu##:#### lpfc : vmhba# lpfc_cmpl_prli_prli_issue:####: FCP NPR PRLI Cmpl DID 140001 Init 0 Tgt 1 EIP 1 AccCode 0x200
YYYY-MM-DDTHH:MM:SS cpu##:#### lpfc: lpfc_els_rcv_rscn:####: vmhba# #### RSCN received event x0 : Address format x00 : DID 0x200
YYYY-MM-DDTHH:MM:SS cpu##:#### WARNING: lpfc : vmhba5 lpfc_start_devloss:####: Start 10 sec devloss tmo WWPN ##:##:##:## NPort 0x200
YYYY-MM-DDTHH:MM:SS cpu##:#### lpfc: lpfc_els_rcv_rscn:7484: vmhba5 ### RSCN received event x0 : Address format x00 : DID 0x100
YYYY-MM-DDTHH:MM:SS cpu##:#### lpfc : vmhba# lpfc_cmpl_prli_prli_issue:####: FCP NPR PRLI Cmpl DID 0x100 Init 0 Tgt 1 EIP 1 AccCode 0x100

Currently there is no workaround.
Broadcom Engineering is aware of the issue, and Emulex is developing a fix.
The fixed driver version will be included in a future release.
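The DID flip-flop sequence described in the Target DID Change Sequence section can also be searched for with a script. The Python sketch below is a heuristic, not a Broadcom or Emulex tool. It assumes that lpfc_start_devloss lines carry both the WWPN and the NPort (DID) it was using, as in the excerpt, and that a later "PRLI Cmpl DID" line for the WWPN's earlier DID means the target logged back in there. The function name find_did_flip_flops and the regular expressions are hypothetical and may need adjusting for other lpfc driver versions.

```python
import re
import sys
from typing import Dict, Iterable, List

# Hypothetical patterns modelled on the vmkernel.log excerpt; message text
# may differ between lpfc driver versions.
DEVLOSS_RE = re.compile(
    r"lpfc_start_devloss:\S*: Start \d+ sec devloss tmo "
    r"WWPN (?P<wwpn>[0-9a-fA-F:]+) NPort (?P<did>(?:0x)?[0-9a-fA-F]+)"
)
PRLI_RE = re.compile(r"PRLI Cmpl DID (?P<did>(?:0x)?[0-9a-fA-F]+)")


def find_did_flip_flops(lines: Iterable[str]) -> List[str]:
    """Return WWPNs that lost DID A, then lost DID B, then logged in at A again."""
    history: Dict[str, List[int]] = {}  # WWPN -> DIDs at which devloss started
    pending: Dict[int, str] = {}        # earlier DID A -> WWPN that may return to it
    flagged: List[str] = []
    for line in lines:
        m = DEVLOSS_RE.search(line)
        if m:
            wwpn, did = m["wwpn"].lower(), int(m["did"], 16)
            dids = history.setdefault(wwpn, [])
            if dids and dids[-1] != did:
                # The same WWPN lost a different DID: watch for a return to the earlier one.
                pending[dids[-1]] = wwpn
            dids.append(did)
            continue
        m = PRLI_RE.search(line)
        if m:
            wwpn = pending.pop(int(m["did"], 16), None)
            if wwpn is not None and wwpn not in flagged:
                flagged.append(wwpn)
    return flagged


# Run as a script with the path to a copied vmkernel.log as the only argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for wwpn in find_did_flip_flops(f):
            print(f"possible DID flip-flop for WWPN {wwpn}")
```

A target that moves to a new DID only once is not flagged; the heuristic only reports a WWPN after it has left two different DIDs and a PRLI completes again at the first one.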