PSOD (purple screen of death) on an ESXi host with Emulex HBAs: lpfc driver exception in the 'lpfc_sli4_delete_els_xri_aborted' function

Article ID: 395068

Products

VMware vSphere ESXi

Issue/Introduction

An ESXi host with Emulex HBAs fails with a PSOD (purple screen of death). The lpfc driver hits an exception in the 'lpfc_sli4_delete_els_xri_aborted' function.

Environment

vSphere ESXi 8.0.3
vSphere ESXi 9.0.0

Cause

This issue is caused by an invalid memory access in the Emulex lpfc driver. It is typically observed in configurations where multiple Emulex HBA initiator ports are zoned with one or more target ports. The invalid access results in an unhandled condition in the driver, which causes it to crash.

Numerous Registered State Change Notifications (RSCNs) received from another initiator in the same zone trigger a race condition in the lpfc driver, which leads to a use-after-free.

Messages like the following are logged frequently in the host's vmkernel.log file:

YYYY-MM-DDTHH:MM:SS.Z In(182) vmkernel: cpu36:2098009)lpfc: lpfc_els_rcv_rscn:7907: vmhba# 0214 RSCN received Data: x800220 x0 x4 x1
YYYY-MM-DDTHH:MM:SS.Z In(182) vmkernel: cpu36:2098009)lpfc: lpfc_els_rcv_rscn:7914: vmhba# 5973 RSCN received event x0 : Address format x00 : DID x870100
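
To gauge how often these messages occur, the log can be scanned for the "RSCN received" lines. The following is a minimal sketch, not a supported tool: the function name `count_rscn_lines` is hypothetical, and the pattern assumes the message layout shown in the sample above. It counts log lines, not distinct RSCN events, since one notification may produce more than one line.

```python
import re
from collections import Counter

# Assumption: lines follow the sample layout above, i.e.
# "...lpfc: lpfc_els_rcv_rscn:<line>: <adapter> <msg id> RSCN received ..."
RSCN_LINE = re.compile(
    r"lpfc: lpfc_els_rcv_rscn:\d+: (vmhba\S*) \d+ RSCN received"
)


def count_rscn_lines(lines):
    """Return a Counter mapping adapter name to its number of RSCN log lines."""
    counts = Counter()
    for line in lines:
        match = RSCN_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The function accepts any iterable of strings, so an open file handle for a copy of vmkernel.log can be passed in directly, and `count_rscn_lines(log).most_common()` lists the adapters with the most RSCN lines first.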

To work around this issue, implement single-initiator zoning (one initiator per zone), as outlined in the documentation 'Using ESX with Fibre Channel SAN'.
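
As an illustration of what single-initiator zoning means in practice, the sketch below flags zones that contain more than one initiator. It assumes the zone membership has already been exported from the fabric into a plain dictionary; the function name, the zone names, and the WWPNs are all hypothetical example values, and no switch API is involved.

```python
def zones_with_multiple_initiators(zones, initiator_wwpns):
    """Return {zone name: sorted initiator WWPNs} for zones with 2+ initiators.

    zones: dict mapping a zone name to an iterable of member WWPNs.
    initiator_wwpns: every WWPN known to be an initiator (all hosts, not
    only the affected one). Comparison is case-insensitive.
    """
    initiators = {wwpn.lower() for wwpn in initiator_wwpns}
    offenders = {}
    for name, members in zones.items():
        found = sorted({member.lower() for member in members} & initiators)
        if len(found) > 1:
            offenders[name] = found
    return offenders


# Hypothetical example data, not taken from any real fabric.
example_zones = {
    "zone_a": [
        "10:00:00:00:c9:00:00:01",
        "10:00:00:00:c9:00:00:02",
        "50:00:00:00:00:00:00:aa",
    ],
    "zone_b": ["10:00:00:00:c9:00:00:01", "50:00:00:00:00:00:00:aa"],
}
example_initiators = ["10:00:00:00:C9:00:00:01", "10:00:00:00:C9:00:00:02"]
print(zones_with_multiple_initiators(example_zones, example_initiators))
```

Here only `zone_a` is reported, because it holds two initiators; `zone_b` pairs one initiator with one target and already follows the recommended layout.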

Resolution

This issue is fixed in the following releases:

•    Inbox: ESXi 9.0.1.0
•    Async: lpfc driver version 14.4.576.11 or later
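
For the async driver, the version threshold above can be checked with a simple numeric comparison. This is a sketch under stated assumptions: the helper names are hypothetical, and it assumes the installed version string (for example, as reported by `esxcli software vib list`) begins with a dotted numeric version, with any suffix ignored. It does not apply to the inbox driver, where the fix is determined by the ESXi release.

```python
import re

# Async lpfc driver version that contains the fix, per the list above.
FIXED_LPFC_VERSION = (14, 4, 576, 11)


def parse_driver_version(version):
    """Parse the leading dotted-numeric part of a version string into ints."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_lpfc_fix(version):
    """Return True if the async lpfc driver version is 14.4.576.11 or later."""
    return parse_driver_version(version) >= FIXED_LPFC_VERSION
```

Comparing tuples of integers avoids the pitfalls of comparing version strings as text, where "14.4.576.9" would incorrectly sort after "14.4.576.11".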

Additional Information

For more information on Registered State Change Notification (RSCN) messages, refer to the article 'Register State Change Notifications (RSCNs) messages from the HBA driver observed in /var/log/vmkernel.log on ESXi'.