PF Exception 14 in world <World_ID>:tq:tq-iport IP 0x4200033dfa22 addr 0x45d9a6800f1c
The /var/run/log/vmkernel.log file on the ESXi host, prior to the crash, reveals a sequence of events in which the nfnic driver repeatedly fails to update the LUN map. This is followed by memory allocation failures, indicating heap exhaustion:
YYYY-MM-ddTHH:MM:SS.377Z In(182) vmkernel: cpu94:2098204)nfnic: <1>: INFO: fnic_free_lun_list_by_no_active_lun: 658: FNIC free lun list:fcid:<ID>, lun:5
YYYY-MM-ddTHH:MM:SS.377Z Wa(180) vmkwarning: cpu94:2098204)WARNING: nfnic: <1>: fnic_handle_report_lun: 1533: lun add failure! in_remove: 0 ioAllowed: 1
YYYY-MM-ddTHH:MM:SS.377Z Wa(180) vmkwarning: cpu94:2098204)WARNING: nfnic: <1>: fnic_tport_event_handler: 2136: lunmap update failed,retry ..
YYYY-MM-ddTHH:MM:SS.377Z Wa(180) vmkwarning: cpu62:2097957)WARNING: StorageFPIN: 521: Failed to allocate memory
YYYY-MM-ddTHH:MM:SS.505Z Wa(180) vmkwarning: cpu74:2097957)WARNING: Heap: 3645: Heap storageFPINHeap already at its maximum size. Cannot expand.
VMware ESXi Version: 8.0 U3
Cisco UCS Servers
FPIN (Fabric Performance Impact Notifications) capability was added in ESXi 8.0 U2 to help diagnose fabric-related issues. Due to a bug in the StorageFPIN code, when FPIN tries to allocate memory and cannot, it can hold on to a reference count on the paths. This prevents the Cisco nfnic driver from allocating new paths or re-establishing existing ones.
Refer to: "Temporary/transient storage path loss on Host could result in paths not coming back when using Cisco UCS and NFNIC"
In certain scenarios involving continuous retries, the allocated memory is not immediately released back to the system. Over time, this exhausts the available heap memory, eventually causing the host to become unresponsive and display a diagnostic screen.
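To check whether a host's log shows the signature described above, a saved copy of vmkernel.log can be scanned for the four warning messages. The following is a minimal sketch in Python, not a supported tool. The function names and the rule "at least one nfnic LUN map failure plus at least one StorageFPIN heap failure" are assumptions made for illustration; the patterns are taken from the log excerpts in this article, and the numeric source-line fields (for example 521) are matched loosely because they may differ between builds.

```python
import re
from collections import Counter

# Patterns derived from the log excerpts in this article.
PATTERNS = {
    "lun_add_failure": re.compile(r"nfnic: .*lun add failure!"),
    "lunmap_update_failed": re.compile(r"nfnic: .*lunmap update failed"),
    "fpin_alloc_failed": re.compile(r"StorageFPIN: \d+: Failed to allocate memory"),
    "fpin_heap_full": re.compile(r"Heap storageFPINHeap already at its maximum size"),
}


def scan_log(lines):
    """Count how many lines match each known message pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def matches_signature(counts):
    """Assumed rule: nfnic LUN map failures AND StorageFPIN heap exhaustion."""
    nfnic_failing = (counts["lun_add_failure"] > 0
                     or counts["lunmap_update_failed"] > 0)
    heap_exhausted = (counts["fpin_alloc_failed"] > 0
                      or counts["fpin_heap_full"] > 0)
    return nfnic_failing and heap_exhausted


def scan_file(path):
    """Scan a saved copy of vmkernel.log (the path is supplied by the caller)."""
    with open(path, errors="replace") as handle:
        return scan_log(handle)
```

For example, `matches_signature(scan_file("vmkernel.log"))` returns True only when both groups of messages appear in the file. A match suggests, but does not prove, that the host is affected by this issue.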
This is a known issue involving nfnic driver path handling and ESXi FPIN reference counting. To resolve it, apply one of the following:
1. Upgrade to ESXi 8.0 U3e (Build 24674464) or later.
2. Update the Cisco nfnic driver to version 5.0.0.48 or later.
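The two thresholds above can be compared against a host's reported values with a short script. This is a hedged sketch in Python: the helper names are hypothetical, and it assumes the ESXi build number and the nfnic VIB version string have already been collected by hand (for example from the output of `esxcli system version get` and `esxcli software vib list`). The build comparison is only meaningful for hosts on the ESXi 8.0 release line, and only the leading dotted-numeric part of the driver version is compared.

```python
import re

# Thresholds taken from the resolution steps in this article.
FIXED_ESXI_BUILD = 24674464      # ESXi 8.0 U3e
FIXED_NFNIC = (5, 0, 0, 48)      # nfnic 5.0.0.48


def parse_version(text):
    """Return the leading dotted-numeric part of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def esxi_build_has_fix(build):
    """True if the ESXi 8.0 build number is at or above the fixed build."""
    return int(build) >= FIXED_ESXI_BUILD


def nfnic_has_fix(version):
    """True if the nfnic driver version is at or above 5.0.0.48."""
    return parse_version(version) >= FIXED_NFNIC


def host_has_fix(build, nfnic_version):
    """Either remediation is sufficient according to this article."""
    return esxi_build_has_fix(build) or nfnic_has_fix(nfnic_version)
```

Any suffix after the numeric part of the driver version, such as a vendor build tag, is ignored by `parse_version`, so a string that begins with `5.0.0.48` is treated as meeting the threshold.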