When performing SAN maintenance, or after an unexpected storage path outage, the NFNIC driver may be unable to add the affected paths back once the storage path comes back up. You will observe the following sequence repeating continuously in /var/log/vmkernel.log:
WARNING: nfnic: <2>: fnic_handle_report_lun: 1467: lun add failure! in_remove: 0 ioAllowed: 1
WARNING: nfnic: <2>: fnic_tport_event_handler: 2130: lunmap update failed,retry ..
nfnic: <2>: INFO: fnic_handle_report_lun: 1380: Report luns response for target_fcid : 0xaf01e0 target_id:283 num_luns 10
WARNING: nfnic: <2>: fnic_handle_report_lun: 1442: vmk_ScsiScanAndClaimPaths returned BUSY
You will also observe the following events related to StorageFPIN:
WARNING: StorageFPIN: 521: Failed to allocate memory.
WARNING: Heap: 3645: Heap storageFPINHeap already at its maximum size. Cannot expand.
VMkernel.log:
2025-03-15T17:32:48.078Z Wa(180) vmkwarning: cpu20:2097755)WARNING: nfnic: <1>: fnic_handle_report_lun: 1465: lun add failure! in_remove: 0 ioAllowed: 1
2025-03-15T17:32:48.078Z Wa(180) vmkwarning: cpu20:2097755)WARNING: nfnic: <1>: fnic_tport_event_handler: 2129: lunmap update failed,retry ..
2025-03-15T17:32:48.078Z In(182) vmkernel: cpu20:2097755)nfnic: <1>: INFO: fnic_handle_report_lun: 1379: Report luns response for target_fcid : 0xa02a1 target_id:72 num_luns 8
2025-03-15T17:32:50.023Z Wa(180) vmkwarning: cpu8:2097762)WARNING: nfnic: <2>: fnic_handle_report_lun: 1442: vmk_ScsiScanAndClaimPaths returned BUSY
2025-03-15T17:32:50.078Z Wa(180) vmkwarning: cpu20:2097755)WARNING: nfnic: <1>: fnic_handle_report_lun: 1442: vmk_ScsiScanAndClaimPaths returned BUSY
2025-03-15T17:32:50.835Z In(182) vmkernel: cpu8:89469270)zdriver: _zmod_periodic:348: #0- logs are not pulled
2025-03-15T17:32:51.023Z Wa(180) vmkwarning: cpu8:2097762)WARNING: nfnic: <2>: fnic_handle_report_lun: 1442: vmk_ScsiScanAndClaimPaths returned BUSY
2025-03-15T17:32:51.078Z Wa(180) vmkwarning: cpu20:2097755)WARNING: nfnic: <1>: fnic_handle_report_lun: 1442: vmk_ScsiScanAndClaimPaths returned BUSY
2025-03-15T17:32:51.835Z In(182) vmkernel: cpu8:89469270)zdriver: _zmod_periodic:348: #0- logs are not pulled
2025-03-15T12:18:17.256Z Wa(180) vmkwarning: cpu27:2097450)WARNING: StorageFPIN: 521: Failed to allocate memory.
2025-03-15T13:18:17.587Z Wa(180) vmkwarning: cpu27:2097450)WARNING: StorageFPIN: 521: Failed to allocate memory.
2025-03-15T14:18:17.918Z Wa(180) vmkwarning: cpu21:2097450)WARNING: StorageFPIN: 521: Failed to allocate memory.
2025-03-15T15:18:18.249Z Wa(180) vmkwarning: cpu23:2097450)WARNING: StorageFPIN: 521: Failed to allocate memory.
2025-03-15T12:18:17.256Z Wa(180) vmkwarning: cpu27:2097450)WARNING: Heap: 3645: Heap storageFPINHeap already at its maximum size. Cannot expand.
2025-03-15T13:18:17.587Z Wa(180) vmkwarning: cpu27:2097450)WARNING: Heap: 3645: Heap storageFPINHeap already at its maximum size. Cannot expand.
2025-03-15T14:18:17.918Z Wa(180) vmkwarning: cpu21:2097450)WARNING: Heap: 3645: Heap storageFPINHeap already at its maximum size. Cannot expand.
2025-03-15T15:18:18.249Z Wa(180) vmkwarning: cpu23:2097450)WARNING: Heap: 3645: Heap storageFPINHeap already at its maximum size. Cannot expand.
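To quickly check whether a host is hitting this condition, you can count occurrences of the signatures above in the live log. This is a minimal sketch: it assumes the default log location and does not account for rotated logs.
grep -c "vmk_ScsiScanAndClaimPaths returned BUSY" /var/log/vmkernel.log
grep -c "Heap storageFPINHeap already at its maximum size" /var/log/vmkernel.log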
You can check the available FPIN heap with the following command. A healthy host will show around 5246448 bytes available, while an impacted host will show significantly less free space, sometimes 16 KB or less.
esxcfg-info -a | grep -A3 storageFPINHeap | grep "Max Available"
Example:
Host-1 shows that it has run out of FPIN heap:
|----Max Available...................................416 bytes
Host-2 shows that it has not run out of heap:
|----Max Available...................................3219872 bytes
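For scripted monitoring, the free-byte value can be extracted and compared against a threshold. A minimal sketch follows; the 1 MB threshold is an illustrative assumption, not an official limit:
# Flag the host when free storageFPINHeap drops below 1 MB (assumed threshold)
free=$(esxcfg-info -a | grep -A3 storageFPINHeap | grep "Max Available" | sed 's/[^0-9]//g')
[ "$free" -lt 1048576 ] && echo "WARNING: storageFPINHeap low: ${free} bytes available"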
vSphere 8.X, ESXi 8.X
FPIN (Fabric Performance Impact Notifications) capability was added in ESXi 8.0 U2 to provide better visibility into fabric-related issues. Due to a bug in the StorageFPIN code, when FPIN tries to allocate memory and fails, it can hold a reference count on the affected paths, which prevents the Cisco NFNIC driver from allocating new paths or re-establishing existing ones.
This is a known issue with both FPIN and with how the Cisco NFNIC driver behaves when paths are lost. The NFNIC driver does not save storage port bindings, so when a storage path is re-established after an outage or path loss, it simply creates brand-new paths and increments the target numbers. Because the FPIN bug keeps a reference count on the old paths, the Cisco NFNIC driver is unable to establish the new ones.
There are two permanent fixes for this issue: a fix to the StorageFPIN code in ESXi and a fix to the Cisco NFNIC driver. Either one of these fixes will resolve the storage path recovery issue.
To work around this issue, it is recommended to disable FPIN on ESXi 8.0 hosts, especially when using Cisco UCS and the NFNIC driver:
For ESXi 8.0 U3 and newer, please use the following command:
esxcli storage fpin info set -e false
To confirm the setting:
esxcli storage fpin info get
NOTE: This setting change does not require a reboot on its own. However, if an ESXi host is already in a memory heap exhaustion state for storageFPINHeap, the host must be rebooted after making the change.
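After disabling FPIN (and rebooting, if the heap was already exhausted), re-run the checks from the Symptoms section to confirm that FPIN is off and that free heap has returned to a healthy level:
esxcli storage fpin info get
esxcfg-info -a | grep -A3 storageFPINHeap | grep "Max Available"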
VMware ESXi 8.0 Update 3 Release Notes
Fabric Notification support for SAN clusters:
ESXi 8.0 Update 3 introduces support for Fabric Performance Impact Notifications Link Integrity (FPIN-LI). With FPIN-LI, the vSphere infrastructure layer can manage notifications from SAN switches or targets, identifying degraded SAN links and ensuring only healthy paths are used for storage devices. FPIN can also notify ESXi hosts of storage link congestion and errors.
Support for Fibre Channel Extended Link Services (FC-ELS):
With vSphere 8.0 Update 3, you can use the command esxcli storage fpin info set -e=<true/false> to activate or deactivate the Fabric Performance Impact Notification (FPIN). The command saves the FPIN activation to both ConfigStore and the VMkernel System Interface Shell and persists across ESXi reboots. This is enabled by both Broadcom’s lpfc and Marvell’s qlnativefc drivers.
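For example, should FPIN need to be re-enabled later (for instance, once a build containing the StorageFPIN fix is installed), the same command is used with the opposite value:
esxcli storage fpin info set -e true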
For ESXi 8.0 U2 and prior, use the following command:
vsish -e set /storage/fpin/info 0
NOTE: This vsish command is NOT persistent across reboots. We therefore recommend upgrading to ESXi 8.0 U3 and disabling FPIN there.
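If an immediate upgrade is not possible and the setting must survive reboots in the interim, one unsupported approach (an assumption, not an official recommendation) is to reapply the vsish command at boot from the host's local startup script:
# Unsupported persistence shim: add before the final "exit 0" in
# /etc/rc.local.d/local.sh so FPIN is disabled again at every boot
vsish -e set /storage/fpin/info 0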