Emulex HBA link status changes from "link-up" to "link-n/a" after a Fabric Switch swap, leaving storage paths down even after the switch ports are restored


Article ID: 399934


Products

VMware vSphere ESXi, VMware vSphere ESX 5.x, VMware vSphere ESX 6.x, VMware vSphere ESX 7.x, VMware vSphere ESX 8.x, VMware vSphere ESXi 5.0, VMware vSphere ESXi 5.5, VMware vSphere ESXi 8.0

Issue/Introduction

An administrator observes storage paths going into a dead state following a fabric switch swap/replacement, and the paths do not recover after the switch ports come back online. The link status reported for the affected Emulex HBAs changes to "link-n/a":

$ esxcli storage core adapter list

HBA Name  Driver  Link State  UID                                   Capabilities         Description
--------  ------  ----------  ------------------------------------  -------------------  -----------
vmhba0    nhpsa   link-n/a    sas.5001438038cd8ae0                                       (0000:03:00.0) HPE Smart Array P440ar
vmhba1    lpfc    link-up     fc.20007010########:10007010########  Second Level Lun ID  (0000:05:00.0) Emulex Corporation Emulex LightPulse LPe16000 PCIe Fibre Channel Adapter
vmhba2    lpfc    link-n/a    fc.20007010########:10007010########  Second Level Lun ID  (0000:05:00.1) Emulex Corporation Emulex LightPulse LPe16000 PCIe Fibre Channel Adapter
vmhba3    lpfc    link-n/a    fc.20007010########:10007010########  Second Level Lun ID  (0000:84:00.0) Emulex Corporation Emulex LightPulse LPe16000 PCIe Fibre Channel Adapter
vmhba4    lpfc    link-n/a    fc.20007010########:10007010########  Second Level Lun ID  (0000:84:00.1) Emulex Corporation Emulex LightPulse LPe16000 PCIe Fibre Channel Adapter
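
To confirm which individual paths are affected, the path list can be filtered for the dead state. A minimal check from the ESXi shell (the grep context count is illustrative, and output formatting can vary by ESXi version):

$ esxcli storage core path list | grep -i -B 4 "State: dead"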

Environment

ESXi (All versions)

Emulex HBAs

Cause

When a Fabric Switch is replaced, the Emulex (lpfc) driver may receive unexpected frames that it does not handle correctly, specifically during DMA of the frame payload to the host buffer. The root cause is a mismatch between the reported frame size and the actual frame payload stored in the Emulex HBA receive buffer. Once this occurs, the Emulex driver can no longer interact with the physical HBA, and a DMA allocation failure warning is written to /var/log/vmkernel.log every five minutes until the ESXi host is rebooted:

2025-03-28T01:00:10.348Z cpu2:2489661)WARNING: lpfc: _lpfc_SlabAlloc:1870: 2:3371 Alloc failure from Slab 7 Caller <lpfc_io_dma_alloc:2099> cur cnt 2821, Max 2821
2025-03-28T01:05:10.347Z cpu33:2489662)WARNING: lpfc: _lpfc_SlabAlloc:1870: 2:3371 Alloc failure from Slab 7 Caller <lpfc_io_dma_alloc:2099> cur cnt 2821, Max 2821
2025-03-28T01:10:10.353Z cpu0:2489663)WARNING: lpfc: _lpfc_SlabAlloc:1870: 2:3371 Alloc failure from Slab 7 Caller <lpfc_io_dma_alloc:2099> cur cnt 2821, Max 2821
2025-03-28T01:15:10.355Z cpu28:2489659)WARNING: lpfc: _lpfc_SlabAlloc:1870: 2:3371 Alloc failure from Slab 7 Caller <lpfc_io_dma_alloc:2099> cur cnt 2821, Max 2821
2025-03-28T01:20:10.353Z cpu6:2489661)WARNING: lpfc: _lpfc_SlabAlloc:1870: 2:3371 Alloc failure from Slab 7 Caller <lpfc_io_dma_alloc:2099> cur cnt 2821, Max 2821
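
To confirm a host is hitting this condition, the recurring signature can be checked directly in the log referenced above; a quick check from the ESXi shell, matching the message text from the sample entries shown here:

$ grep _lpfc_SlabAlloc /var/log/vmkernel.log | tail -5
$ grep -c _lpfc_SlabAlloc /var/log/vmkernel.log

Given the five-minute cadence, the count grows by roughly twelve entries per hour for as long as the condition persists.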


Resolution

Rebooting the ESXi host is the only known method of recovering from this condition and restoring the storage paths, as the reboot reloads the Emulex driver. To avoid the condition in the first place, Emulex recommends disabling the FC ports on the fabric switch before replacing that switch.
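
A minimal recovery sequence from the ESXi shell, assuming running VMs have already been migrated or powered off (the reboot reason string is illustrative):

$ esxcli system maintenanceMode set --enable true
$ esxcli system shutdown reboot --reason "Recover Emulex HBA link-n/a condition"

For the preventive step, the exact syntax depends on the switch vendor. As one example, on a Brocade FOS switch the host-facing FC ports could be disabled before the swap (port numbers 4 and 5 are placeholders):

switch:admin> portdisable 4
switch:admin> portdisable 5

After the replacement switch is in place and the cables are reconnected, re-enable the ports:

switch:admin> portenable 4
switch:admin> portenable 5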