vSphere ESXi Hosts lose storage connectivity with "Out of memory" errors due to Cisco nfnic Target ID Storm

Article ID: 439742

Products

VMware vSphere ESXi

Issue/Introduction

VMware ESXi hosts lose connection to mapped Fibre Channel LUNs. The vmkernel.log reports out-of-memory path allocation failures, followed by LUNs transitioning to "Unregistered Device" status. This coincides with excessive Target ID incrementation causing VMkernel storage heap exhaustion and physical link flapping.

From vmkernel.log:

WARNING: ScsiPath: 1265: Couldn't allocate path (vmhba2:C0:T10391:L57) during scan: Out of memory
WARNING: NMP: nmp_RegisterDevice:902: Registration of NMP device with primary uid 'naa.' failed because the device structure could not be allocated.
WARNING: NMP: nmp_CallRegisterDevice:1293: Device, seen through path vmhba3:C0:T10294:L60 is not registered (no active paths)
INFO: fnic_handle_link: 1003: link status 1 down cnt 4994
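The log signature above can be checked for programmatically. The following is a minimal sketch, not a supported tool: it assumes vmkernel.log lines follow the format of the samples in this article, and the function name `summarize` and the regular expressions are illustrative only. It reports the highest Target ID seen per adapter, the number of out-of-memory path allocation failures, and the highest link-down counter.

```python
import re
from collections import defaultdict

# Patterns modeled on the sample vmkernel.log lines in this article (assumption:
# real log lines keep the same vmhbaN:Cx:Ty:Lz and "down cnt" layout).
PATH_RE = re.compile(r"(vmhba\d+):C\d+:T(\d+):L\d+")
LINK_DOWN_RE = re.compile(r"fnic_handle_link:.*link status \d+ down cnt (\d+)")


def summarize(lines):
    """Summarize Target ID growth, out-of-memory path failures, and link flaps."""
    max_target = defaultdict(int)
    oom_failures = 0
    max_link_down = 0
    for line in lines:
        path = PATH_RE.search(line)
        if path:
            hba, target = path.group(1), int(path.group(2))
            max_target[hba] = max(max_target[hba], target)
        if "Couldn't allocate path" in line and "Out of memory" in line:
            oom_failures += 1
        link = LINK_DOWN_RE.search(line)
        if link:
            max_link_down = max(max_link_down, int(link.group(1)))
    return {
        "max_target_id": dict(max_target),
        "oom_path_failures": oom_failures,
        "max_link_down_count": max_link_down,
    }


# The four sample lines quoted in this article.
SAMPLE = [
    "WARNING: ScsiPath: 1265: Couldn't allocate path (vmhba2:C0:T10391:L57) during scan: Out of memory",
    "WARNING: NMP: nmp_RegisterDevice:902: Registration of NMP device with primary uid 'naa.' failed because the device structure could not be allocated.",
    "WARNING: NMP: nmp_CallRegisterDevice:1293: Device, seen through path vmhba3:C0:T10294:L60 is not registered (no active paths)",
    "INFO: fnic_handle_link: 1003: link status 1 down cnt 4994",
]

summary = summarize(SAMPLE)
print(summary)
```

Target IDs in the thousands together with a large link-down counter, as in the sample lines, match the pattern described in this article. What counts as "too high" is left to the reader; this sketch does not define a threshold.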

Environment

VMware vSphere ESXi

Cisco UCS Fabric Interconnect

Cisco nfnic Fibre Channel driver

Cause

A known Cisco UCS Manager defect (CSCwk91747 / FN74209) causes unexpected Virtual Fibre Channel (vFC) link flaps. This repetitive Layer 1 disruption triggers a target mapping bug within the Cisco nfnic driver, resulting in a Target ID Storm that rapidly consumes and exhausts the VMkernel storage heap memory.

Resolution

  • Contact Cisco to review the Cisco UCS Fabric Interconnect firmware and upgrade it to a release that fixes the CSCwk91747 vFC flap defect.

  • Upgrade the Cisco nfnic driver on the ESXi hosts to version 5.0.0.48 or later to resolve the target mapping software bug.

  • Perform a graceful reboot of the affected ESXi hosts to flush the stale path allocations and reclaim the VMkernel storage heap memory.
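To decide whether a host still needs the driver upgrade, the installed nfnic version can be compared against 5.0.0.48. The sketch below is illustrative only: it assumes the version string (for example, as shown for the nfnic VIB by `esxcli software vib list`) begins with a dotted numeric version followed by a build suffix, and the helper names and sample strings are hypothetical.

```python
import re

# Minimum nfnic driver version named in this article.
FIXED_VERSION = (5, 0, 0, 48)


def parse_version(version_string):
    """Return the leading dotted numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version_string.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def nfnic_needs_upgrade(version_string, fixed=FIXED_VERSION):
    """True if the installed driver version is older than the fixed version."""
    return parse_version(version_string) < fixed


# Hypothetical version strings, used only to illustrate the comparison.
for installed in ("5.0.0.43-1OEM.700.1.0.15843807", "5.0.0.48-1OEM.800.1.0.20613240"):
    state = "upgrade needed" if nfnic_needs_upgrade(installed) else "at or above fixed version"
    print(f"{installed}: {state}")
```

Comparing tuples of integers avoids the string-comparison pitfall where "5.0.0.9" would sort after "5.0.0.48".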

Additional Information

Cisco Field Notice: FN74209

Cisco Bug: CSCwk91747