Production SQL Cluster is unhealthy and/or performing slowly.

Article ID: 392918

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Multiple VMs are offline or experiencing latency, although storage connections appear healthy. VMs cannot migrate between hosts, and the error "Insufficient vSphere HA failover resources" is displayed. Logs show "no connection", "busy", and "NMP failed I/O error" messages against the LUNs.

Environment

vSphere 8.X

Cause

The FPIN (Fabric Performance Impact Notifications) process exhausts its memory heap. Messages like the following may appear in vmkernel.log:

2025-03-07T11:42:01.755Z Wa(180) vmkwarning: cpu11:2097450)WARNING: Heap: 3645: Heap storageFPINHeap already at its maximum size. Cannot expand.
2025-03-07T11:42:01.755Z Wa(180) vmkwarning: cpu11:2097450)WARNING: StorageFPIN: 521: Failed to allocate memory.
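
To check a host for these messages, a small script can scan a copy of vmkernel.log for the two warnings quoted above. This is a minimal sketch, not a supported tool: the function names `find_fpin_warnings` and `scan_log` are hypothetical, the patterns are derived only from the two sample lines shown here, and the log path passed to `scan_log` is an assumption that may differ per host.

```python
import re
from typing import Iterable, List

# Patterns derived from the two sample vmkernel.log warnings above.
# The numeric field after "StorageFPIN:" is assumed to vary, so it is
# matched loosely with \d+.
FPIN_PATTERNS = [
    re.compile(r"Heap storageFPINHeap already at its maximum size"),
    re.compile(r"StorageFPIN: \d+: Failed to allocate memory"),
]


def find_fpin_warnings(lines: Iterable[str]) -> List[str]:
    """Return the lines that match either FPIN heap-exhaustion warning."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in FPIN_PATTERNS)
    ]


def scan_log(path: str) -> List[str]:
    """Scan one log file; the caller supplies the path (an assumption)."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, errors="replace") as log:
        return find_fpin_warnings(log)
```

For example, `scan_log("/var/run/log/vmkernel.log")` would return the matching lines from that file, if it exists at that location. An empty result only means these two warnings were not found in that file; it does not rule out the issue, since older entries may have been rotated out.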

Resolution

Disable FPIN (Fabric Performance Impact Notifications), which introduced a memory leak. Disabling it as described in the following KB article eliminates the leak. Alternatively, upgrade to 8.0U3 P05, which contains the fix.

Temporary/transient storage path loss on ESXi 8.0 could result in paths not coming back when using Cisco UCS and NFNIC