In a VMware vSphere environment, all ESXi hosts within a specific cluster unexpectedly lose connection to their shared storage array.
The following behaviors are observed:
In the vCenter Server UI, alerts and events are triggered with the message: "Lost access to multiple datastores".
Virtual machines (VMs) running on the affected datastores may stop responding, freeze, or experience severe I/O timeouts.
Storage performance graphs drop sharply or fail to populate because connectivity to the array is lost entirely.
Entries similar to the following appear in /var/run/log/vmkernel.log:
YYYY-MM-DDTHH:MM:SS.419Z cpu78:2098730)NMP: nmp_ThrottleLogForDevice:3874: Cmd 0x2a (0#############) to dev "naa.###############################" on path "vmhbaXX:CX:TX:LX" Failed: H:0x7 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0. Act:EVAL
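When many hosts log entries like this, it helps to pull out the fields of interest programmatically. The following Python sketch parses lines in the format shown above. It decodes the SCSI opcode, where 0x2a is WRITE(10), and the H:/D:/P: status fields. The regular expression is an assumption based only on this single sample line. The host-status table is a partial transcription of VMware's published SCSI host-status codes. Verify both against your ESXi build and VMware's documentation before relying on them.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Partial map of ESXi SCSI host-status codes (the "H:" field).
# Assumed to match VMware's published host-status table; verify before use.
HOST_STATUS = {
    0x0: "OK",
    0x1: "NO_CONNECT",
    0x2: "BUS_BUSY",
    0x3: "TIMEOUT",
    0x5: "ABORT",
    0x7: "HOST_ERROR",  # reported as a Storage Initiator Error
    0x8: "RESET",
}

# A few common SCSI opcodes (the "Cmd" field).
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}

# Pattern derived from the single sample line in this article (an assumption).
NMP_FAIL = re.compile(
    r'^(?P<ts>\S+) .*?Cmd (?P<cmd>0x[0-9a-fA-F]+) .*?to dev "(?P<dev>[^"]+)" '
    r'on path "(?P<path>[^"]+)" Failed: '
    r'H:(?P<h>0x[0-9a-fA-F]+) D:(?P<d>0x[0-9a-fA-F]+) P:(?P<p>0x[0-9a-fA-F]+)'
)


@dataclass
class NmpFailure:
    timestamp: str
    opcode: str
    device: str
    path: str
    host_status: str
    device_status: int
    plugin_status: int


def parse_nmp_failure(line: str) -> Optional[NmpFailure]:
    """Return the decoded fields of an NMP 'Failed:' line, or None if it does not match."""
    m = NMP_FAIL.search(line)
    if m is None:
        return None
    cmd = int(m["cmd"], 16)
    h = int(m["h"], 16)
    return NmpFailure(
        timestamp=m["ts"],
        opcode=OPCODES.get(cmd, hex(cmd)),
        device=m["dev"],
        path=m["path"],
        host_status=HOST_STATUS.get(h, hex(h)),
        device_status=int(m["d"], 16),
        plugin_status=int(m["p"], 16),
    )


# The sample line from this article, with its placeholders left as-is.
SAMPLE = (
    'YYYY-MM-DDTHH:MM:SS.419Z cpu78:2098730)NMP: nmp_ThrottleLogForDevice:3874: '
    'Cmd 0x2a (0#############) to dev "naa.###############################" '
    'on path "vmhbaXX:CX:TX:LX" Failed: H:0x7 D:0x0 P:0x0 '
    'Invalid sense data: 0x0 0x0 0x0. Act:EVAL'
)

if __name__ == "__main__":
    print(parse_nmp_failure(SAMPLE))
```

In this sample, the decoded record shows a failed WRITE(10) with a host-side error (H:0x7), while the device (D:0x0) and plugin (P:0x0) status are clean. That pattern points away from the array's LUN itself and toward the initiator/transport side.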
VMware vSphere ESXi 8.x
This issue is typically triggered by a severe transport path failure or a hardware disruption between the ESXi hosts and the storage array. The root causes generally fall into the following areas:
Host Bus Adapter (HBA) Issues:
Outdated or incompatible HBA card firmware/driver combinations.
Internal hardware failure or degradation of the HBA initiator card itself.
Fabric Transport Failures:
Interruptions in the Storage Area Network (SAN) switch zoning, faulty physical transceivers (SFPs), or damaged fiber optic cabling.
Transient storage array port resets or controllers failing over ungracefully.
When a Storage Initiator Error occurs (reported as host status H:0x7 in the log entry above) or a path drops under heavy load, ESXi stops receiving responses from the array and SCSI commands fail. When this happens on every host at once, it manifests as a cluster-wide loss of datastore access.
Because the failure occurs uniformly across all hosts in the cluster, the investigation must focus on the common fabric infrastructure and on the physical and firmware layers:
Engage Storage and Fabric Vendors:
Contact your Storage Array Vendor to pull array-level logs. Check for storage controller panics, high congestion, port flapping, or unsolicited failover events.
Contact your SAN Switch Vendor to review fabric status, look for high CRC error counts on switch ports, and check for faulty physical paths (SFPs/cables) connecting the ESXi cluster to the SAN.
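Before engaging vendors, you can confirm that the failures really are uniform across the cluster. The Python sketch below is a hypothetical heuristic, not a VMware tool. It assumes you have already extracted (timestamp, device, path) tuples per host from each host's vmkernel.log.

The sketch treats failures as pointing to shared fabric or array components, rather than a single host's HBA or cabling, when two conditions hold. First, the failures begin on every host within a short window (five minutes by default, an arbitrary choice). Second, they involve the same devices. The function name, the tuple shape, and the window length are all illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical record shape: (timestamp, device ID, path) for one NMP failure.
Failure = Tuple[datetime, str, str]


def classify_scope(
    failures: Dict[str, List[Failure]],
    cluster_hosts: List[str],
    window: timedelta = timedelta(minutes=5),
) -> str:
    """Heuristic: label the failure pattern as cluster-wide or host-local."""
    affected = {h for h in cluster_hosts if failures.get(h)}
    if not affected:
        return "no failures"
    if affected != set(cluster_hosts):
        # Only some hosts are failing: suspect their own HBAs, SFPs, or cables.
        return "host-local: " + ", ".join(sorted(affected))

    # Earliest failure per host; "simultaneous" means all start within `window`.
    first_seen = [min(ts for ts, _, _ in failures[h]) for h in cluster_hosts]
    if max(first_seen) - min(first_seen) > window:
        return "all hosts affected, but not simultaneously"

    # Devices that failed on every host point at shared array or fabric components.
    common = set.intersection(*({dev for _, dev, _ in failures[h]} for h in cluster_hosts))
    return "cluster-wide; common devices: " + (", ".join(sorted(common)) or "none")


if __name__ == "__main__":
    # Hypothetical host names and device IDs, for illustration only.
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    events = {
        "esxi-01": [(t0, "naa.A", "vmhba64:C0:T0:L0")],
        "esxi-02": [(t0 + timedelta(seconds=30), "naa.A", "vmhba64:C0:T1:L0")],
    }
    print(classify_scope(events, ["esxi-01", "esxi-02"]))
    # prints "cluster-wide; common devices: naa.A"
```

A "cluster-wide" result supports focusing on the array and switch fabric as described above. A "host-local" result suggests checking the HBA, driver/firmware, and cabling of the listed hosts first.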