Datastores Inaccessible and ESXi Hosts Unresponsive due to Fibre Channel I/O Aborts

Article ID: 439920

Updated On:

Products

VMware vCenter Server

Issue/Introduction

In a VMware vSphere environment, storage connectivity instability caused by Fibre Channel (FC) I/O aborts can lead to severe ESXi host management failures. You may experience the following symptoms:

  • Multiple datastores become inaccessible or appear as Inactive.
  • Virtual Machines (VMs) enter a hung state, zombie state, or become completely unresponsive.
  • Management operations (Power On, Power Off, Reset, or vMotion) are greyed out or fail.
  • ESXi hosts show as Not Responding in vCenter Server, and the hostd management agent stalls.
  • Review the following logs for evidence of FC I/O aborts:

    • Datastore Heartbeat Timeouts (vobd.log):

      vobd: [vmfsCorrelator] [vob.vmfs.heartbeat.timedout] ##########-#########-####-############ Datastore_name
      vobd: [vmfsCorrelator] [esx.problem.vmfs.heartbeat.timedout] ##########-#########-####-############ Datastore_name

    • Path Failures and Redundancy Degradation (vobd.log):

      vobd: [scsiCorrelator] [vob.scsi.scsipath.pathstate.deadver2] scsiPath vmhba8:C0:T11:L8 changed state from on
      vobd: [scsiCorrelator] [esx.problem.storage.redundancy.degraded] Path redundancy to storage device naa.########################## degraded. Path vmhba#:C0:T11:L8 is down.

    • SCSI Command Failures and I/O Aborts (vmkernel.log):

      vmkernel: NMP: nmp_ThrottleLogForDevice:3893: Cmd 0x12 ... to dev "naa.##########################" on path "vmhbax:C0:T10:L8" Failed: H:0x1 D:0x0 P:0x0
      vmkernel: lpfc: lpfc_handle_status:3179: 3:(0) Compression log for fcp target 11, path is dead, IO errors: busy 0, retry 169, no_connect 120

    • Path Failures:

      vobd: [vob.scsi.scsipath.pathstate.deadver2] scsiPath vmhba#:C0:T11:L8 changed state from on

    • Connectivity Warnings:

      VMW_SATP_ALUA: satp_alua_getTargetPortInfo:190: Could not get page 83 INQUIRY data for path "vmhba#:C0:T11:L10" - No connection.
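The H:/D:/P: triplet in the vmkernel excerpts above encodes host, device, and plugin status. As a minimal decoding sketch, assuming the Linux-style host (DID_*) codes that vmkernel status reporting follows (only a few statuses listed; the sample line is illustrative):

```shell
#!/bin/sh
# Sketch: extract and decode the host-status byte from a vmkernel
# SCSI failure line. The sample line and the abridged code table are
# illustrative; check VMware documentation for the full mapping.
line='vmkernel: NMP: ... Failed: H:0x1 D:0x0 P:0x0'
h=$(echo "$line" | sed -n 's/.*H:\(0x[0-9a-fA-F]*\).*/\1/p')
case "$h" in
  0x0) echo "H:$h host OK" ;;
  0x1) echo "H:$h NO_CONNECT (transport lost, typical of dead FC paths)" ;;
  0x3) echo "H:$h TIMEOUT" ;;
  0x5) echo "H:$h ABORT" ;;
  *)   echo "H:$h unrecognized in this abridged table" ;;
esac
```

For the H:0x1 failures quoted in this article, the host status points at a lost transport connection rather than a device-level error, which is consistent with dead FC paths.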

Environment

ESXi 8.x

Cause

The issue is caused by repeated Fibre Channel (FC) I/O aborts on specific Host Bus Adapters (HBAs). These aborts trigger VMFS heartbeat timeouts, causing the ESXi host to lose connectivity to its storage volumes. With I/O threads stuck waiting for storage responses, the management agents ultimately stall.


Resolution

To restore stability and storage connectivity, follow these recovery procedures:

1. Identify Affected HBAs:

Review vmkernel.log and vobd.log for the following indicators, noting which vmhba adapters they reference:

  • Path Failures: vob.scsi.scsipath.pathstate.deadver2
  • Heartbeat Timeouts: esx.problem.vmfs.heartbeat.timedout
  • SCSI Aborts: command failures such as H:0x1 D:0x0 P:0x0, typically accompanied by multiple datastores becoming inaccessible.
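This review can be automated with a few grep patterns; a minimal sketch, where the sample log content and the /tmp path are illustrative (on a live host, point LOG at /var/log/vobd.log and /var/log/vmkernel.log):

```shell
#!/bin/sh
# Sketch: count FC I/O abort indicators in a log and list which
# vmhba adapters they mention. The sample lines below are
# hypothetical stand-ins for real vobd.log/vmkernel.log content.
LOG=/tmp/vobd.sample.log
cat > "$LOG" <<'EOF'
vobd: [scsiCorrelator] [vob.scsi.scsipath.pathstate.deadver2] scsiPath vmhba8:C0:T11:L8 changed state from on
vobd: [vmfsCorrelator] [esx.problem.vmfs.heartbeat.timedout] 12345 Datastore_name
EOF

# Count each indicator from the list above
for pattern in 'pathstate.deadver2' 'heartbeat.timedout' 'H:0x1 D:0x0'; do
  printf '%s: %s\n' "$pattern" "$(grep -c "$pattern" "$LOG")"
done

# List the adapters implicated by the matching lines
grep -o 'vmhba[0-9]*' "$LOG" | sort -u
```

Adapters that appear repeatedly across these indicators are the candidates for the HBA reset in the next step.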

2. Reset HBA via CLI:

Attempt to re-initialize the HBA driver and hardware without a full host reboot. Run the following command via the ESXi CLI for the affected adapter:

Review this command before running it.

localcli storage san fc reset -A <vmhba_ID>

Note: Replace <vmhba_ID> with the affected adapter (e.g., vmhba1). 
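As a guard against resetting the wrong adapter, the command can be wrapped in a small dry-run script; this wrapper and its CONFIRM variable are illustrative, not part of the ESXi CLI, and vmhba1 is used only as an example default:

```shell
#!/bin/sh
# Sketch: print the HBA reset command for review; only execute it
# when explicitly confirmed. Pass the affected adapter as $1.
HBA_ID="${1:-vmhba1}"
CMD="localcli storage san fc reset -A $HBA_ID"
if [ "${CONFIRM:-0}" = "1" ]; then
  $CMD            # actually issues the reset (run on the ESXi host only)
else
  echo "DRY RUN: $CMD"
fi
```

Run without CONFIRM=1, the script only prints the command, which makes it safe to review before committing to the reset.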

3. Coordinate SAN Fabric Review:

If resets do not stabilize the links, work with your SAN team to:

    • Disable or "shut" unstable switch ports.
    • Verify zoning consistency across all HBAs.

4. Perform Host Reboot: 

If management agents remain unresponsive after an HBA reset, a hard reboot of the ESXi host is required to clear stuck I/O threads.