VMs running on the SAN datastore are not reachable

Article ID: 402302

Updated On:

Products

VMware vSphere ESXi

VMware vSphere ESX 7.x

VMware vSphere ESX 8.x

Issue/Introduction

  • The ESXi host is unresponsive in vCenter, and VMs running on the host's SAN datastore are unreachable.
  • From an SSH session to the host, 'esxcfg-scsidevs -a' or 'esxcfg-mpath' reports that no VMHBA is found.
  • In vmkernel.log, the following errors are logged for both VMHBAs:

vmkernel: cpu22:2097767)qlnativefc: vmhba1(3d:0.0): qlnativefcEhAbort:2754:qlnativefcEhAbort: aborting sp 0x45d9da616600 handle 245 from RISC. serialNumber=1404ed, Command timeout=8 sec. 

vmkernel: cpu29:2097731)qlnativefc: vmhba2(3d:0.1): qlnativefcEhVirtualReset:3269: aborting sp 0x45b9d6600fc0 handle 3e6 from RISC. serialNumber=ffff8083b43dd800, Command timeout=2633 sec

  • The output of /usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval shows very high 'EH Abort Count' and 'Virtual Reset Count' values.

  • 'localcli storage san fc events get' shows the VMHBA link state flapping at irregular intervals:

2025-06-17 01:06:51.327 [vmhba1] LINK DOWN

2025-06-17 01:06:51.932 [vmhba1] LINK UP

2025-06-16 18:57:22.593 [vmhba2] LINK UP

2025-06-16 20:46:50.406 [vmhba2] LINK DOWN

2025-06-16 21:01:58.253 [vmhba2] LINK UP

2025-06-16 23:14:37.656 [vmhba2] LINK DOWN
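
The link events above can be summarized per adapter to show how often each port dropped and for how long. The following is a minimal Python sketch, assuming the event lines have exactly the "timestamp [vmhbaN] LINK UP|DOWN" layout shown in the excerpt; real output may contain other event types, which this sketch simply skips. The function names parse_events, link_down_counts, and outage_durations are illustrative and not part of any VMware tool.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Matches lines such as: 2025-06-17 01:06:51.327 [vmhba1] LINK DOWN
# (layout assumed from the excerpt in this article)
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+\[(vmhba\d+)\]\s+LINK (UP|DOWN)$"
)

SAMPLE = """\
2025-06-17 01:06:51.327 [vmhba1] LINK DOWN
2025-06-17 01:06:51.932 [vmhba1] LINK UP
2025-06-16 18:57:22.593 [vmhba2] LINK UP
2025-06-16 20:46:50.406 [vmhba2] LINK DOWN
2025-06-16 21:01:58.253 [vmhba2] LINK UP
2025-06-16 23:14:37.656 [vmhba2] LINK DOWN
"""


def parse_events(text):
    """Return (timestamp, adapter, state) tuples sorted by time."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, match.group(2), match.group(3)))
    return sorted(events)


def link_down_counts(events):
    """Count LINK DOWN events per adapter."""
    return Counter(hba for _, hba, state in events if state == "DOWN")


def outage_durations(events):
    """Seconds between each LINK DOWN and the next LINK UP, per adapter.

    A LINK DOWN with no later LINK UP is left out of the result.
    """
    down_since = {}
    outages = defaultdict(list)
    for ts, hba, state in events:
        if state == "DOWN":
            down_since.setdefault(hba, ts)
        elif hba in down_since:
            outages[hba].append((ts - down_since.pop(hba)).total_seconds())
    return dict(outages)


if __name__ == "__main__":
    events = parse_events(SAMPLE)
    print(dict(link_down_counts(events)))
    print(outage_durations(events))
```

To use it on a host capture, save the command output to a file and pass the file contents to parse_events in place of SAMPLE. Repeated short outages on one adapter point toward that adapter's path; flapping on both adapters suggests a shared component.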

Environment

ESXi 8.0.3 

EMC PowerStore

Cisco Systems Inc UCSC-C220-M7S 

Cause

"EH Abort Count" (EH stands for error handling) is the number of times the error handler aborted a command to terminate a SCSI I/O, typically because the I/O timed out while waiting for completion. The problem lies in the storage path. Possible causes include, but are not limited to, the following:

 

  • Storage path instability (e.g., bad SFPs on FC switches, FC port overutilization, defective FC cables)

  • Firmware or driver bugs on the HBA

  • Hardware faults (e.g., a failing HBA or PCI slot)
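
To gauge how often the error handler is firing on each adapter, the qlnativefc messages in vmkernel.log can be tallied. The following is a rough Python sketch, assuming the messages follow the layout of the two excerpts in the Issue section (adapter name, handler function name, and a "Command timeout=N sec" field); other driver versions may log differently. The name summarize_eh_events is hypothetical.

```python
import re
from collections import defaultdict

# Captures the adapter, the error-handler function, and the command timeout.
# The pattern is derived only from the two log lines quoted in this article.
EH_RE = re.compile(
    r"qlnativefc: (vmhba\d+)\([^)]*\): (qlnativefcEh\w+):\d+:"
    r".*Command timeout=(\d+) sec"
)

SAMPLE_LINES = [
    "vmkernel: cpu22:2097767)qlnativefc: vmhba1(3d:0.0): "
    "qlnativefcEhAbort:2754:qlnativefcEhAbort: aborting sp 0x45d9da616600 "
    "handle 245 from RISC. serialNumber=1404ed, Command timeout=8 sec.",
    "vmkernel: cpu29:2097731)qlnativefc: vmhba2(3d:0.1): "
    "qlnativefcEhVirtualReset:3269: aborting sp 0x45b9d6600fc0 "
    "handle 3e6 from RISC. serialNumber=ffff8083b43dd800, "
    "Command timeout=2633 sec",
]


def summarize_eh_events(lines):
    """Count error-handler messages per (adapter, handler) pair.

    Also records the largest command timeout seen for each pair.
    Lines that do not match the assumed layout are ignored.
    """
    summary = defaultdict(lambda: {"count": 0, "max_timeout_sec": 0})
    for line in lines:
        match = EH_RE.search(line)
        if not match:
            continue
        hba, handler, timeout = match.group(1), match.group(2), int(match.group(3))
        entry = summary[(hba, handler)]
        entry["count"] += 1
        entry["max_timeout_sec"] = max(entry["max_timeout_sec"], timeout)
    return dict(summary)


if __name__ == "__main__":
    for (hba, handler), stats in sorted(summarize_eh_events(SAMPLE_LINES).items()):
        print(hba, handler, stats["count"], stats["max_timeout_sec"])
```

To run it against a real log, pass an open file object for vmkernel.log instead of SAMPLE_LINES. The tally only shows which adapters are affected and how severely; it does not distinguish between the possible causes listed above.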

 

 

Resolution

Work with the hardware vendor to run extensive hardware diagnostics and replace any defective hardware components.