Lost access to volumes correlated with FC driver aborts

Article ID: 396366

Products

VMware vSphere ESXi
VMware vSphere ESX 7.x
VMware vSphere ESX 8.x

Issue/Introduction

ESXi hosts intermittently report "Lost access to volume" for volumes backed by FC storage:

"Lost access to volume <Datastore> due to connectivity issues. Recovery attempt is in progress"

The FC driver intermittently reports I/O aborts over sustained periods, and these periods coincide with the lost access to volumes:

cpu##:2117121)lpfc: lpfc_handle_status:5637: 0:(0):3271: FCP cmd x89 failed <2/354> sid x521d03, did x520304, oxid x363 iotag x689 Abort Requested Host Abort Req
cpu##:18338096)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0x89 (0x45da66bdf208, 2097225) to dev "naa.################################" on path "vmhba#:C#:T#:L##" Failed:
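
When reviewing many such messages, it can help to pull the fields out of each lpfc line. The following is a minimal sketch, not a supported tool. The pattern is modeled only on the lpfc message quoted above, so other driver versions may word the message differently. The function name parse_lpfc_failure is hypothetical, and the <2/354> pair is kept as raw text rather than interpreted.

```python
import re

# Pattern modeled on the single lpfc line quoted in this article (assumption:
# other lpfc driver versions may format this message differently).
LPFC_FAIL = re.compile(
    r"FCP cmd x(?P<cmd>[0-9a-fA-F]+) failed <(?P<ids>[^>]*)> "
    r"sid x(?P<sid>[0-9a-fA-F]+), did x(?P<did>[0-9a-fA-F]+), "
    r"oxid x(?P<oxid>[0-9a-fA-F]+) iotag x(?P<iotag>[0-9a-fA-F]+)"
)


def parse_lpfc_failure(line):
    """Return the fields of an lpfc 'FCP cmd ... failed' line, or None."""
    match = LPFC_FAIL.search(line)
    if match is None:
        return None
    return {
        "cmd": int(match.group("cmd"), 16),   # SCSI opcode as an integer
        "ids": match.group("ids"),            # raw "<a/b>" pair, left uninterpreted
        "sid": match.group("sid"),            # source FC port ID (hex string)
        "did": match.group("did"),            # destination FC port ID (hex string)
        "oxid": match.group("oxid"),
        "iotag": match.group("iotag"),
        "abort_requested": "Abort Requested" in line,
    }
```

Grouping the results by did can show whether the aborts concentrate on one target port, which is useful detail to pass to storage and fabric vendor support.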


/var/log/vmkernel.log reports I/O aborts (SCSI host status H:0x5) and resets (SCSI host status H:0x8) at these times.
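
To gauge how often these two codes occur, the log lines can be tallied. The sketch below assumes that the status appears in each line as literal text such as "H:0x5", as referenced above. The function name count_host_status is hypothetical, and only the two codes named in this article are labeled.

```python
import re
from collections import Counter

# Only the two host status values named in this article are labeled here.
HOST_STATUS = {0x5: "abort", 0x8: "reset"}

# Assumption: the host status is logged as the literal token "H:0x<hex>".
STATUS_RE = re.compile(r"\bH:0x([0-9a-fA-F]+)")


def count_host_status(lines):
    """Count lines carrying an abort (H:0x5) or reset (H:0x8) host status."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue
        label = HOST_STATUS.get(int(match.group(1), 16))
        if label is not None:
            counts[label] += 1
    return counts
```

A copy of the log can be passed in directly, for example count_host_status(open("vmkernel.log", errors="replace")), where the file path is a placeholder for a log copied off the host.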

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

The failed (aborted) I/O causes datastore heartbeats to fail and, after approximately 16 seconds, to time out:

For example, in /var/log/vobd.log:
cpu##:2111626) HBX: 3063: '<datastore>': HB at offset 3702784 - Waiting for timed out HB:
cpu##:2111626) [HB state abcdef02 offset 3702784 gen 261 stampUS 2658722121357 uuid <datastore UUID> jrnl <FB 7> drv 24.82 lockImpl 4 ip ##.##.##.##]
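
To see which datastores are affected, the heartbeat timeout messages can be summarized per datastore. This is a minimal sketch whose pattern is modeled only on the HBX line quoted above. The function name timed_out_datastores is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the HBX message quoted in this article (assumption:
# the wording may differ between ESXi builds).
HB_TIMEOUT = re.compile(
    r"HBX: \d+: '(?P<datastore>[^']*)': HB at offset (?P<offset>\d+) "
    r"- Waiting for timed out HB"
)


def timed_out_datastores(lines):
    """Count heartbeat-timeout messages per datastore name."""
    counts = Counter()
    for line in lines:
        match = HB_TIMEOUT.search(line)
        if match is not None:
            counts[match.group("datastore")] += 1
    return counts
```

If only datastores behind one array or one fabric path show timeouts, that narrows the scope of the investigation.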


On heartbeat timeout, ESXi marks the datastore offline until a heartbeat succeeds again.
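
The correlation between aborts and heartbeat timeouts can be checked by counting the aborts that precede each timeout. The sketch below assumes the timestamps of both event types have already been extracted from the logs as datetime values. The function name aborts_before is hypothetical, and the default 16-second window is taken from the approximate figure above, not from a documented setting.

```python
from bisect import bisect_left, bisect_right
from datetime import timedelta


def aborts_before(timeouts, aborts, window_s=16.0):
    """For each timeout, count aborts in the window_s seconds leading up to it.

    timeouts and aborts are iterables of datetime objects (assumed to be
    already parsed from the logs). Returns a dict mapping timeout -> count.
    """
    ordered = sorted(aborts)
    window = timedelta(seconds=window_s)
    result = {}
    for stamp in timeouts:
        first = bisect_left(ordered, stamp - window)   # earliest abort in window
        last = bisect_right(ordered, stamp)            # one past the latest abort
        result[stamp] = last - first
    return result
```

Timeouts with a count of zero are not explained by the aborts in the window and may warrant a separate look.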

Resolution

Confirm that the FC driver and firmware are supported and at the levels recommended by the vendor.

Engage storage and fabric vendor support to investigate the cause of the I/O aborts.
