ESXi hosts intermittently report "Lost access to volume" for volumes backed by FC storage:
"Lost access to volume <Datastore> due to connectivity issues. Recovery attempt is in progress"
The FC driver intermittently reports aborts of I/O for sustained periods, correlated with the lost access to volumes:
cpu##:2117121)lpfc: lpfc_handle_status:5637: 0:(0):3271: FCP cmd x89 failed <2/354> sid x521d03, did x520304, oxid x363 iotag x689 Abort Requested Host Abort Req
cpu##:18338096)NMP: nmp_ThrottleLogForDevice:3867: Cmd 0x89 (0x45da66bdf208, 2097225) to dev "naa.################################" on path "vmhba#:C#:T#:L##" Failed:
/var/log/vmkernel.log reports I/O aborts (host status H:0x5) and resets (host status H:0x8) at these times.
LUN busy events (device status D:0x8) are also observed:
####-##-##T##:##:##.###Z cpu##:2098412)ScsiDeviceIO: 4115: Cmd(0x45d988625dc8) 0x8a, CmdSN 0x800e000c from world 3129234 to dev "naa.#############################" failed H:0x0 D:0x8 P:0x0
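When reviewing a large vmkernel.log, it can help to tally these status codes rather than read them line by line. The following is a minimal Python sketch, not a supported tool: the function name `classify` and the lookup tables are illustrative assumptions, and only the three codes named in this article (H:0x5, H:0x8, D:0x8) are mapped. Any other combination is reported as "other".

```python
import re

# Status meanings as described in this article; anything else is left unclassified.
HOST_STATUS = {0x5: "abort", 0x8: "reset"}
DEVICE_STATUS = {0x8: "busy"}

# Matches the "H:0x.. D:0x.. P:0x.." triple found in vmkernel.log SCSI failure lines.
STATUS_RE = re.compile(r"H:0x([0-9a-fA-F]+) D:0x([0-9a-fA-F]+) P:0x([0-9a-fA-F]+)")


def classify(line):
    """Return 'abort', 'reset', 'busy', 'other', or None if no status triple is present."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    host, device, _plugin = (int(group, 16) for group in match.groups())
    if host in HOST_STATUS:
        return HOST_STATUS[host]
    if device in DEVICE_STATUS:
        return DEVICE_STATUS[device]
    return "other"


# Sample line shaped like the masked excerpt above.
sample = ('ScsiDeviceIO: 4115: Cmd(0x45d988625dc8) 0x8a, CmdSN 0x800e000c '
          'from world 3129234 to dev "naa.#" failed H:0x0 D:0x8 P:0x0')
print(classify(sample))
```

Counting the results per device or per path over time gives a quick view of whether aborts, resets, or busy responses dominate during an incident.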
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x
performance has deteriorated. I/O latency increased from average value of 6661 microseconds to 826700 microseconds.
The failed (aborted) I/O leads to failed datastore heartbeats and, after approximately 16 seconds, to datastore heartbeat timeouts:
For example, in /var/log/vobd.log:
cpu##:2111626) HBX: 3063: '<datastore>': HB at offset 3702784 - Waiting for timed out HB:
cpu##:2111626) [HB state abcdef02 offset 3702784 gen 261 stampUS 2658722121357 uuid <datastore UUID> jrnl <FB 7> drv 24.82 lockImpl 4 ip ##.##.##.##]
On heartbeat timeout, ESXi marks the datastore offline until a heartbeat succeeds again.
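To put the latency message in perspective, the two values can be extracted and compared. The Python sketch below is illustrative only: the function name `latency_increase` is an assumption, and the regular expression matches only the message wording quoted in this article.

```python
import re

# Matches the latency wording quoted in this article.
LATENCY_RE = re.compile(
    r"I/O latency increased from average value of (\d+) microseconds "
    r"to (\d+) microseconds"
)


def latency_increase(line):
    """Return (average_ms, current_ms, ratio), or None if the line does not match."""
    match = LATENCY_RE.search(line)
    if match is None:
        return None
    average_us, current_us = int(match.group(1)), int(match.group(2))
    return average_us / 1000, current_us / 1000, current_us / average_us


message = ("performance has deteriorated. I/O latency increased from average "
           "value of 6661 microseconds to 826700 microseconds.")
average_ms, current_ms, ratio = latency_increase(message)
print(f"{average_ms:.3f} ms -> {current_ms:.1f} ms ({ratio:.1f}x)")
```

For the values shown in this article, that is an increase from roughly 6.7 ms to roughly 827 ms, about 124 times the average.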
Engage storage and fabric vendor support to investigate the cause of the I/O aborts.
For additional information, see Understanding lost access to volume messages in ESXi
Japanese KB: FC ドライバの中段に伴うボリュームへのアクセス喪失