Storage connection was interrupted and now there is an iSCSI LUN that is causing instability on the host.

Article ID: 392369

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

After a storage outage, one LUN connected to a different iSCSI storage array does not mount, and the host experiences instability.

Environment

VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x

Cause

A faulty switch in the storage network is causing performance issues.

The vobd log file shows the following:

2025-03-18T22:32:50.950Z cpu45:2098145)ScsiDevice: 4797: Successfully registered device "naa.#########################" from plugin "NMP" of type 0
2025-03-24T17:40:11.806Z In(14) vobd[2098149]: [vmfsCorrelator] 927296983884us: [esx.problem.vmfs.heartbeat.recovered] ########-########-####-############ ###########
2025-03-24T17:40:12.367Z In(14) vobd[2098149]: [pageretireCorrelator] 927322098209us: [vob.pageretire.selectedmpnthreshold.host.exceeded] Number of MPNs selected for retirement is 32
2025-03-24T17:40:22.977Z In(14) vobd[2098149]: [vmfsCorrelator] 927333301535us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume ########-########-####-############ (AMI-WSDC-VDI05-CLONE-PD-NR_06): [Timeout] [HB state abcdef02 offset 3211264 gen 5537 stampUS 927319980478 uuid ########-########-####-############ jrnl <FB 25165827> drv 24.82]

The vmkernel logs show the following:

2025-03-24T18:27:34.296Z Wa(180) vmkwarning: cpu34:2098456)WARNING: ScsiDeviceIO: 1779: Device naa.######################### performance has deteriorated. I/O latency increased from average value of 456 microseconds to 11849 microseconds.
2025-03-24T18:28:25.099Z Wa(180) vmkwarning: cpu34:2098457)WARNING: ScsiDeviceIO: 1779: Device naa.######################### performance has deteriorated. I/O latency increased from average value of 456 microseconds to 11536 microseconds.
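
When many of these warnings are present, it can help to extract the device and latency figures from each line. The following is a minimal sketch, assuming the message wording matches the excerpt above exactly; the function name parse_latency_warnings and the output field names are hypothetical and chosen only for illustration.

```python
import re

# Pattern based on the warning wording shown in the log excerpt above.
# Assumption: other ESXi builds may word this message differently.
LATENCY_RE = re.compile(
    r"^(?P<timestamp>\S+)\s.*Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<baseline_us>\d+) microseconds "
    r"to (?P<current_us>\d+) microseconds\."
)


def parse_latency_warnings(lines):
    """Return one dict per matching warning line; other lines are skipped."""
    results = []
    for line in lines:
        match = LATENCY_RE.search(line)
        if match is None:
            continue
        baseline = int(match.group("baseline_us"))
        current = int(match.group("current_us"))
        results.append({
            "timestamp": match.group("timestamp"),
            "device": match.group("device"),
            "baseline_us": baseline,
            "current_us": current,
            # How many times higher the latency is than its previous average.
            "ratio": current / baseline if baseline else None,
        })
    return results


# One line copied from the excerpt above (device ID masked as in the article).
sample = [
    "2025-03-24T18:27:34.296Z Wa(180) vmkwarning: cpu34:2098456)WARNING: "
    "ScsiDeviceIO: 1779: Device naa.######################### performance has "
    "deteriorated. I/O latency increased from average value of 456 microseconds "
    "to 11849 microseconds.",
]

for entry in parse_latency_warnings(sample):
    print(entry["timestamp"], entry["device"], entry["baseline_us"], entry["current_us"])
```

Sorting the parsed entries by ratio is one way to see which devices degraded most during the outage window.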

From the esxtop output we see DAVG values on the host in the 50-100 ms range.

Resolution

High DAVG values indicate that the performance problem originates at the array or storage network level and needs to be investigated there.
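
To find which devices show high DAVG over a longer capture, esxtop output saved in batch mode (for example with esxtop -b) can be scanned offline. The sketch below is illustrative only and rests on several assumptions: that the DAVG counter appears in the CSV header under the label "Average Driver MilliSec/Command" (verify this against your own capture), that the 25 ms default threshold is a tunable placeholder rather than a value from this article, and that the host and device names in the sample data are made up.

```python
import csv
import io

# Assumed counter label for DAVG in esxtop batch-mode CSV headers.
DAVG_LABEL = "Average Driver MilliSec/Command"


def high_davg_columns(csv_file, threshold_ms=25.0):
    """Return {column name: peak value} for DAVG columns whose peak exceeds threshold_ms."""
    reader = csv.reader(csv_file)
    header = next(reader)
    davg_cols = [i for i, name in enumerate(header) if DAVG_LABEL in name]
    peaks = {header[i]: 0.0 for i in davg_cols}
    for row in reader:
        for i in davg_cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip empty or malformed cells
            peaks[header[i]] = max(peaks[header[i]], value)
    return {name: peak for name, peak in peaks.items() if peak > threshold_ms}


# Hypothetical two-sample capture; names and values are invented for the example.
sample_csv = io.StringIO(
    '"Time","Physical Disk(naa.example-a)\\Average Driver MilliSec/Command",'
    '"Physical Disk(naa.example-b)\\Average Driver MilliSec/Command"\n'
    '"03/24/2025 18:27:34","72.5","0.4"\n'
    '"03/24/2025 18:27:39","96.1","0.5"\n'
)

for name, peak in high_davg_columns(sample_csv).items():
    print(f"{name}: peak {peak} ms")
```

Devices reported here are the ones whose paths, switches, and array ports are worth checking first with the storage and network teams.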

Additional Information

Using esxtop to identify storage performance issues for ESXi (multiple versions)

ESXi host loses connectivity to a VMFS datastore
