High IOPS alerts with latencies of 56 ms to 80 ms are reported by VMs on specific datastores.
VMware vSphere ESXi 7.0
VMware vSphere ESXi 8.0
Fibre Channel fabric latency is causing VMFS heartbeat failures.
From the vmkernel log (/var/log/vmkernel.log):
2025-03-13T05:00:10.827Z cpu41:2097290)ScsiDeviceIO: 1513: Device naa.######################### performance has improved. I/O latency reduced from 4011340 microseconds to 787918 microseconds.
2025-03-13T05:00:17.800Z cpu40:2097290)WARNING: ScsiDeviceIO: 1513: Device naa.######################### performance has deteriorated. I/O latency increased from average value of 3020 microseconds to 64910 microseconds.
2025-03-13T05:00:18.605Z cpu40:2097290)WARNING: ScsiDeviceIO: 1513: Device naa.######################### performance has deteriorated. I/O latency increased from average value of 3021 microseconds to 130437 microseconds.
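To quantify these latency swings, the values can be extracted from the log and converted to milliseconds (for example, 64910 microseconds is about 64.9 ms). The following Python sketch assumes the message layout matches the sample lines above exactly; the regular expression is derived only from these samples, not from a documented vmkernel log format, and the names LATENCY_RE, LatencyEvent, and parse_latency_line are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Regex derived only from the sample vmkernel lines in this article
# (assumption: other builds may word these messages differently).
LATENCY_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)(?:WARNING: )?ScsiDeviceIO: \d+: "
    r"Device (?P<device>\S+) performance has (?P<change>improved|deteriorated)\. "
    r"I/O latency (?:reduced|increased) from (?:average value of )?"
    r"(?P<before>\d+) microseconds to (?P<after>\d+) microseconds\."
)


class LatencyEvent(NamedTuple):
    timestamp: str
    device: str
    change: str        # "improved" or "deteriorated"
    before_ms: float   # previous (or average) latency, in milliseconds
    after_ms: float    # new latency, in milliseconds


def parse_latency_line(line: str) -> Optional[LatencyEvent]:
    """Return a LatencyEvent for a ScsiDeviceIO latency line, else None."""
    m = LATENCY_RE.match(line.strip())
    if m is None:
        return None
    return LatencyEvent(
        timestamp=m["ts"],
        device=m["device"],
        change=m["change"],
        # The log reports microseconds; divide by 1000 for milliseconds.
        before_ms=int(m["before"]) / 1000,
        after_ms=int(m["after"]) / 1000,
    )


SAMPLE = [
    "2025-03-13T05:00:10.827Z cpu41:2097290)ScsiDeviceIO: 1513: Device naa.######################### performance has improved. I/O latency reduced from 4011340 microseconds to 787918 microseconds.",
    "2025-03-13T05:00:17.800Z cpu40:2097290)WARNING: ScsiDeviceIO: 1513: Device naa.######################### performance has deteriorated. I/O latency increased from average value of 3020 microseconds to 64910 microseconds.",
    "2025-03-13T05:00:18.605Z cpu40:2097290)WARNING: ScsiDeviceIO: 1513: Device naa.######################### performance has deteriorated. I/O latency increased from average value of 3021 microseconds to 130437 microseconds.",
]

for raw in SAMPLE:
    ev = parse_latency_line(raw)
    if ev is not None:
        print(f"{ev.timestamp} {ev.change}: {ev.before_ms:.1f} ms -> {ev.after_ms:.1f} ms")
```

Filtering the parsed events for change == "deteriorated" gives a quick list of the spikes to correlate with the heartbeat timeouts below.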
The vobd log (/var/log/vobd.log) shows the VMFS heartbeats timing out and then being reclaimed:
2025-03-13T05:03:40.302Z: [vmfsCorrelator] 1058739920621us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume ########-########-####-############: [Timeout] [HB state abcdef02 offset 3641344 gen 167 stampUS 1058739845375 uuid ########-########-####-############ jrnl <FB 25165830> drv 24.82]
2025-03-13T05:03:40.302Z: [vmfsCorrelator] 1058751801403us: [esx.problem.vmfs.heartbeat.recovered] ########-########-####-############
2025-03-13T05:08:21.245Z: [vmfsCorrelator] 1059020860304us: [vob.vmfs.heartbeat.timedout] ########-########-####-############
2025-03-13T05:08:21.245Z: [vmfsCorrelator] 1059032744650us: [esx.problem.vmfs.heartbeat.timedout] ########-########-####-############
2025-03-13T05:08:22.109Z: [vmfsCorrelator] 1059021724949us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume ########-########-####-############ : [Timeout] [HB state abcdef02 offset 3641344 gen 167 stampUS 1059021639173 uuid ########-########-####-############ jrnl <FB 25165830> drv 24.82]
2025-03-13T05:08:22.109Z: [vmfsCorrelator] 1059033608835us: [esx.problem.vmfs.heartbeat.recovered] ########-########-####-############
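Pairing each heartbeat timeout with the next reclaim on the same volume shows how long the host was without its heartbeat. A minimal Python sketch, assuming the vobd line layout matches the excerpt above: only the vob.vmfs.heartbeat.* entries are used (the esx.problem.* lines appear to duplicate the same events), the gap is computed from the UTC timestamps, and the names heartbeat_outages, parse_ts, TIMEDOUT_RE, and RECOVERED_RE are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Patterns derived only from the vobd sample lines in this article (assumption).
_PREFIX = r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z): \[vmfsCorrelator\] \d+us: "
TIMEDOUT_RE = re.compile(_PREFIX + r"\[vob\.vmfs\.heartbeat\.timedout\] (?P<vol>[^\s:]+)")
RECOVERED_RE = re.compile(
    _PREFIX + r"\[vob\.vmfs\.heartbeat\.recovered\] Reclaimed heartbeat for volume (?P<vol>[^\s:]+)"
)


def parse_ts(ts: str) -> datetime:
    # Timestamps end in "Z" (UTC); datetime.fromisoformat() only accepts "Z"
    # from Python 3.11, so replace it with an explicit offset.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def heartbeat_outages(lines: Iterable[str]) -> List[Tuple[str, datetime, timedelta]]:
    """Pair each heartbeat timeout with the next reclaim on the same volume.

    Returns (volume, timeout time, time until reclaim) tuples. Reclaims with
    no earlier timeout in the input are ignored.
    """
    pending: Dict[str, datetime] = {}
    outages: List[Tuple[str, datetime, timedelta]] = []
    for line in lines:
        line = line.strip()
        m = TIMEDOUT_RE.match(line)
        if m:
            # Keep the first timeout if several arrive before a reclaim.
            pending.setdefault(m["vol"], parse_ts(m["ts"]))
            continue
        m = RECOVERED_RE.match(line)
        if m and m["vol"] in pending:
            start = pending.pop(m["vol"])
            outages.append((m["vol"], start, parse_ts(m["ts"]) - start))
    return outages


SAMPLE = [
    "2025-03-13T05:03:40.302Z: [vmfsCorrelator] 1058739920621us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume ########-########-####-############: [Timeout] [HB state abcdef02 offset 3641344 gen 167 stampUS 1058739845375 uuid ########-########-####-############ jrnl <FB 25165830> drv 24.82]",
    "2025-03-13T05:08:21.245Z: [vmfsCorrelator] 1059020860304us: [vob.vmfs.heartbeat.timedout] ########-########-####-############",
    "2025-03-13T05:08:21.245Z: [vmfsCorrelator] 1059032744650us: [esx.problem.vmfs.heartbeat.timedout] ########-########-####-############",
    "2025-03-13T05:08:22.109Z: [vmfsCorrelator] 1059021724949us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume ########-########-####-############ : [Timeout] [HB state abcdef02 offset 3641344 gen 167 stampUS 1059021639173 uuid ########-########-####-############ jrnl <FB 25165830> drv 24.82]",
]

for vol, start, gap in heartbeat_outages(SAMPLE):
    print(f"{vol}: heartbeat lost at {start.isoformat()} for {gap.total_seconds():.3f} s")
```

With these sample lines, only the 05:08:21.245Z timeout has a matching reclaim (05:08:22.109Z), giving a gap of 0.864 s; the 05:03:40 reclaim has no timeout in the excerpt and is skipped.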
The latency originates outside the ESXi host. Check the Fibre Channel fabric to determine where the latency is coming from.