The /var/run/log/vmkernel.log file contains warnings that a device's "performance has deteriorated" and that "I/O latency increased", for example:
[YYYY-MM-DDTHH:MM:SS] cpu51:2098041)WARNING: ScsiDeviceIO: 513: Device naa.######################### performance has deteriorated. I/O latency increased from average value of 38762 microseconds to 776315 microseconds.
[YYYY-MM-DDTHH:MM:SS] cpu47:2098037)WARNING: ScsiDeviceIO: 1443: Device naa.######################### performance has deteriorated. I/O latency increased from average value of 12017 microseconds to 254228 microseconds.
[YYYY-MM-DDTHH:MM:SS] cpu47:2098038)WARNING: ScsiDeviceIO: 1216: Device naa.######################### performance has deteriorated. I/O latency increased from average value of 18057 microseconds to 534229 microseconds.
VMware vSphere ESXi 6.5.x
VMware vSphere ESXi 6.7.x
VMware vSphere ESXi 7.x
VMware vSphere ESXi 8.x
The latency values reported in these events are in microseconds and correspond to the DAVG measurements in the esxtop storage screen. See Using esxtop to identify storage performance issues for ESXi (multiple versions).
With traditional non-flash-based storage, the generally accepted threshold for DAVG latency is about 10 milliseconds (10,000 microseconds).
With flash-based storage it is rare to see DAVG latency above 1-2 milliseconds, so these events should be investigated if the latency is higher.
Latency is a measure of the round-trip time between the issuance of a SCSI command from the hypervisor, through the transport to the surface of the media, and the return. Therefore, the source of the delay could be anywhere in the fabric, the storage infrastructure, or anywhere along the path.
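As a rough illustration of the unit conversion and thresholds described above, the following Python sketch extracts the device and latency values from a warning line and converts them to milliseconds. The function names, the device identifier naa.example, and the choice of 2 ms as the flash cutoff (the upper end of the 1-2 ms range) are assumptions made for this example; it is not a VMware tool.

```python
import re

# Matches the warning text shown in the log excerpt above.
DETERIORATED = re.compile(
    r"Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg>\d+) microseconds "
    r"to (?P<new>\d+) microseconds"
)

# Thresholds in milliseconds, taken from the guidance above.
# 2.0 for flash is an assumption (upper end of the 1-2 ms range).
THRESHOLD_MS = {"non-flash": 10.0, "flash": 2.0}


def parse_latency_event(line):
    """Return (device, average_ms, new_ms), or None if the line is not a latency warning."""
    match = DETERIORATED.search(line)
    if match is None:
        return None
    avg_ms = int(match.group("avg")) / 1000.0  # microseconds -> milliseconds
    new_ms = int(match.group("new")) / 1000.0
    return match.group("device"), avg_ms, new_ms


def exceeds_threshold(latency_ms, media="non-flash"):
    """True if the latency is above the rule-of-thumb threshold for the media type."""
    return latency_ms > THRESHOLD_MS[media]


# Sample line; naa.example is a placeholder device identifier.
sample = (
    "cpu51:2098041)WARNING: ScsiDeviceIO: 513: Device naa.example "
    "performance has deteriorated. I/O latency increased from average "
    "value of 38762 microseconds to 776315 microseconds."
)
device, avg_ms, new_ms = parse_latency_event(sample)
print(f"{device}: {avg_ms:.1f} ms -> {new_ms:.1f} ms, "
      f"over non-flash threshold: {exceeds_threshold(new_ms)}")
```

The same function can be applied line by line to a copy of vmkernel.log; lines that are not latency warnings return None and can be skipped.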
High device latency:
If the device latency is high for a consistent period of time, check the storage performance by verifying the logs on the storage array for any indication of a failure. If failures are logged on the storage array side, contact the storage vendor for further assistance.
Check if these messages are generated during any scheduled tasks such as backups or replications, as these can cause intermittent performance problems.
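One way to check whether the messages line up with scheduled tasks is to bucket the event timestamps by hour of day and compare them with a known job window. The Python sketch below assumes the timestamps have already been parsed into datetime objects; the function names and the 01:00-02:00 backup window are hypothetical.

```python
from collections import Counter
from datetime import datetime, time


def events_per_hour(timestamps):
    """Count events by hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)


def fraction_in_window(timestamps, start, end):
    """Fraction of events whose time of day falls in [start, end).

    The window must not cross midnight.
    """
    if not timestamps:
        return 0.0
    inside = sum(1 for ts in timestamps if start <= ts.time() < end)
    return inside / len(timestamps)


# Illustrative timestamps only, not taken from a real log.
stamps = [
    datetime(2024, 1, 1, 1, 5),
    datetime(2024, 1, 1, 1, 40),
    datetime(2024, 1, 2, 1, 59),
    datetime(2024, 1, 2, 14, 20),
]
print(events_per_hour(stamps))
print(fraction_in_window(stamps, time(1, 0), time(2, 0)))
```

A high fraction inside the window suggests the job is worth examining, although it does not prove that the job is the cause.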
Overload conditions on the device:
If the message is generated because of an overload condition, reduce the load on the affected storage device.
Use the following framework to characterize the latency:
1) Magnitude: How high are the spikes in DAVG?
2) Duration: How long does each spike last?
3) Frequency: What pattern is exhibited by the date/time stamps?
4) Scope: How widespread are the events?
A limited magnitude (for example, 20-30 ms) that lasts only a few seconds, occurs occasionally, and affects a small subset of datastores is a vastly different situation from magnitudes of multiple seconds that last for multiple minutes.
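The four dimensions above can be summarized from parsed log events with a short script. The Python sketch below is one possible approach, not a VMware utility: it assumes each event is a (timestamp, device, latency in microseconds) tuple, and it treats warnings for the same device that arrive within 60 seconds of each other as a single spike. Both the 60-second gap and the function name are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def characterize(events, gap=timedelta(seconds=60)):
    """Summarize events given as (timestamp, device, latency_us) tuples."""
    by_device = defaultdict(list)
    for ts, device, latency_us in events:
        by_device[device].append((ts, latency_us))

    peak_us = 0
    longest = timedelta(0)
    spikes = 0
    for samples in by_device.values():
        samples.sort()
        start = prev = samples[0][0]
        spikes += 1  # the first event for a device opens a spike
        for ts, latency_us in samples:
            peak_us = max(peak_us, latency_us)
            if ts - prev > gap:
                # The previous spike ended; record its length and open a new one.
                longest = max(longest, prev - start)
                start = ts
                spikes += 1
            prev = ts
        longest = max(longest, prev - start)

    return {
        "magnitude_ms": peak_us / 1000.0,            # 1) highest reported latency
        "longest_spike_s": longest.total_seconds(),  # 2) longest run of warnings
        "spike_count": spikes,                       # 3) how often spikes occur
        "devices_affected": len(by_device),          # 4) how widespread they are
    }


# Illustrative events; device names and timestamps are placeholders.
events = [
    (datetime(2024, 1, 1, 2, 0, 0), "naa.a", 254228),
    (datetime(2024, 1, 1, 2, 0, 30), "naa.a", 534229),
    (datetime(2024, 1, 1, 3, 0, 0), "naa.a", 20000),
    (datetime(2024, 1, 1, 2, 0, 10), "naa.b", 776315),
]
print(characterize(events))
```

A summary like this makes it easier to describe the problem to the storage or fabric team, which is where the investigation usually has to continue.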
Finally, note that ESXi does not cause the latency spikes; it merely reports them. The root cause cannot be determined from the ESXi perspective alone. However, the data outlined above can help guide the investigation outside of the ESXi hosts.