VMware vSAN
Because vSAN is a distributed file system that uses resources on all hosts, any delay in communication between hosts in the cluster, whether from network-related issues or poorly performing disks, can impact a vSAN environment.
This can be seen in a number of ways.
When a disk or disk group is not performing as well as the rest of the cluster, it can delay I/O for the rest of the environment. This can degrade vSAN backend performance and even impact the guest VMs if it is not identified and corrected.
Properly administer the vSAN environment and stay on top of any vSAN Skyline Health alerts that have triggered, so that issues are addressed as soon as possible.
Review vSAN Skyline Health and the vCenter performance metrics for vSAN for any issue that is causing delayed I/O. This can be a host that is having a network issue or a disk that is showing physical device latency.
If you have reviewed the entire cluster and have not found a cause for the perceived latency, review the hosts to test for network loss and disk latency.
Network
To review the environment for network latency, reference Troubleshooting the vSAN Network and determine whether you have any network loss or high latency. If so, work with your network team to correct this.
Disk
Disks can display latency in several ways, but the most common is the "performance has deteriorated" message seen in the vmkernel logs. "I/O latency increased from average value" can also be seen.
We can see an example of this below, where a disk's I/O latency deteriorated from an average of 10382 microseconds to as high as 2656761 microseconds before improving to 522372 and then 102114 microseconds.
2024-06-12T11:24:01.880Z cpu31:2098014)WARNING: ScsiDeviceIO: 1513: Device naa.5000c5009a063c9b performance has deteriorated. I/O latency increased from average value of 10382 microseconds to 653962 microseconds.
2024-06-12T11:24:01.887Z cpu31:2098014)WARNING: ScsiDeviceIO: 1513: Device naa.5000c5009a063c9b performance has deteriorated. I/O latency increased from average value of 10382 microseconds to 1314049 microseconds.
2024-06-12T11:24:05.576Z cpu39:2098007)WARNING: ScsiDeviceIO: 1513: Device naa.5000c5009a063c9b performance has deteriorated. I/O latency increased from average value of 10382 microseconds to 2656761 microseconds.
2024-06-12T11:24:11.024Z cpu23:2098011)ScsiDeviceIO: 1513: Device naa.5000c5009a063c9b performance has improved. I/O latency reduced from 2656761 microseconds to 522372 microseconds.
2024-06-12T11:24:16.038Z cpu7:2098009)ScsiDeviceIO: 1513: Device naa.5000c5009a063c9b performance has improved. I/O latency reduced from 522372 microseconds to 102114 microseconds.
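When a vmkernel log contains many of these messages, it can help to summarize them per device. The following is a minimal Python sketch, not a VMware-provided tool. It assumes the log lines follow the exact message format shown in the excerpt above, and the function name summarize_deterioration is hypothetical.

```python
import re

# Pattern for the "performance has deteriorated" warning, based only on the
# message format in the log excerpt above (an assumption; other ESXi builds
# may word the message differently).
DETERIORATED = re.compile(
    r"Device (?P<device>\S+) performance has deteriorated\. "
    r"I/O latency increased from average value of (?P<avg>\d+) microseconds "
    r"to (?P<cur>\d+) microseconds"
)


def summarize_deterioration(lines):
    """Return {device: {"events", "average_us", "peak_us"}} for matching lines."""
    summary = {}
    for line in lines:
        match = DETERIORATED.search(line)
        if match is None:
            continue  # skip "has improved" and unrelated lines
        entry = summary.setdefault(
            match.group("device"),
            {"events": 0, "average_us": 0, "peak_us": 0},
        )
        entry["events"] += 1
        # Keep the most recently reported average and the highest latency seen.
        entry["average_us"] = int(match.group("avg"))
        entry["peak_us"] = max(entry["peak_us"], int(match.group("cur")))
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < vmkernel.log
    for device, stats in summarize_deterioration(sys.stdin).items():
        print(device, stats)
```

A device that reports many events, or a peak far above its average, is a candidate for the disk troubleshooting steps that follow. The sketch only counts deterioration warnings and ignores the "performance has improved" lines.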
To correct this, follow How to troubleshoot vSAN OSA disk issues to remove and recreate the impacted disk or disk group. If the issue reappears, remove the disk or disk group and reach out to your hardware vendor for a replacement.
If further vSAN performance troubleshooting is required, follow Collecting vSAN Performance Service data for vSAN performance issues to collect logs for the entire cluster, then open a case with VMware by Broadcom for further assistance.
Please refer to the following article for additional information on Troubleshooting vSAN Performance