esxtop > press "u": the impacted LUN shows I/O queueing and high latency.
esxtop > press "u" > press "e" > paste the impacted naa.ID: shows the specific world ID generating the traffic and experiencing the latency.
VMware vSphere ESXi 8.x
VMware vSphere ESX 9.x
A specific virtual machine workload is generating large, random I/Os that must be broken into smaller blocks on the storage side. This increases I/O processing time and causes upstream queuing in the ESXi host storage stack, even though the backend SSDs commit data quickly.
To identify the I/O block sizes and the latency experienced on the specific virtual disk, vscsiStats was run:
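As a rough illustration of why large I/Os cost more to process, the sketch below counts how many smaller operations one large I/O turns into when it is split. The 64 KiB chunk size and the function name `split_count` are hypothetical values chosen for the example; the real split boundary depends on the array and adapter and is not stated in this article.

```python
# Hypothetical split boundary, for illustration only; not taken from the article.
ASSUMED_CHUNK_BYTES = 64 * 1024


def split_count(io_size_bytes, max_chunk_bytes=ASSUMED_CHUNK_BYTES):
    """Return how many chunks of at most max_chunk_bytes carry one I/O."""
    if io_size_bytes <= 0 or max_chunk_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partial trailing chunk still costs one operation.
    return -(-io_size_bytes // max_chunk_bytes)


if __name__ == "__main__":
    for size_kib in (64, 256, 512):
        print(f"{size_kib} KiB I/O -> {split_count(size_kib * 1024)} chunk(s)")
```

Each extra chunk is another operation that has to be queued and completed before the original I/O can be acknowledged, which is how the splitting can surface as host-side queuing.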
I/O Size (bytes)    # of I/Os
512                 50
1024                271
2048                14
4095                47
4096                3178
8191                52
8192                558
16383               721
16384               1585
32768               1012
49152               227
65535               104
65536               29
81920               79
131072              23
262144              59
524288              0
> 524288            6
# of I/Os    Latency (us)
0            1
0            10
0            100
2064         500
263          1000
38           5000
0            15000
4            30000
8            50000
56           100000
5545         > 100000
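The two histograms can be summarized with a few lines of Python. This is a minimal sketch: the bucket values are copied from the vscsiStats output in this article, the final bucket of each histogram is assumed to be open-ended (greater than the last listed limit), and the helper name `share_above` is hypothetical.

```python
# (upper bound, number of I/Os); None marks the assumed open-ended last bucket.
IO_SIZE_BYTES = [
    (512, 50), (1024, 271), (2048, 14), (4095, 47), (4096, 3178),
    (8191, 52), (8192, 558), (16383, 721), (16384, 1585), (32768, 1012),
    (49152, 227), (65535, 104), (65536, 29), (81920, 79), (131072, 23),
    (262144, 59), (524288, 0), (None, 6),
]
LATENCY_US = [
    (1, 0), (10, 0), (100, 0), (500, 2064), (1000, 263), (5000, 38),
    (15000, 0), (30000, 4), (50000, 8), (100000, 56), (None, 5545),
]


def share_above(histogram, threshold):
    """Fraction of I/Os in buckets whose upper bound lies above threshold."""
    total = sum(count for _, count in histogram)
    above = sum(
        count
        for bound, count in histogram
        if bound is None or bound > threshold
    )
    return above / total


if __name__ == "__main__":
    # Share of I/Os that took longer than 100 ms (100000 us).
    print(f"latency > 100 ms: {share_above(LATENCY_US, 100000):.1%}")
    # Share of I/Os larger than 64 KiB.
    print(f"size > 64 KiB:    {share_above(IO_SIZE_BYTES, 65536):.1%}")
```

With the sample above, 5545 of 7978 I/Os (about 69.5%) fall in the bucket above 100 ms, while the rest mostly complete within 1 ms.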
Distribute I/O Load: Add a secondary VMware Paravirtual SCSI (PVSCSI) controller to the affected virtual machine. Migrate the high-impact virtual disk (scsi0:1) to this new controller to provide dedicated interrupt processing and separate queue paths.
Application Tuning: Consult the application and guest OS teams to identify and optimize the source of the random large-block writes (e.g., database backups or unoptimized processes).