High Guest Latency on NetApp SSD Datastores Due to Random Large Block I/Os

Article ID: 440524

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • High write latency is reported in vCenter Server for the affected datastore.

  • In esxtop, press "u" to open the disk device view. The impacted LUN shows I/O queueing and high latency.

  • In esxtop, press "u", then "e", and enter the impacted device's naa ID. The expanded view shows the specific world ID that is generating the traffic and experiencing the latency.

  • Despite the reported latency in the hypervisor, the physical storage array indicates healthy performance with no latency at the storage tier. 

Environment

VMware vSphere ESXi 8.x
VMware vSphere ESX 9.x

Cause

A specific virtual machine workload is generating large, random I/O blocks that need to be broken into smaller blocks on the storage side. This increases I/O processing time and causes upstream queuing in the ESXi host storage stack, even though the backend SSDs commit data quickly.

To identify the I/O block sizes and the latency experienced by a specific virtual disk, run vscsiStats against it.

  • vscsiStats shows random, large I/O block sizes:

I/O size (bytes)   Number of I/Os
512                     50
1024                   271
2048                    14
4095                    47
4096                  3178
8191                    52
8192                   558
16383                  721
16384                 1585
32768                 1012
49152                  227
65535                  104
65536                   29
81920                   79
131072                  23
262144                  59
524288                   0
> 524288                 6

  • More than 5,500 I/O operations are recorded with latencies exceeding 100,000 us (100 ms), measured at the vSCSI layer:

Number of I/Os   Latency (us)
0                1
0                10
0                100
2064             500
263              1000
38               5000
0                15000
4                30000
8                50000
56               100000
5545             > 100000
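
The two histograms can be summarized with a few lines of Python to see how much of the workload sits in the large-I/O and high-latency buckets. The following is a minimal sketch rather than a VMware tool: the names `share_above`, `IO_SIZE_BYTES` and `IO_LATENCY_US` are hypothetical, the 32 KiB cut-off for a "large" I/O is an assumption, and the last row of each table is assumed to be the overflow bucket (everything above the previous limit).

```python
from typing import List, Optional, Tuple

# A histogram row is (bucket upper limit, number of I/Os).
# A limit of None marks the final overflow bucket (above the last limit).
Histogram = List[Tuple[Optional[int], int]]

# I/O sizes in bytes, copied from the vscsiStats output in this article.
IO_SIZE_BYTES: Histogram = [
    (512, 50), (1024, 271), (2048, 14), (4095, 47), (4096, 3178),
    (8191, 52), (8192, 558), (16383, 721), (16384, 1585), (32768, 1012),
    (49152, 227), (65535, 104), (65536, 29), (81920, 79), (131072, 23),
    (262144, 59), (524288, 0), (None, 6),
]

# I/O latencies in microseconds, copied from the vscsiStats output in this article.
IO_LATENCY_US: Histogram = [
    (1, 0), (10, 0), (100, 0), (500, 2064), (1000, 263), (5000, 38),
    (15000, 0), (30000, 4), (50000, 8), (100000, 56), (None, 5545),
]


def share_above(histogram: Histogram, threshold: int) -> Tuple[int, int, float]:
    """Count the I/Os that fall in buckets above `threshold`.

    `threshold` should equal one of the bucket limits; otherwise the bucket
    that straddles it is counted as entirely above the threshold.
    Returns (count_above, total_count, fraction_above).
    """
    total = sum(count for _, count in histogram)
    above = sum(
        count for limit, count in histogram
        if limit is None or limit > threshold
    )
    fraction = above / total if total else 0.0
    return above, total, fraction


if __name__ == "__main__":
    # Assumption: 32 KiB is used here as the cut-off for a "large" I/O.
    large, total, frac = share_above(IO_SIZE_BYTES, 32 * 1024)
    print(f"I/Os larger than 32 KiB: {large} of {total} ({frac:.1%})")

    slow, total, frac = share_above(IO_LATENCY_US, 100_000)
    print(f"I/Os slower than 100 ms: {slow} of {total} ({frac:.1%})")
```

With the figures from this article, the latency summary comes to 5545 of 7978 I/Os (about 69.5%) above 100 ms. The size cut-off can be moved to any other bucket limit to match what the application team considers a large block.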

Resolution

  • Distribute I/O Load: Add a secondary VMware Paravirtual SCSI (PVSCSI) controller to the affected virtual machine. Migrate the high-impact virtual disk (scsi0:1) to this new controller to provide dedicated interrupt processing and separate queue paths.

  • Application Tuning: Consult with the application and guest OS teams to identify and optimize the source of the random large-block writes (e.g., database backups or unoptimized processes).