High write IO size reported from the storage side

Article ID: 404761

Updated On:

Products

VMware vSphere ESXi

Issue/Introduction

Symptom:

  • The VM experiences latency at the guest OS layer.

  • The storage vendor reports that the storage is handling IO lengths larger than its optimal benchmark value.

Environment

VMware vSphere ESXi

Cause

  • vscsiStats is a tool on ESX that collects various statistics and IO traces at the vSCSI layer, which is the first layer after an IO exits from the VM and enters ESX.

It can also collect traces and statistics for a single disk of a specific VM.

Some of the statistics available along with histograms are:

    • IO sizes
    • Outstanding IOs, when a new IO is issued
    • IO latency
    • IO interarrival latency

To collect and review these statistics:

    1. vscsiStats -l : Lists the running VMs with their world IDs (worldGroupID) and virtual disk handle IDs.

      eg:
      [root@esx-0la :~ ] vm-support -V
      /vmfs/volumes/5fd1edce-########-####-0017a4776418/base-linux-0la/base-linux-0la.vmx (Running)
      
      [root@esx-0la :~ ] vscsiStats -l
      Virtual Machine worldGroupID: 528585, Virtual Machine Display Name: base-linux-0la, Virtual Machine Config File: /vmfs/volumes/5fd1edce-########-####-0017a4776418/base-linux-0la/base-linux-0la.vmx, {
      Virtual SCSI Disk handleID: 2270439971758083 (scsi0:0)
      }

       

    2. vscsiStats -s -t -w <world_id> : Starts vSCSI statistics collection and command tracing for the VM. Let it run for at least 5 minutes; the amount of data collected depends on how long the collection runs.

      eg:
      [root@esx-0la :~ ] vscsiStats -s -t -w 528585
      vscsiStats: Starting Vscsi stats collection for worldGroup 528585, handleID 2270439971758083 (scsi0:0)
      vscsiStats: Starting Vscsi cmd tracing for worldGroup 528585, handleID 2270439971758083 (scsi0:0)
      <vscsiStats-traceChannel>vscsi_cmd_trace_528585_2270439971758083</vscsiStats-traceChannel>
      Success.
      

       

    3. vscsiStats -p all -w <world_id> | less : Prints the collected histograms. Always pipe the output through "| less", because several histograms are printed to the console for different parameters such as IO length and latency.

      eg:
      Histogram: IO lengths of commands for virtual machine worldGroupID : 528585, virtual disk handleID : 2270439971758083 (scsi0:0) {
      min   : 0
      max   : 1310720
      mean  : 47514
      count : 12427
       {
        159     (<=    512)
        121     (<=   1024)
        67      (<=   2048)
        30      (<=   4095)
        3138    (<=   4096)
        99      (<=   8191)
        531     (<=   8192)
        1102    (<=  16383)
        2406    (<=  16384)
        2072    (<=  32768)
        388     (<=  49152)
        187     (<=  65535)
        50      (<=  65536)
        190     (<=  81920)
        380     (<= 131072)
        1119    (<= 262144)
        317     (<= 524288)
        71      (>  524288)
       }
      }


      Note:
      i. In the above output, the left column is the number of IOs and the right column is the upper bound of the IO size bucket in bytes (IEC standard).


      Using IEC standard:
      1 KiB = 1,024 bytes (note the capital K)
      1 MiB = 1,024 KiB = 1,048,576 bytes

      ii. You can also save the histogram data to a CSV file by running "vscsiStats -p all -c -w <world_id> > /tmp/vmstats-<vmname>.csv", which creates the file in the /tmp directory.

      iii. Confirm the optimal IO size for the storage array with the storage vendor. For example, if it is 16 KiB (16 x 1024 = 16384 bytes), the histogram above clearly shows the guest OS sending IOs larger than that, in the 32768, 49152, 65535 and higher buckets. The IO sizing at the guest OS level then needs to be optimized with the help of the application or guest OS vendor.

    4. vscsiStats -x -w <world_id> : Stops the collection. If it is not stopped, the collection keeps running in the background and consumes memory.

      eg:

      [root@esx-01a :~ ] vscsiStats -x -w 528585
      vscsiStats: Stopping all Vscsi stats collection for worldGroup 528585, handleID 2270439971758083 (scsi0:0)
      Success.
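
The four steps above can be wrapped in a small script so that the stop command always runs, even if printing the histograms fails. The following is only a sketch, not a supported tool: it assumes a Python 3.7+ interpreter is available on the host, it uses only the vscsiStats flags shown in this article, and the function names (vscsistats_commands, collect_histograms) and the injectable run/sleep parameters are hypothetical names chosen for illustration.

```python
import subprocess
import time


def vscsistats_commands(world_id):
    """Return the start, print, and stop commands used in steps 2 to 4."""
    wid = str(world_id)
    return {
        "start": ["vscsiStats", "-s", "-t", "-w", wid],
        "print": ["vscsiStats", "-p", "all", "-c", "-w", wid],
        "stop": ["vscsiStats", "-x", "-w", wid],
    }


def collect_histograms(world_id, seconds=300, run=subprocess.run, sleep=time.sleep):
    """Start collection, wait, capture the CSV histograms, then always stop.

    `run` and `sleep` are injectable (an assumption of this sketch) so the
    flow can be dry-run on a machine that does not have vscsiStats.
    """
    cmds = vscsistats_commands(world_id)
    run(cmds["start"], check=True)
    try:
        sleep(seconds)  # step 2: let the collection run, 5 minutes by default
        result = run(cmds["print"], check=True, capture_output=True, text=True)
        return result.stdout
    finally:
        # Step 4: stop the collection even if an earlier step failed,
        # so it does not keep running in the background.
        run(cmds["stop"], check=True)
```

On the host this could be called as collect_histograms(528585) with the world ID taken from "vscsiStats -l", writing the returned text to a file under /tmp.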

       

 

Resolution

  • Investigate the issue further at the application layer to reduce the size of the IOs being issued.
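
Before engaging the application vendor, it can help to quantify how much of the workload exceeds the array's optimal IO size. The sketch below tallies the example IO length histogram from this article against the illustrative 16 KiB value. The bucket counts are copied from the example output; the final bucket is treated as open-ended (an assumption, based on the reported max of 1310720 bytes), and the names BUCKETS and share_above are hypothetical.

```python
# (bucket upper bound in bytes, IO count) from the example histogram.
# None marks the final bucket, assumed here to be open-ended.
BUCKETS = [
    (512, 159), (1024, 121), (2048, 67), (4095, 30), (4096, 3138),
    (8191, 99), (8192, 531), (16383, 1102), (16384, 2406), (32768, 2072),
    (49152, 388), (65535, 187), (65536, 50), (81920, 190), (131072, 380),
    (262144, 1119), (524288, 317), (None, 71),
]


def share_above(buckets, optimal_bytes):
    """Return (IOs above optimal, total IOs, fraction above optimal)."""
    total = sum(count for _, count in buckets)
    above = sum(
        count
        for bound, count in buckets
        if bound is None or bound > optimal_bytes
    )
    return above, total, above / total


optimal = 16 * 1024  # 16 KiB, the example value used in this article
above, total, fraction = share_above(BUCKETS, optimal)
print(f"{above} of {total} IOs ({fraction:.1%}) exceed {optimal} bytes")
# prints: 4774 of 12427 IOs (38.4%) exceed 16384 bytes
```

In this example roughly 38% of the IOs are larger than the assumed optimal size, which gives the application or guest OS vendor a concrete figure to work from.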

 

 

Additional Information