Default ESX IO Scheduler for IOPS Limits Changed from iofilter to mclock in 7.0 Update 3q and 8.0 Update 3


Article ID: 385258


Products

VMware vSphere ESXi 7.0
VMware vSphere ESXi 8.0

Issue/Introduction

This article describes how the default ESX IO scheduler's handling of an IOPS limit configured through a storage policy has changed.

Changes in ESXi 7.0.3 P09 (ESXi 7.0 Update 3q) and ESXi 8.0.3 (ESXi 8.0 Update 3)

Before ESXi 7.0.3 P09 and ESXi 8.0.3, IOPS limits were enforced at the IOfilter level, while reservations and shares were handled by the mClock IO scheduler, which is the default IO scheduler for ESX. Starting with ESXi 7.0.3 P09 (ESXi 7.0 Update 3q) and ESXi 8.0.3 (ESXi 8.0 Update 3), the default handler for an IOPS limit set in a storage policy changed from the IOfilter to mClock.

Moving IOPS limit handling from the IOfilter to the mClock IO scheduler has the following benefits:

  1. Reservations, shares, and limits are all controlled at a single layer.
  2. Throttling and scheduling at the PSA layer help reduce the cycles spent per IO and improve performance.
  3. mClock handles bursty IO traffic better, which improves performance in those use cases.
  4. Overall code maintainability improves.

The IOfilter counts only IOPS and does not weight them by IO size, which can lead to unfair sharing when VMs issue IOs of different sizes. The mClock disk IO scheduler takes IO size into account, because storage targets take different amounts of time to handle small and large IOs.

If the IO size is greater than 32 KB, mClock counts each IO as (IO size / 32 KB) IOs. As a result, the user may observe fewer IOPS than the configured limit.

When targets advertise performance numbers in terms of IOPS or bandwidth, they also state the IO size those numbers apply to, because a target takes longer to complete a larger IO than a smaller one (for example, a 1 MB IO takes longer than a 4 KB IO). So when one VM issues 1 MB IOs and another issues 4 KB IOs, mClock takes IO size into account in order to honor all three tunables (reservation, shares, and IOPS limit).
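The size-based accounting described above can be sketched as follows. This is a simplified illustration, not mClock's actual implementation. It assumes IOs of 32 KB or smaller are charged as one IO and larger IOs as (IO size / 32 KB) IOs with no rounding (the article does not say how sizes that are not multiples of 32 KB are rounded), and it ignores reservations, shares, and device saturation. The function names `io_count` and `observed_iops` are hypothetical.

```shell
#!/bin/sh
# Sketch of size-weighted IOPS accounting as described in this article.
# Assumption: IOs <= 32 KB count as 1 IO; larger IOs count as size / 32 KB,
# with no rounding (rounding behavior is not stated in the article).
# Reservations, shares and device saturation are ignored.

# io_count SIZE_KB -> prints how many IOs one request is charged against the limit
io_count() {
  awk -v s="$1" 'BEGIN { s += 0; if (s <= 32) print 1; else print s / 32 }'
}

# observed_iops LIMIT SIZE_KB -> prints the approximate IOs per second a
# workload of uniform IO size can complete when throttled only by the limit
observed_iops() {
  c=$(io_count "$2")
  awk -v l="$1" -v c="$c" 'BEGIN { print (l + 0) / (c + 0) }'
}

# Example: a 1000 IOPS limit applied to several IO sizes
for size in 4 32 64 1024; do
  echo "${size} KB IOs: charged $(io_count "$size") each, ~$(observed_iops 1000 "$size") IOPS"
done
```

With a 1000 IOPS limit, this sketch gives about 500 IOPS for 64 KB IOs and about 31 IOPS for 1 MB IOs, while 4 KB and 32 KB IOs can reach the full 1000 IOPS.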

In summary, mClock is designed to account for IOs of different sizes so that it can schedule IO fairly across VMs that issue different IO sizes.

Environment

VMware vSphere 7.0
VMware vSphere 8.0

Resolution

If you need only IOPS limits (not reservations or shares) and your workloads issue IOs larger than 32 KB, you can switch to the SFQ IO scheduler to get the same IOPS limit behavior as with the IOfilter.

To enable SFQ IO Scheduler:

To check the current value of the setting:

esxcli system settings advanced list -o /Disk/SchedulerWithReservation

Note: The default value is 1 (mClock).

To switch to SFQ, set the value to 0:
esxcli system settings advanced set -o /Disk/SchedulerWithReservation -i 0
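The check-then-set steps above can be combined into a small script. This is a sketch under assumptions: it expects `esxcli system settings advanced list -o` to print the current value on a line of the form `Int Value: N`, and the helper names `parse_int_value` and `main` are hypothetical. It runs the `set` command from this article only when the value is not already 0.

```shell
#!/bin/sh
# Sketch: switch /Disk/SchedulerWithReservation to 0 (SFQ) only if needed.
# Assumption: `esxcli system settings advanced list -o <option>` prints a line
# like "   Int Value: 1"; adjust parse_int_value if your build formats it differently.

OPTION=/Disk/SchedulerWithReservation

# Read esxcli list output on stdin and print the current integer value.
# Matches "Int Value" but not "Default Int Value".
parse_int_value() {
  awk -F': *' '$1 ~ /^ *Int Value$/ { print $2; exit }'
}

main() {
  if ! command -v esxcli >/dev/null 2>&1; then
    echo "esxcli not found; run this on an ESXi host" >&2
    return 1
  fi

  current=$(esxcli system settings advanced list -o "$OPTION" | parse_int_value)
  if [ -z "$current" ]; then
    echo "could not read the current value of $OPTION" >&2
    return 1
  fi

  echo "Current $OPTION value: $current (1 = mClock, 0 = SFQ)"
  if [ "$current" = "0" ]; then
    echo "SFQ is already selected; nothing to change."
    return 0
  fi

  # Same command as shown in this article
  esxcli system settings advanced set -o "$OPTION" -i 0
}

main "$@"
```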

Additional Information