vSAN OSA resync can overload network, causing significant storage IO impact during resync operations such as rebuilding lost data or disk rebalancing.

Article ID: 371138

Products

VMware vSAN

Issue/Introduction

  • High VM disk latency on vSAN during resync operations such as rebuilding after a hardware failure, node removal, disk or disk group removal, or rebalancing.
  • This applies to all releases of vSAN OSA, but not vSAN ESA.

Environment

VMware vSAN OSA

Cause

  • With modern SSDs, vSAN OSA resync traffic can overload a 10Gb network, and in some cases even a 25Gb network, severely impacting VM workloads during resync operations such as rebuilding lost data or disk rebalancing.
  • vSAN has a dynamic resync throttle at the disk group level, but it only monitors disk latency, not network latency.

Resolution

Consider upgrading your network infrastructure from 10G to 25G or higher. If upgrading is not possible at this time, see below:

To prevent network overload from resync, the vSAN resync throttle must be set manually. 

OSA resync throttling is enforced at the disk group level. At each disk group, it limits resync traffic in both directions (reads and writes). For example, a resync limit of 100 MB/s will limit each disk group to 100 MB/s of resync reads and 100 MB/s of resync writes.
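
As a rough illustration of the per-disk-group behaviour described above, the following PowerShell sketch estimates the most resync traffic a host could generate for a given limit. The function name and parameters are hypothetical and not part of the attached script. It assumes that the per-disk-group limits simply add up across a host.

```powershell
# Hypothetical helper: worst-case resync traffic (MB/s) for one host
# when every disk group runs at its configured limit.
function Get-HostResyncCeiling {
    param(
        [int]$LimitPerDiskGroup,   # configured resync limit per disk group
        [int]$DiskGroupCount       # disk groups (cache disks) on the host
    )

    # The limit applies separately to resync reads and resync writes,
    # so each direction can reach limit x disk group count.
    $perDirection = $LimitPerDiskGroup * $DiskGroupCount

    [pscustomobject]@{
        ResyncReads  = $perDirection
        ResyncWrites = $perDirection
    }
}

# A limit of 100 on a host with 2 disk groups
Get-HostResyncCeiling -LimitPerDiskGroup 100 -DiskGroupCount 2
```

With these inputs, each direction can reach 200 on the host. This is why the configured value has to be reduced as the number of disk groups per host grows.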

PowerShell must be used to set the maximum resync limit persistently. A limit set with esxcli may be overwritten by vCenter.

PowerShell must be installed on a machine with network access to vCenter. For assistance with installing PowerShell, see the PowerCLI Installation Guide.

Use the PowerShell script attached to this KB to set the resync limit. 

Notes:

  • Setting the resync throttle will raise "vSAN cluster alarm 'Resync Operations Throttling'".  This can be ignored.
  • The script can only set the resync throttle for existing clusters and must be re-run to set resync throttle for newly added clusters.

SYNTAX
    .\configure-resync-throttle.ps1 [-ResyncIopsLimit] <Int32> [-ClusterName <String>] [<CommonParameters>]
    
DESCRIPTION
    Applies a vSAN resync limit in Mbps to the cluster(s).
    If -ClusterName is not specified, it will apply to all clusters registered in vCenter.  

PARAMETERS
    -ResyncIopsLimit <Int32>
        Resync IOPS limit configuration in Mbps. The value should be between 0 and 512.
        
    -ClusterName <String>
        Specifies a Cluster to apply the Resync limit.
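
Going by the syntax above, restricting the limit to a single cluster might look like the following. `Cluster01` is a placeholder cluster name and 150 is only an example value. The sketch assumes an existing Connect-VIServer session and that the script is in the current directory.

```powershell
# 'Cluster01' is a placeholder; substitute a cluster registered in vCenter.
.\configure-resync-throttle.ps1 -ResyncIopsLimit 150 -ClusterName 'Cluster01'
```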
       
Resync throttle value

Because the resync throttle is applied per disk group, the configured value must be adjusted for the disk group count. Follow the steps below to calculate the resync throttle value based on the number of disk groups per host.

1. Select the host level resync throttle MB/s based on vSAN network speed

vSAN bandwidth         Host level resync throttle (MB/s)
10Gb                   300
25Gb                   750

Note: If vSAN shares the NIC interface with other vSphere services, even if that sharing happens only after a protected link failure, calculate the potential vSAN bandwidth (based on Network IO Control shares) and scale the resync throttle value from the table to the reduced bandwidth. For example, if vSAN bandwidth might drop to 5 Gb/s (50% of 10Gb), then the starting host level resync value should be 50% of 300 = 150.

2. Divide by number of disk groups (cache disks) per host

Example: 
10Gb vSAN network with 2 disk groups per host:   300 / 2 = 150
% pwsh
PS> Connect-VIServer xxxx -User xxxx -Password xxxx
PS> .\configure-resync-throttle.ps1 -ResyncIopsLimit 150
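
The two steps above, together with the shared-NIC scaling from the note under step 1, can be sketched as a small PowerShell function. The function name and its parameters are hypothetical and not part of the attached script. Rounding down to a whole number is an assumption made here because the script takes an Int32.

```powershell
# Hypothetical helper combining steps 1 and 2; not part of the attached script.
function Get-ResyncThrottleValue {
    param(
        [int]$NetworkGb,              # vSAN network speed: 10 or 25
        [int]$DiskGroupsPerHost,      # disk groups (cache disks) per host
        [double]$VsanShare = 1.0      # fraction of the link available to vSAN
    )

    # Step 1: host level throttle in MB/s, taken from the table above.
    $hostLevelTable = @{ 10 = 300; 25 = 750 }
    if (-not $hostLevelTable.ContainsKey($NetworkGb)) {
        throw "No table entry for a $NetworkGb Gb vSAN network."
    }
    if ($DiskGroupsPerHost -lt 1) {
        throw 'DiskGroupsPerHost must be at least 1.'
    }
    if ($VsanShare -le 0 -or $VsanShare -gt 1) {
        throw 'VsanShare must be greater than 0 and at most 1.'
    }

    # Scale for a shared NIC (see the note under step 1).
    [double]$hostLevel = $hostLevelTable[$NetworkGb] * $VsanShare

    # Step 2: divide by the disk group count. Rounding down is an assumption.
    [int][math]::Floor($hostLevel / $DiskGroupsPerHost)
}

Get-ResyncThrottleValue -NetworkGb 10 -DiskGroupsPerHost 2                  # 300 / 2
Get-ResyncThrottleValue -NetworkGb 10 -DiskGroupsPerHost 2 -VsanShare 0.5   # 150 / 2
```

The result is the value to pass as -ResyncIopsLimit. The script help states that the value should be between 0 and 512, so check the result against that range before applying it.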

Attachments

configure-resync-throttle.ps1