Heavy resync traffic may cause VM IO performance degradation
Article ID: 327006
Updated On: 03-05-2025
Products
VMware vSAN
Issue/Introduction
Impact/Risks: If resync operations are allowed to consume too much bandwidth, some environments may experience congestion, depending on the current workload and the objects that need to be reprotected or rebuilt.
Symptoms:
Several hardware failure modes and vSAN workflows can trigger a resync or repair to ensure VM accessibility.
Typical scenarios and workflows are:
• One or more node or disk failures
• Node or disk evacuation
• VM storage policy reconfiguration
• Cluster rebalancing when disks are more than 80% full
• Upgrade scenarios like disk format upgrade and enabling deduplication and compression
vSAN uses a congestion algorithm that delays resync traffic first, before it delays VM I/O traffic. However, VM I/O might still be impacted in the following cases:
• If VM I/O is low compared to resync traffic, VM I/O could be starved by the resync traffic and incur delay.
• If both VM I/O and resync traffic are high, the congestion algorithm first impacts resyncs, but this might not be enough to improve destaging at LSOM. At that point, additional build-up of VM I/O could trigger congestion for VM traffic, increasing latency in the VMs.
Environment
VMware vSAN (All Versions)
Resolution
VM I/O performance degradation:
Starting with vSAN 6.7.x, a new esxcli namespace, esxcli vsan resync, is available. It gives more control over resync monitoring and throttling at the host level of a vSAN node, without having to rely on RVC or the UI.
Open an SSH session to the ESXi host in question and run the following commands.
To check the current value (returns information about vSAN resync throttling):
esxcli vsan resync throttle get
To modify the current value, use the required --level option, which sets the vSAN resync throttle level in Mbps (an integer in the range 0-512; 0 means no throttling):
esxcli vsan resync throttle set --level <0-512>
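The range restriction on the throttle level can be checked before the command is run. The following is a minimal sketch, assuming a POSIX shell; the function name build_throttle_cmd is hypothetical and not part of esxcli. It only prints the command for review and does not execute it.

```shell
#!/bin/sh
# Hypothetical helper (not part of esxcli): validate a throttle level
# against the documented 0-512 range and print the resulting command.
build_throttle_cmd() {
    level="$1"
    case "$level" in
        # Reject empty input and anything that is not purely digits.
        ''|*[!0-9]*) echo "error: level must be an integer" >&2; return 1 ;;
        # Reject anything longer than three digits (cannot be <= 512).
        ????*) echo "error: level must be between 0 and 512" >&2; return 1 ;;
    esac
    # Reject three-digit values above the documented maximum.
    if [ "$level" -gt 512 ]; then
        echo "error: level must be between 0 and 512" >&2
        return 1
    fi
    echo "esxcli vsan resync throttle set --level $level"
}

# Print the command for a 100 Mbps throttle; nothing is executed here.
build_throttle_cmd 100
```

Printing the command rather than running it keeps the sketch safe to try on any machine; on an ESXi host the printed line can be reviewed and then run as-is.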
Note: These changes apply per host, not per cluster as in previous builds. No reboot is required for the changes to take effect.
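Because the setting is per host, the same level has to be applied on each vSAN node separately. The sketch below generates one command per host for review. The host names and the chosen level are hypothetical, and it assumes SSH is enabled on each host with root login permitted, which may not hold in every environment.

```shell
#!/bin/sh
# Hypothetical host list; replace with the ESXi hosts in the vSAN cluster.
HOSTS="esxi-01.example.com esxi-02.example.com esxi-03.example.com"
# Throttle level in Mbps; must be within the documented 0-512 range.
LEVEL=100

# Print one ssh command per host. This is a dry run: nothing is sent
# to the hosts until the printed lines are run manually.
print_throttle_cmds() {
    for host in $HOSTS; do
        echo "ssh root@$host esxcli vsan resync throttle set --level $LEVEL"
    done
}

print_throttle_cmds
```

Generating the commands first, instead of running them in the loop, makes it possible to confirm the host list and level before any host is changed.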
If the resync process is extremely slow, bandwidth for resync traffic may be reduced by resync throttling or by heavy VM I/O on the system. Resync speed can be increased by reducing VM I/O and by tuning throttling to balance VM I/O and resync traffic. The other primary cause of slow resync operations is a disk bottleneck.
If a resync operation is causing a performance impact on the VMs in the cluster and throttling is disabled (as it is by default), the next step is to collect a performance data sample with Verbose and Network diagnostic mode via Perf Services (versions 6.7 and higher), and then analyze the data to determine where the throughput bottleneck or latency is being introduced. More information on this process can be found in the documentation below.