vSAN IO operations buildup leading to LogCongestion

Article ID: 318127

Products

VMware vSAN

Issue/Introduction

This KB addresses a rare condition in which vSAN I/O operations build up on the cache devices without being processed, eventually leading to LogCongestion and performance degradation.

Symptoms:
The cluster is running a vSAN OSA version prior to 7.0U3l or 8.0U1 with UNMAP enabled, and one or more of the following symptoms are present:
  • Significant performance degradation on the vSAN cluster
  • One or more disk group(s) experiencing log congestion

To verify the current vSAN UNMAP settings, run the following commands on an ESXi host:

vSAN GuestUnmap:
# esxcfg-advcfg -g /VSAN/GuestUnmap
Value of GuestUnmap is 1
A value of ‘1’ means GuestUnmap is enabled, while ‘0’ means it is disabled.

vSAN unmapFairness:
# esxcfg-advcfg -g /LSOM/unmapFairness
Value of unmapFairness is 1
A value of ‘1’ means unmapFairness is enabled, which is the default configuration after vSAN 7.0U1.
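
The two checks above can be combined into a small script. This is a minimal sketch, not a VMware-provided tool: the helper names parse_advcfg_value and describe_setting are hypothetical, and the script assumes the command output has the "Value of <name> is <n>" shape shown above and that sed is available in the host shell.

```shell
#!/bin/sh
# Sketch: report the two UNMAP-related advanced settings named in this KB.
# Helper names are illustrative, not part of any VMware tooling.

# Extract the trailing integer from a line such as "Value of GuestUnmap is 1".
# Prints nothing if the line does not have that shape.
parse_advcfg_value() {
    printf '%s\n' "$1" | sed -n 's/^Value of .* is \([0-9][0-9]*\)$/\1/p'
}

# Translate the numeric value into the wording used in this KB.
describe_setting() {
    case "$1" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

# Only query the host when esxcfg-advcfg is actually available (i.e. on ESXi).
if command -v esxcfg-advcfg >/dev/null 2>&1; then
    for option in /VSAN/GuestUnmap /LSOM/unmapFairness; do
        line=$(esxcfg-advcfg -g "$option" 2>/dev/null)
        value=$(parse_advcfg_value "$line")
        echo "$option: $(describe_setting "$value")"
    done
fi
```

The script only reads the settings with the -g option and does not change them. A result of "unknown" means the output did not match the assumed shape and should be checked manually.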

If you suspect your environment has encountered this condition, open a service request referencing this KB article so VMware vSAN GS can provide additional triage using an internal script that measures different levels of consumption at the cache device layer.

Environment

VMware vSAN 8.0.x
VMware vSAN 7.0.x

Cause

vSAN engineering identified a rare condition in which a logic issue blocks the processing of SCSI UNMAP intents, along with any other I/O operations waiting in the same queue, causing I/O operations to build up. In response to this buildup, vSAN applies back-pressure by design, which increases the overall latency on the affected disk group(s).

Resolution

Upgrade vCenter and ESXi to 7.0U3l or later, or to 8.0U1 or later.


Workaround:
Recreate any affected disk group(s), making sure vSAN object accessibility is preserved throughout.

Additional Information

vSAN high LLOG consumption leading to LogCongestion (88832)

Impact/Risks:
This issue can cause high latency on the affected disk group(s), which can in turn affect the operation of VMs in the vSAN cluster.