vSAN memory or SSD congestion reached threshold limit

Article ID: 327050


Products

VMware vSAN

Issue/Introduction

Summary

You receive this alert when an ESXi host that is part of a vSAN cluster determines that the internal vSAN memory (LSOM) or Flash (SSD) device has exceeded the predefined congestion threshold.

Examples of this alert are:

LSOM Memory Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 201.
LSOM SSD Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 201.
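
When triaging many of these alerts, it can help to extract the fields programmatically. The following is a minimal sketch in Python, assuming the messages look exactly like the samples above; the pattern, the `CongestionAlert` type, and the `parse_alert` helper are illustrative names, not part of any VMware tool, and real log lines may carry additional prefixes or fields.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived only from the sample messages in this article (assumption);
# matching is case-insensitive because the label capitalization varies.
ALERT_RE = re.compile(
    r"LSOM (?P<device>Memory|SSD) Congestion State: (?P<state>\w+)\. "
    r"Congestion Threshold: (?P<threshold>\d+) "
    r"Current Congestion: (?P<current>\d+)",
    re.IGNORECASE,
)


class CongestionAlert(NamedTuple):
    """Hypothetical container for the fields of one alert message."""
    device: str      # "Memory" or "SSD"
    state: str       # for example "Exceeded" or "Normal"
    threshold: int
    current: int


def parse_alert(line: str) -> Optional[CongestionAlert]:
    """Return the parsed alert, or None if the line does not match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    return CongestionAlert(
        device=match["device"],
        state=match["state"],
        threshold=int(match["threshold"]),
        current=int(match["current"]),
    )


sample = ("LSOM SSD Congestion State: Exceeded. "
          "Congestion Threshold: 200 Current Congestion: 201.")
alert = parse_alert(sample)
```

Here `alert.current > alert.threshold` holds for the sample, which is consistent with the `Exceeded` state reported in the message.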


Congestion in vSAN occurs when the I/O rate of the lower layers of the storage subsystem fails to keep up with the I/O rate of the higher layers.

Local Log Structured Object Management (LSOM) is an internal component of vSAN that works at the physical disk level (both flash devices and magnetic disks). LSOM also handles read caching and write buffering for the components.

SSD is a cache device for a vSAN disk group.

The LSOM memory congestion state and the LSOM SSD congestion state occur when vSAN deliberately introduces latency into virtual machine I/O in order to slow down writes to the flash device layer or layers.

 

Impact

During an observed congestion period, higher virtual machine latencies occur.

Short periods of congestion might occur as vSAN uses a throttling mechanism to ensure that all layers run at the same I/O rate.

Smaller congestion values are preferable, because higher values signify higher latency. Sustained congestion is not usual, and in most cases congestion should be close to zero.
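
To tell a brief spike apart from sustained congestion, one option is to look at how many consecutive samples exceed the threshold. The sketch below is illustrative only: it assumes you already have a list of congestion values sampled at a fixed interval (how you collect them is outside this article), and the default threshold of 200 is taken from the sample alert messages. The function name is hypothetical.

```python
from typing import Sequence


def longest_run_above(samples: Sequence[int], threshold: int = 200) -> int:
    """Return the length of the longest consecutive run of samples
    strictly above the threshold (0 if no sample exceeds it)."""
    longest = 0
    current = 0
    for value in samples:
        if value > threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # the run is broken; start counting again
    return longest


# Hypothetical samples: one isolated spike, then three samples in a row.
samples = [0, 5, 210, 0, 201, 230, 250, 190]
run = longest_run_above(samples)  # 3
```

A long run relative to the sampling interval suggests sustained congestion rather than the short throttling periods described above.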

Environment

VMware vSAN

Resolution


If virtual machines perform a high number of write operations, the write buffers on the flash cache devices can fill up. In hybrid configurations, these buffers must be de-staged to magnetic disks, and de-staging can proceed only as fast as the magnetic disks can handle.
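
The relationship between incoming writes and de-staging can be illustrated with a simple back-of-the-envelope model. This is not how vSAN computes congestion; it is a toy calculation under the assumption of constant rates, and the function name and all numbers are hypothetical.

```python
def seconds_until_buffer_full(free_buffer_mb: float,
                              write_mb_s: float,
                              destage_mb_s: float) -> float:
    """Estimate how long the write buffer lasts when writes arrive
    faster than they can be de-staged (constant rates assumed)."""
    net_fill_rate = write_mb_s - destage_mb_s
    if net_fill_rate <= 0:
        # De-staging keeps up with incoming writes; the buffer never fills.
        return float("inf")
    return free_buffer_mb / net_fill_rate


# Hypothetical figures: 100,000 MB of free buffer, 500 MB/s of incoming
# writes, and magnetic disks that can absorb 300 MB/s.
eta = seconds_until_buffer_full(100_000, 500, 300)  # 500.0 seconds
```

In this model the buffer fills only while the write rate stays above the de-stage rate, which is why sustained write-heavy workloads are more likely to cause congestion than short bursts.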

Other reasons for congestion could be related to:
  • Faulty hardware
  • Corrupted or incorrectly functioning drivers or firmware
  • Insufficient I/O controller queue depths
  • Under-specified vSAN deployments
For more information, see:
SSD log buildup can cause poor performance in a VMware vSAN Cluster (326870)

Note:
After the congestion levels fall back below the threshold, ESXi generates events of this type:

LSOM Memory Congestion State: Normal. Congestion Threshold: 200 Current Congestion: 190.
LSOM SSD Congestion State: Normal. Congestion Threshold: 200 Current Congestion: 190.


The vSAN Health Check can be used to monitor vSAN congestion. However, if you experience congestion above the thresholds, open a Service Request with VMware Support as soon as possible.