vSAN memory or SSD congestion reached threshold limit
Article ID: 327050
Products
VMware vSAN
Issue/Introduction
Summary
You receive this alert when an ESXi host that is part of a vSAN cluster determines that the internal vSAN memory (LSOM) or Flash (SSD) device has exceeded the predefined congestion threshold.
Congestion in vSAN occurs when the I/O rate of the lower layers of the storage subsystem fails to keep up with the I/O rate of the higher layers.
Local Log Structured Object Management (LSOM) is an internal component of vSAN that works at the physical disk level (both flash devices and magnetic disks). LSOM also handles read caching and write buffering for the components.
SSD refers to the flash device used as the cache device of a vSAN disk group.
The LSOM memory congestion state and the LSOM SSD congestion state occur when vSAN artificially introduces latency into virtual machine I/O in order to slow down writes to the flash device layer or layers.
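The idea of slowing writers as a lower layer falls behind can be illustrated with a small backpressure sketch. This is a toy model only, not vSAN's actual algorithm: the function name, the fill thresholds, and the maximum delay are all hypothetical values chosen for illustration.

```python
def injected_delay_ms(fill_fraction, low=0.3, high=0.9, max_delay_ms=50.0):
    """Toy backpressure curve (hypothetical, not vSAN's real logic).

    fill_fraction: how full the buffer is, from 0.0 (empty) to 1.0 (full).
    Below `low` no delay is added; between `low` and `high` the delay
    ramps up linearly; at or above `high` the maximum delay is applied.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must be between 0 and 1")
    if fill_fraction <= low:
        return 0.0
    if fill_fraction >= high:
        return max_delay_ms
    # Linear ramp between the two thresholds.
    return max_delay_ms * (fill_fraction - low) / (high - low)


if __name__ == "__main__":
    for fill in (0.1, 0.5, 0.8, 0.95):
        print(f"fill={fill:.2f} -> added delay {injected_delay_ms(fill):.1f} ms")
```

In this sketch a nearly empty buffer adds no delay, which matches the expectation that congestion stays close to zero on a healthy cluster, while a buffer that keeps filling makes every write progressively slower.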
Impact
During an observed congestion period, higher virtual machine latencies occur.
Short periods of congestion might occur as vSAN uses a throttling mechanism to ensure that all layers run at the same I/O rate.
Smaller congestion values are preferable, as higher values signify higher latency. Sustained congestion is not usual, and in most cases congestion should be close to zero.
Environment
VMware vSAN
Resolution
If virtual machines perform a high number of write operations, the write buffers on the flash cache devices can fill up. In hybrid configurations, these buffers must be de-staged to magnetic disks, and de-staging can proceed only as fast as the magnetic disks can absorb the writes.
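The relationship between incoming write rate and de-staging rate can be made concrete with a rough estimate of how long a write buffer lasts under sustained load. This is a simplified sketch that assumes constant rates; the function name and the example figures are hypothetical and do not describe any particular vSAN configuration.

```python
def seconds_until_full(buffer_gb, write_mb_s, destage_mb_s, start_fill_gb=0.0):
    """Estimate how long until a write buffer fills (simplified model).

    Assumes constant incoming write and de-staging rates, which real
    workloads rarely have. Returns None when de-staging keeps up with
    or outpaces the incoming writes, so the buffer never fills.
    """
    net_mb_s = write_mb_s - destage_mb_s
    if net_mb_s <= 0:
        return None
    free_mb = (buffer_gb - start_fill_gb) * 1024
    return free_mb / net_mb_s


if __name__ == "__main__":
    # Hypothetical figures: 100 GB buffer, 500 MB/s of writes,
    # magnetic disks absorbing 300 MB/s.
    print(seconds_until_full(100, 500, 300))
    # De-staging faster than the write rate: the buffer never fills.
    print(seconds_until_full(100, 200, 300))
```

Under this model, once the net fill rate is positive the buffer is exhausted in a bounded amount of time, which is the point at which throttling of virtual machine writes becomes necessary.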
Other reasons for congestion could be related to:
Faulty hardware
Corrupted or incorrectly functioning drivers or firmware
The vSAN Health Check can be used to monitor vSAN congestion. However, if you are experiencing congestion above the thresholds, open a Service Request with VMware Support as soon as possible.