All VMs in Cluster Offline - vSAN unavailability/congestion errors present

Article ID: 397123

Products

VMware vSAN 8.x

Issue/Introduction

All VMs in the cluster are offline, and vSAN reports unavailability and congestion errors.

SSD congestion causes vSAN latency. The vSAN Health Service reports it under Physical Disk Health - Congestion.

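Example (run in the ESXi Shell of an affected host):

# One iteration per unique vSAN disk group UUID on this host
for ssd in $(localcli vsan storage list | grep "Group UUID" | awk '{print $5}' | sort -u); do
    # Print the UUID, then the LSOM congestion counters for that disk
    echo "$ssd"
    vsish -e get /vmkModules/lsom/disks/"$ssd"/info | grep Congestion
done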

520429a6-337d-8edb-711e-############ 
   memCongestion:0
   slabCongestion:0
   ssdCongestion:252 <----------maxed out
   iopsCongestion:0
   logCongestion:0
   compCongestion:0
   maxDeleteCongestion:0
   mdDeleteCongestion:0
   memCongestionLocalMax:0
   slabCongestionLocalMax:0
   ssdCongestionLocalMax:252 <----------maxed out
   iopsCongestionLocalMax:0
   logCongestionLocalMax:0
   compCongestionLocalMax:0
   mdDeleteCongestionLocalMax:0
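
With many disk groups, the full counter list can be hard to scan. The following is a minimal sketch that shows only elevated counters. It assumes the "name:value" line format shown above. report_congestion is a hypothetical helper, not a VMware-provided tool, and its threshold argument is an illustrative filter rather than a documented vSAN limit.

```shell
#!/bin/sh
# report_congestion: hypothetical helper (not a VMware-provided tool).
# Reads the output of "vsish -e get /vmkModules/lsom/disks/<uuid>/info"
# on stdin and prints "name=value" for each *Congestion counter whose
# value is at least THRESHOLD (first argument; default 1 = any nonzero).
# The threshold is an illustrative filter, not a documented vSAN limit.
report_congestion() {
    threshold="${1:-1}"
    awk -F: -v t="$threshold" '
        /Congestion/ {
            name = $1
            gsub(/[ \t]/, "", name)   # drop the leading indentation
            val = $2 + 0              # numeric prefix of the value field
            if (val >= t + 0) print name "=" val
        }'
}

# Example use in the ESXi Shell (same loop as above, filtered):
# for ssd in $(localcli vsan storage list | grep "Group UUID" | awk '{print $5}' | sort -u); do
#     echo "$ssd"
#     vsish -e get /vmkModules/lsom/disks/"$ssd"/info | report_congestion
# done
```

For the sample output above, only ssdCongestion=252 and ssdCongestionLocalMax=252 would be printed.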

 

Environment

VMware vSAN OSA (All Versions)

Cause

SSD congestion typically occurs when the write cache ("cache tier") of a disk group is overloaded and cannot keep up with the incoming I/O rate.
Possible reasons include an undersized cache tier, a large active working set of writes, or problems in the LSOM (Local Log-Structured Object Manager) layer.

Resolution

Because congestion can leave objects inaccessible, open a case with Broadcom Support before attempting any remediation. Changes made without guidance can become permanent and lead to data loss.

Additional Information