vSAN nodes experience LSOM Memory congestion due to high "Number of elements in commit tables" (vSAN 6.7 U3/vSAN 7.0 U1)


Article ID: 318125


Updated On:

Products

VMware vSAN

Issue/Introduction

This KB addresses LSOM memory congestion issues on specific builds: vSAN 6.7 Update 3 P04 (17167734) up to, but not including, P05 (17700523), and vSAN 7.0 U1c (17325551) up to, but not including, 7.0 U2 (17630552).

Symptoms:

When running vSAN 6.7 builds from Update 3 P04 (17167734) and before P05 (17700523), or vSAN 7.0 builds from U1c (17325551) and before U2 (17630552),

AND one of the following is occurring:

"Number of elements in commit tables" is more than 100k and does not decrease over a period of X hours (refer to Script 2 below)

OR

One or more of the following conditions match:

  • Files and folders on the vSAN datastore may not be visible when one or more vSAN cluster nodes experience LSOM memory congestion
  • Severe performance degradation on the cluster due to LSOM memory congestion
  • One or more nodes exhibit high LSOM memory congestion
  • "Number of elements in commit tables" is more than 100k (refer to Script 2 below)
  • Memory congestion is propagated to all the nodes in the cluster
  • Stuck descriptor messages may appear in vmkernel.log
  • All or some VMs may show as inaccessible in vCenter
  • All or multiple hosts may become unresponsive
  • Log messages in vmkernel.log:
    • LSOM: LSOM_ThrowCongestionVOB:3429: Throttled: Virtual SAN node "HOSTNAME" maximum Memory congestion reached.
 
  • Example log message in vobd.log and vmkernel.log:
    • LSOM_ThrowAsyncCongestionVOB:1669: LSOM Memory Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 204.
You may run the scripts below on the ESXi hosts to verify LSOM memory congestion.

Script 1:

while true; do
  echo "================================================"
  date
  for ssd in $(localcli vsan storage list | grep "Group UUID" | awk '{print $5}' | sort -u); do
    echo $ssd
    vsish -e get /vmkModules/lsom/disks/$ssd/info | grep Congestion
  done
  for ssd in $(localcli vsan storage list | grep "Group UUID" | awk '{print $5}' | sort -u); do
    llogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info | grep "Log space consumed by LLOG" | awk -F \: '{print $2}')
    plogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info | grep "Log space consumed by PLOG" | awk -F \: '{print $2}')
    llogGib=$(echo $llogTotal | awk '{print $1 / 1073741824}')
    plogGib=$(echo $plogTotal | awk '{print $1 / 1073741824}')
    allGibTotal=$(expr $llogTotal \+ $plogTotal | awk '{print $1 / 1073741824}')
    echo $ssd
    echo " LLOG consumption: $llogGib"
    echo " PLOG consumption: $plogGib"
    echo " Total log consumption: $allGibTotal"
  done
  sleep 30
done

Sample output from script-1:

Fri Feb 12 06:40:51 UTC 2021  

529dd4dc-xxxx-xxxx-xxxx-xxxxxxxxxxxx
   memCongestion:0 >> On affected hosts this value will be higher than 0 (range 0-250)
   slabCongestion:0
   ssdCongestion:0
   iopsCongestion:0
   logCongestion:0
   compCongestion:0
   memCongestionLocalMax:0
   slabCongestionLocalMax:0
   ssdCongestionLocalMax:0
   iopsCongestionLocalMax:0
   logCongestionLocalMax:0
   compCongestionLocalMax:0
529dd4dc-xxxx-xxxx-xxxx-xxxxxxxxxxxxxx
    LLOG consumption: 0.270882
    PLOG consumption: 0.632553
    Total log consumption: 0.903435
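Script 1's GiB figures come from dividing the raw byte counts reported by vsish by 1073741824 (2^30). A minimal stand-alone sketch of that conversion, using assumed byte values in place of live vsish output:

```shell
# Hypothetical raw byte counts, standing in for the values vsish reports
# under "Log space consumed by LLOG/PLOG" (assumed, for illustration)
llogTotal=290882000
plogTotal=679220000

# Convert bytes to GiB (1 GiB = 1073741824 bytes), as Script 1 does
llogGib=$(echo "$llogTotal" | awk '{printf "%.2f", $1 / 1073741824}')
plogGib=$(echo "$plogTotal" | awk '{printf "%.2f", $1 / 1073741824}')
allGibTotal=$(echo "$llogTotal $plogTotal" | awk '{printf "%.2f", ($1 + $2) / 1073741824}')

echo " LLOG consumption: $llogGib"
echo " PLOG consumption: $plogGib"
echo " Total log consumption: $allGibTotal"
```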
Script 2:
vsish -e ls /vmkModules/lsom/disks/ 2>/dev/null | while read d ; do echo -n ${d/\//} ; vsish -e get /vmkModules/lsom/disks/${d}WBQStats | grep "Number of elements in commit tables" ; done | grep -v ":0$"

Sample output for two DiskGroups on a host (please verify that lines returned match all cache disks, and you may ignore any capacity disks that may be listed):

52f395f3-03fd-f005-bf02-40287362403b/   Number of elements in commit tables:300891
526709f4-8790-8a91-2151-a491e2d3aec5/   Number of elements in commit tables:289371
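Since the 100k threshold drives remediation, Script 2's output can be filtered so only disk groups above the threshold remain, worst first. A hedged sketch using sample output as stand-in data (UUIDs and counts are illustrative, not from a live host):

```shell
# Sample Script 2 output saved to a file (values are illustrative)
cat > /tmp/commit_tables.txt <<'EOF'
52f395f3-03fd-f005-bf02-40287362403b/   Number of elements in commit tables:300891
526709f4-8790-8a91-2151-a491e2d3aec5/   Number of elements in commit tables:289371
52aaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee/   Number of elements in commit tables:5012
EOF

# Keep only disk groups over the 100k remediation threshold,
# sorted descending so the worst disk group is listed first
awk -F: '$NF + 0 > 100000' /tmp/commit_tables.txt | sort -t: -rn -k2,2
```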

The following script runs in a loop (until the administrator presses CTRL+C or 20000 iterations complete) to check PLOG/LLOG congestion at the disk-group level of the host, as well as LLOG/PLOG elevator-data-level congestion.
for i in $(seq 1 20000) ;
do date ;
echo -e "======== \e[0m"  ;
for ssd in $(localcli vsan storage list |grep "Group UUID"|awk '{print $5}'|sort -u);do 
llogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info|grep "Log space consumed by LLOG"|awk -F \: '{print $2}');
plogTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info|grep "Log space consumed by PLOG"|awk -F \: '{print $2}');
llogGib=$(echo $llogTotal |awk '{print $1 / 1073741824}');plogGib=$(echo $plogTotal |awk '{print $1 / 1073741824}');
allGibTotal=$(expr $llogTotal \+ $plogTotal|awk '{print $1 / 1073741824}');echo $ssd;
echo -e "\e[1;33m     LLOG consumption: $llogGib";
echo -e "\e[1;33m     PLOG consumption: $plogGib";
echo -e "\e[1;33m     Total log consumption: $allGibTotal" ;done ;
sleep 10 ;
echo -e "======== \e[0m"  ;
date  ;
for ssd in $(localcli vsan storage list |grep "Group UUID"|awk '{print $5}'|sort -u);
do echo $ssd;vsish -e get /vmkModules/lsom/disks/$ssd/info|grep Congestion ;done ;
sleep 10 ;
echo -e "======== \e[0m"  ;
for ssd in $(localcli vsan storage list|grep "Group UUID" |sort -u|awk '{print $5}');
do echo $ssd;
consumptionTotal=$(vsish -e get /vmkModules/lsom/disks/$ssd/info|grep "space consumed by"|awk -F \: '{sum+=$2}END{printf("%.0f\n", sum);}');
devName=$(vsish -e get /vmkModules/plog/devices_by_uuid/$ssd/info|grep "Disk Name"|awk -F \: '{print $2}');
dgConsumption=$(vsish -e get /vmkModules/plog/devices/$devName/elevStats|grep "for diskgroup"|awk -F \: '{print $2}');
zeroTotal=`echo $(($dgConsumption - $consumptionTotal))`;isNeg=$(echo $zeroTotal|grep ^\-);if [[ "$isNeg" != "" ]];
then zeroTotal="NA";zeroTotalGib="NA";else zeroTotalGib=$(echo $zeroTotal|awk '{printf("%.2f\n", $1 / 1024 / 1024 / 1024)}');fi;
echo -e "\t  Total elevator data: $dgConsumption";echo -e "\t Total LLOG/PLOG data: $consumptionTotal";
echo -e "\t      Total zero data: $zeroTotal ($zeroTotalGib GiB)";done
sleep 20 ;
echo -e "######### \e[0m"  ;
done ;
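The last loop in the script above derives "Total zero data" as total elevator data minus combined LLOG/PLOG data, printing NA when the difference goes negative. The arithmetic in isolation, with assumed sample totals:

```shell
# Assumed sample values in bytes (hypothetical, for illustration):
# total elevator data for the disk group and combined LLOG+PLOG consumption
dgConsumption=2147483648      # hypothetical elevator total (2 GiB)
consumptionTotal=1073741824   # hypothetical LLOG+PLOG total (1 GiB)

# Zero data = elevator data minus LLOG/PLOG data; NA if negative
zeroTotal=$((dgConsumption - consumptionTotal))
if [ "$zeroTotal" -lt 0 ]; then
    zeroTotal="NA"; zeroTotalGib="NA"
else
    zeroTotalGib=$(echo "$zeroTotal" | awk '{printf("%.2f", $1 / 1024 / 1024 / 1024)}')
fi
echo "Total zero data: $zeroTotal ($zeroTotalGib GiB)"
```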
 


Environment

VMware vSAN 6.7.x
VMware vSAN 7.0.x

Cause

Scrubber configuration values were modified in vSAN 6.7 P04 and vSAN 7.0 U1 P02 releases to scrub objects at a higher frequency. This results in persisting scrubber progress of each object more frequently than before. If there are idle objects in the cluster, then commit table entries for these objects created by the scrubber will accumulate at LSOM. Eventually, the accumulation will lead to LSOM memory congestion.

Idle objects in this context refer to objects that are unassociated, belong to powered-off VMs, are replicated objects, etc.
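As a purely illustrative back-of-the-envelope calculation (all figures hypothetical, not from this article): if each persist event for an idle object leaves one commit table entry behind at LSOM, the entry count grows roughly linearly with the number of idle objects and elapsed time, which is how the 100k symptom threshold can be crossed:

```shell
# All values are hypothetical, for illustration only
idle_objects=500
persists_per_day=24   # assumed persist events per object per day
days=10

# Linear accumulation: one leftover commit-table entry per persist event
entries=$((idle_objects * persists_per_day * days))
echo "Projected commit-table entries: $entries"

# Compare against the 100k symptom threshold from this KB
[ "$entries" -gt 100000 ] && echo "Would exceed the 100k threshold"
```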

Resolution

This specific cause of memory congestion is seen on the following vSphere/vSAN releases:
ESXi 6.7 Update 3 P04 Build: 17167734
ESXi 6.7 Update 3 EP18 Build: 17499825
ESXi 7.0 Update 1d Build: 17551050
ESXi 7.0 Update 1c Build: 17325551

Read and follow the Workaround section carefully.

Additionally, high memory congestion has also been noted on the following builds due to other underlying causes; these are resolved by upgrading to 6.7 P05, and no other resolution is available for them:
ESXi 6.7 Update 3 EP15: Build 16316930
ESXi 6.7 Update 3 P03 Build: 16713306
ESXi 6.7 Update 3 EP16 Build: 16773714
ESXi 6.7 Update 3 EP17 Build: 17098360

VMware Engineering is aware of this issue and has released the fix in vSAN 6.7 P05 and vSAN 7.0 U2 GA.

 


Workaround:

NOTE: It is recommended to apply the following configuration changes proactively, even if you are not currently seeing LSOM memory congestion.

 

  1. Change scrubber frequency to once per year:
       # esxcfg-advcfg -s 1 /VSAN/ObjectScrubsPerYear
  2. Disable scrubber persist timer:
       # esxcfg-advcfg -s 0 /VSAN/ObjectScrubPersistMin
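After applying the two changes, the values can be read back with esxcfg-advcfg -g to confirm they took effect (a configuration fragment to be run on each ESXi host; not runnable outside ESXi):

```shell
# Apply the workaround settings (as given in steps 1 and 2 above)
esxcfg-advcfg -s 1 /VSAN/ObjectScrubsPerYear
esxcfg-advcfg -s 0 /VSAN/ObjectScrubPersistMin

# Read the values back to confirm the changes took effect
esxcfg-advcfg -g /VSAN/ObjectScrubsPerYear
esxcfg-advcfg -g /VSAN/ObjectScrubPersistMin
```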


To remediate all hosts that have already hit the high memory congestion issue, it is recommended to query "Number of elements in commit tables" and perform the steps below.

  • Run the following command on every host in the cluster to check whether the LSOM commit table count for any cache disk is higher than 100k.
# vsish -e ls /vmkModules/lsom/disks/ 2>/dev/null | while read d ; do echo -n ${d/\//} ; vsish -e get /vmkModules/lsom/disks/${d}WBQStats | grep "Number of elements in commit tables" ; done | grep -v ":0$"

Sample output for two DiskGroups on a host:

52f395f3-03fd-f005-bf02-40287362403b/   Number of elements in commit tables:300891
526709f4-8790-8a91-2151-a491e2d3aec5/   Number of elements in commit tables:289371
  • Once the value for "Number of elements in commit tables" is identified on all disk groups across hosts, perform the following steps in descending order (from the hosts/disk groups with the highest value to the lowest):
    1. Put the host in Maintenance Mode with Ensure accessibility (only if the host has to be rebooted)
    2. Alternatively, unmount and remount the disk groups using the CLI/UI (Ensure accessibility)
    3. Follow either step 1 or step 2 in a rolling fashion on all the nodes in the cluster (in descending order)
  • Once the above task is performed on all the hosts, set the following config options on all the hosts in the cluster:
    1. Change scrubber frequency to once per year:
         # esxcfg-advcfg -s 1 /VSAN/ObjectScrubsPerYear
    2. Disable scrubber persist timer:
         # esxcfg-advcfg -s 0 /VSAN/ObjectScrubPersistMin
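To establish the descending remediation order across hosts, the per-host results from the commit-table command can be collected into one file and sorted by count. A hedged sketch assuming a simple "host disk-group-UUID count" line format (hostnames, UUIDs, and values are illustrative):

```shell
# Hypothetical collected results: "host diskgroup-uuid count" per line
cat > /tmp/all_hosts.txt <<'EOF'
esx01 52f395f3-03fd-f005-bf02-40287362403b 300891
esx02 526709f4-8790-8a91-2151-a491e2d3aec5 289371
esx03 52bbbbbb-1111-2222-3333-444444444444 412077
EOF

# Sort descending by the count column: remediate the first host/disk
# group listed, then work down the list, as the steps above require
sort -k3,3 -rn /tmp/all_hosts.txt
```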


Additional Information

Impact/Risks:

Performance degradation due to high LSOM Memory congestion caused by high commit table entries.

Attachments

configure-dom-scrubber-frequency