VMs on a vSAN datastore experience performance issues, and the ESXi host reports Memory, SSD, and Component congestion.
Entries similar to the following are logged in vobd.log:
YYYY-MM-DDThh:mm:ss.###Z cpu21:2099098)LSOM: LSOMThrowAsyncCongestionVOB:442: LSOM MemCong in ########-####-####-####-############ Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 255.
YYYY-MM-DDThh:mm:ss.###Z cpu21:2099098)LSOM: LSOMThrowAsyncCongestionVOB:442: LSOM SSDCong in ########-####-####-####-############ Congestion State: Exceeded. Congestion Threshold: 200 Current Congestion: 255.
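When several disks report congestion, these VOB lines can be condensed into one line per event. The function below is a minimal sketch. It assumes the message layout shown above (`LSOM <Type>Cong in <uuid> ... Congestion Threshold: <n> Current Congestion: <n>.`). The function name `summarize_congestion_vobs` is hypothetical. The log path in the usage line (`/var/log/vobd.log`) is the usual ESXi location, but you should verify it on your build.

```shell
#!/bin/sh
# Hypothetical helper: condense LSOM congestion VOBs read on stdin into
# "<Type>Cong <uuid> threshold=<n> current=<n>", one line per event.
# Assumes the vobd.log message layout shown in this article.
summarize_congestion_vobs() {
  awk '/LSOMThrowAsyncCongestionVOB/ {
    type = ""; uuid = ""; thr = ""; cur = ""
    for (i = 1; i <= NF; i++) {
      # "MemCong in <uuid>", "SSDCong in <uuid>", ...
      if ($i ~ /Cong$/ && $(i + 1) == "in") { type = $i; uuid = $(i + 2) }
      if ($i == "Threshold:") thr = $(i + 1)
      if ($i == "Congestion:") cur = $(i + 1)
    }
    sub(/\.$/, "", cur)   # drop the sentence-ending period after the value
    if (type != "") print type, uuid, "threshold=" thr, "current=" cur
  }'
}
```

For example, `summarize_congestion_vobs < /var/log/vobd.log | sort | uniq -c` counts how often each disk and congestion type was reported.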
The per-disk LSOM statistics show non-zero congestion counters:
# vsish -e cat /vmkModules/lsom/disks/###/info
:::
memCongestion:243 ###<- !!!
slabCongestion:0
ssdCongestion:0
iopsCongestion:0
logCongestion:0
compCongestion:9 ###<- !!!
mdCongestion:0
:::
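Instead of reading each `info` node by hand, you can filter the counters down to the non-zero `*Congestion` values on every disk. This is a minimal sketch that assumes the `key:value` layout shown above. `nonzero_congestion` and `scan_all_disks` are hypothetical helper names. `scan_all_disks` reuses the same vsish paths as the commands in this article, so it only runs on the ESXi host.

```shell
#!/bin/sh
# Hypothetical filter: print only non-zero "<name>Congestion:<n>" counters
# from "vsish -e cat /vmkModules/lsom/disks/<uuid>/info" output on stdin.
nonzero_congestion() {
  awk -F: '{
    key = $1
    gsub(/^[ \t]+/, "", key)          # tolerate indented output
    # $2 + 0 keeps only the leading number, ignoring trailing annotations
    if (key ~ /Congestion$/ && ($2 + 0) > 0) print key ":" ($2 + 0)
  }'
}

# Hypothetical wrapper: run the filter for every LSOM disk on this host
# and print the disk UUID followed by its non-zero counters.
scan_all_disks() {
  vsish -e ls /vmkModules/lsom/disks/ 2>/dev/null | while read -r d; do
    counters=$(vsish -e cat "/vmkModules/lsom/disks/${d}info" 2>/dev/null | nonzero_congestion)
    if [ -n "$counters" ]; then
      printf '%s\n%s\n' "${d%/}" "$counters"
    fi
  done
}
```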
We can also see a high number of elements in the commit tables:
# vsish -e ls /vmkModules/lsom/disks/ 2>/dev/null | while read d ; do echo -n ${d/\//} ; vsish -e get /vmkModules/lsom/disks/${d}WBQStats | grep "Number of elements in commit tables" ; done | grep -v ":0$"
########-####-####-####-############ Number of elements in commit tables:2161126 (>100k)
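The one-liner above prints every disk group with a non-zero count. To list only the disk groups above the 100k mark noted in the output, you can pipe its output through a small filter. This sketch assumes each line has the `<uuid> Number of elements in commit tables:<n>` shape shown above. `flag_commit_tables` is a hypothetical name. The 100000 default simply mirrors the `>100k` note and is not a documented limit.

```shell
#!/bin/sh
# Hypothetical filter: print "<uuid> <count>" for disk groups whose commit
# tables hold more elements than the limit (default 100000, an assumed
# rule of thumb taken from the ">100k" note, not an official threshold).
flag_commit_tables() {
  limit=${1:-100000}
  awk -v limit="$limit" -F 'Number of elements in commit tables:' '
    NF == 2 {
      uuid = $1
      gsub(/[ \t]+/, "", uuid)        # strip padding around the UUID
      count = $2 + 0
      if (count > limit + 0) print uuid, count
    }'
}
```

Append `| flag_commit_tables` to the one-liner to keep only the heavily loaded disk groups, or pass a different limit such as `| flag_commit_tables 50000`.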
Environment: VMware vSAN 7.x
This issue is resolved in:
Workaround:
The commit-table entries of an impacted Disk-Group (and with them the memory congestion) can be cleared by unmounting and then mounting that Disk-Group.
If unmounting and mounting the Disk-Groups via the vSphere UI or the CLI is not possible, rebooting the node that hosts the congested Disk-Group(s) also works, because the Disk-Groups are automatically unmounted and mounted as part of the node restart process. Note that entering Maintenance Mode with the Ensure Accessibility option may not be possible, depending on the severity of the issue.
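From the CLI, you can script the unmount and mount for a single Disk-Group. This is a minimal sketch that assumes the `esxcli vsan storage diskgroup unmount` and `mount` commands with the `-s` (cache-tier SSD) option. Before use, confirm the options and the default evacuation behaviour with `esxcli vsan storage diskgroup unmount --help` on your build. `remount_diskgroup` is a hypothetical helper. You can look up the cache device name with `esxcli vsan storage list`.

```shell
#!/bin/sh
# Hypothetical helper: unmount, then mount, the Disk-Group whose cache-tier
# device is given as the first argument. Assumes the esxcli
# "vsan storage diskgroup unmount/mount -s <device>" syntax; verify first.
remount_diskgroup() {
  ssd=$1
  if [ -z "$ssd" ]; then
    echo "usage: remount_diskgroup <cache-device-name>" >&2
    return 2
  fi
  # Stop if the unmount is refused; there is nothing to mount back.
  esxcli vsan storage diskgroup unmount -s "$ssd" || return 1
  esxcli vsan storage diskgroup mount -s "$ssd" || return 1
  echo "remounted Disk-Group with cache device $ssd"
}
```

The helper stops after a failed unmount and does not attempt the mount, so a refused unmount leaves the Disk-Group in its current state.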