vSAN Cluster Storage Usage Spikes Above 90% in Recent Hours

Products

VMware vSAN

Issue/Introduction

Symptoms:

A recent vSAN disk failure triggered a massive resync.
vSAN disk capacity usage is above 90% for most vSAN disks.

Validation Steps:

Connect to one of the vSAN nodes over SSH (for example, with PuTTY) and run the script below to report the current vSAN disk utilization for all vSAN nodes in the cluster.

cmmds-tool find -t HOSTNAME -f json | egrep "uuid|hostname" | sed -e 's/\"content\"://g' | awk '{print $2}' | sed -e 's/[\",\},\,]//g' | xargs -n 2 | while read hostuuid hostname; do echo -e "\n\nHost Name: $hostname::: Host UUID: $hostuuid\n Disk Name\t\t| Disk UUID\t\t| Disk Usage | Disk Capacity | Usage Percentage" ; cmmds-tool find -f python -t DISK -o $hostuuid | grep uuid | cut -c 13-48 | while read diskuuid;do cmmds-tool find -f json -t DISK -o $hostuuid -u $diskuuid| egrep "uuid|content" | sed -e 's/\"content\":|\\"uuid\"://g' | sed -e 's/[\",\},\]//g' | awk '{printf $0}' | sed -e 's/},/\n/g'| awk '{print $37 " " $5 " " $45}'| while read disknaa diskcap maxcomp; do diskcapused=$(cmmds-tool find -f json -t DISK_STATUS -u $diskuuid | grep content |sed -e 's/[\",\},\]//g' | awk '{print $3}'); diskperc=$(echo "$diskcapused $diskcap" | awk '{print $1/$2*100}'); if [ "$maxcomp" != 0 ]; then echo -en " $disknaa\t| $diskuuid\t| $diskcapused\t | $diskcap\t | $diskperc%\n"; fi;done;done;done;
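
The Usage Percentage column comes from the script's final arithmetic step: used bytes divided by capacity bytes, multiplied by 100. A minimal sketch of that step in isolation follows; the function name usage_percent is hypothetical and not part of the script, and the figures are taken from the sample report in this article.

```shell
# Hypothetical helper mirroring the script's arithmetic:
# used bytes / capacity bytes * 100, printed in awk's default number format.
usage_percent() {
    echo "$1 $2" | awk '{ print $1 / $2 * 100 }'
}

# Sample figures from the report in this article (prints 90.0386)
usage_percent 3275191187629 3637540651008
```

This can help when checking a single disk by hand against the values that cmmds-tool returns.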

Example:

[root@test:/tmp] cmmds-tool find -t HOSTNAME -f json | egrep "uuid|hostname" | sed -e 's/\"content\"://g' | awk '{print $2}' | sed -e 's/[\",\},\,]//g' | xargs -n 2 | while read hostuuid hostname; do echo -e "\n\nHost Name: $hostname::: Host UUID: $hostuuid\n Disk Name\t\t| Disk UUID\t\t| Disk Usage | Disk Capacity | Usage Percentage" ; cmmds-tool find -f python -t DISK -o $hostuuid | grep uuid | cut -c 13-48 | while read diskuuid;do cmmds-tool find -f json -t DISK -o $hostuuid -u $diskuuid| egrep "uuid|content" | sed -e 's/\"content\":|\\"uuid\"://g' | sed -e 's/[\",\},\]//g' | awk '{printf $0}' | sed -e 's/},/\n/g'| awk '{print $37 " " $5 " " $45}'| while read disknaa diskcap maxcomp; do diskcapused=$(cmmds-tool find -f json -t DISK_STATUS -u $diskuuid | grep content |sed -e 's/[\",\},\]//g' | awk '{print $3}'); diskperc=$(echo "$diskcapused $diskcap" | awk '{print $1/$2*100}'); if [ "$maxcomp" != 0 ]; then echo -en " $disknaa\t| $diskuuid\t| $diskcapused\t | $diskcap\t | $diskperc%\n"; fi;done;done;done;

Host Name: test.com::: Host UUID: xxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxx
Disk Name | Disk UUID | Disk Usage | Disk Capacity | Usage Percentage
naa.xxxxxxxxxxxx:2 | xxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxx | 3136019080080 | 3637540651008 | 86.2126%
naa.xxxxxxxxxxxx:2 | xxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxx | 3136019080080 | 3637540651008 | 86.2126%
naa.xxxxxxxxxxxx:2 | xxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxx | 3275191187629 | 3637540651008 | 90.0386%
naa.xxxxxxxxxxxx:2 | xxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxx | 3136019080080 | 3637540651008 | 86.2126%
naa.xxxxxxxxxxxx:2 | xxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxx | 3275191187629 | 3637540651008 | 90.0386%
naa.xxxxxxxxxxxx:2 | xxxxxxx-xxxxxx-xxxx-xxxx-xxxxxxxxxxxx | 3275191187629 | 3637540651008 | 90.0386%
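
On clusters with many disks it can help to filter the report down to the disks that are at or above a threshold. The sketch below assumes the report has been piped or saved in the pipe-delimited layout shown above; the function name flag_full_disks and the default threshold of 90 are illustrative assumptions, not part of the original script.

```shell
# Hypothetical helper: print only the report rows at or above a usage threshold.
# Reads the pipe-delimited report on stdin; $1 is the threshold in percent (default 90).
flag_full_disks() {
    awk -F'|' -v limit="${1:-90}" '
        # Data rows have five fields and a "%" in the last one; header rows do not.
        NF == 5 && $5 ~ /%/ {
            pct = $5
            gsub(/[ \t%]/, "", pct)       # strip padding and the percent sign
            if (pct + 0 >= limit) print   # numeric comparison against the threshold
        }'
}
```

For example, if the report was saved to a file (the path /tmp/vsan_disk_report.txt is only a placeholder), `flag_full_disks 90 < /tmp/vsan_disk_report.txt` would list only the disks at 90% or more.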

Connect to one of the vSAN nodes over SSH (for example, with PuTTY) and run the 'localcli vsan debug resync summary get' command to check the current vSAN resync status.

Example:

[root@test:/tmp] localcli vsan debug resync summary get;

ResyncSummary:
Total Number Of Resyncing Objects: 38
Total Bytes Left To Resync: 4661841221120
Total GB Left To Resync: 4341.68

To monitor the vSAN resync through vSAN Skyline Health, refer to 'Monitor the Resynchronization Tasks in the vSAN Cluster'.
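
The two size figures in the resync summary describe the same quantity: in the sample above, the GB value matches the byte count divided by 1024^3. The sketch below extracts the byte count from the summary text and performs that conversion; the function name resync_gib_left is hypothetical, and the field label is assumed to match the sample output exactly.

```shell
# Hypothetical helper: read the resync summary on stdin and print the
# remaining data in units of 1024^3 bytes, rounded to two decimals.
resync_gib_left() {
    awk -F': *' '/Total Bytes Left To Resync/ {
        printf "%.2f\n", $2 / (1024 * 1024 * 1024)
    }'
}
```

It could be used as `localcli vsan debug resync summary get | resync_gib_left`; running it at intervals gives a rough view of whether the resync backlog is shrinking.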

Environment

VMware vSAN 7.x
VMware vSAN 8.x

Cause

The sudden spike in vSAN space usage was caused by an environmental issue with the backup server, which was creating additional snapshots on the vSAN VMs.

Resolution

Contact your backup software vendor to investigate why the backup software is creating so many snapshots on the vSAN datastore.
If the vSAN disks are full, refer to the following KB articles to reclaim storage space on the vSAN datastore:
- How to deal with a full vSAN Datastore
- vSAN Health Service - Capacity utilization - Disk space