It's important to monitor the vSAN cluster to ensure it doesn't become too full. There are Health Service alerts in place to help keep track of capacity utilization. Please see: vSAN Health Service - Capacity utilization - Disk space and vSAN Health Service - Physical Disk Health - Disk Capacity for more information.
If a vSAN datastore becomes too full, it can cause issues such as resyncs becoming stuck and certain management tasks timing out or hanging.
Other symptoms can include:
VMs not running despite having free space on vSAN Datastore.
For vSAN free capacity recommendations, please see: Understanding reserved capacity concepts in vSAN
VMware vSAN (vSAN OSA All Versions)
A VM freeze or hang can occur when one or more individual physical disks within the vSAN cluster reach a 'disk full' situation. Because vSAN distributes virtual machine components across specific physical drives based on storage policies, localized capacity exhaustion on a single disk will block I/O operations for any associated objects. This results in an operational failure even if the aggregate vSAN Datastore still reports available free space.
Example:
Connect to one of the vSAN nodes and run the command below to check the vSAN disk utilization on all vSAN nodes.
cmmds-tool find -t HOSTNAME -f json | egrep "uuid|hostname" | sed -e 's/\"content\"://g' | awk '{print $2}' | sed -e 's/[\",\},\,]//g' | xargs -n 2 | while read hostuuid hostname; do echo -e "\n\nHost Name: $hostname::: Host UUID: $hostuuid\n Disk Name\t\t| Disk UUID\t\t| Disk Usage | Disk Capacity | Usage Percentage" ; cmmds-tool find -f python -t DISK -o $hostuuid | grep uuid | cut -c 13-48 | while read diskuuid;do cmmds-tool find -f json -t DISK -o $hostuuid -u $diskuuid| egrep "uuid|content" | sed -e 's/\"content\":|\\"uuid\"://g' | sed -e 's/[\",\},\]//g' | awk '{printf $0}' | sed -e 's/},/\n/g'| awk '{print $37 " " $5 " " $45}'| while read disknaa diskcap maxcomp; do diskcapused=$(cmmds-tool find -f json -t DISK_STATUS -u $diskuuid | grep content |sed -e 's/[\",\},\]//g' | awk '{print $3}'); diskperc=$(echo "$diskcapused $diskcap" | awk '{print $1/$2*100}'); if [ "$maxcomp" != 0 ]; then echo -en " $disknaa\t| $diskuuid\t| $diskcapused\t | $diskcap\t | $diskperc%\n"; fi;done;done;done;
Host Name: Test.com::: Host UUID: 5d85cedf-xxxx-xxxx-559b-xxxxxxxxxxxx
 Disk Name           | Disk UUID                            | Disk Usage    | Disk Capacity | Usage Percentage
 naa.xxxxxxxxxxxef:2 | 52836663-xxxx-xxxx-xxxx-xxxxxxxxxxxx | 1045918363156 | 1800350466048 | 58.0953%
 naa.xxxxxxxxxxx2f:2 | 52cf6a04-xxxx-xxxx-xxxx-xxxxxxxxxxxx | 1188948323860 | 1800350466048 | 66.0398%
 naa.xxxxxxxxxxx6f:2 | 52be73be-xxxx-xxxx-xxxx-xxxxxxxxxxxx | 779064159764  | 1800350466048 | 43.2729%
 naa.xxxxxxxxxxx5d:2 | 52438c06-xxxx-xxxx-xxxx-xxxxxxxxxxxx | 1206224662036 | 1800350466048 | 66.9994%
 naa.xxxxxxxxxxx0b:2 | 525d3b7a-xxxx-xxxx-xxxx-xxxxxxxxxxxx | 1111445974548 | 1800350466048 | 61.735%
 naa.xxxxxxxxxxx07:2 | 526a28ac-xxxx-xxxx-xxxx-xxxxxxxxxxxx | 976691375636  | 1800350466048 | 54.2501%
 naa.xxxxxxxxxxx43:2 | 52d3cf65-xxxx-xxxx-xxxx-xxxxxxxxxxxx | 1800343629332 | 1800350466048 | 99.9996%
 naa.xxxxxxxxxxxfb:2 | 52444581-xxxx-xxxx-xxxx-xxxxxxxxxxxx | 1461741661716 | 1800350466048 | 81.1921%
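On clusters with many disks, the per-disk table can be long, so it may help to filter it. The following is a minimal sketch, not a vSAN tool: flag_full_disks is a hypothetical helper name, and the 80 percent default is only an illustrative threshold. It assumes the table is piped in with one disk per line in the "name | uuid | used | capacity | percent" layout shown above. It lists the disks at or above the threshold and prints the aggregate utilization of all disks it read, which makes the "one disk full, datastore not full" situation easy to spot.

```shell
#!/bin/sh
# Hypothetical helper (not part of vSAN): reads the per-disk table on stdin
# and reports disks whose usage is at or above a threshold (in percent).
# Assumed row layout: name | uuid | used_bytes | capacity_bytes | percent%
flag_full_disks() {
    threshold="${1:-80}"   # illustrative default, not an official vSAN limit
    awk -F'|' -v limit="$threshold" '
        # Disk rows have 5 pipe-separated fields and a numeric capacity;
        # the header row and "Host Name" lines fail this check and are skipped.
        NF == 5 && $4 + 0 > 0 {
            used = $3 + 0
            cap  = $4 + 0
            total_used += used
            total_cap  += cap
            pct = used / cap * 100
            if (pct >= limit) {
                name = $1
                gsub(/[ \t]/, "", name)   # strip padding around the disk name
                printf "%s\t%.4f%%\n", name, pct
            }
        }
        END {
            if (total_cap > 0)
                printf "Aggregate: %.1f%%\n", total_used / total_cap * 100
        }'
}
```

For example, saving the table to a file and running "flag_full_disks 90 < /tmp/diskusage.txt" (the file name is arbitrary) against the sample output above should list only naa.xxxxxxxxxxx43:2, while the aggregate across the eight disks works out to roughly 66 percent.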
The only recommended solutions are the following:
If this is not possible, there are some methods that can be used to free enough space for management tasks, such as adding more disks, to complete.
This can also be used for certain VMs that have fault tolerance built into their application by nature, such as having primary and secondary VMs where one takes over if the other fails.
Please see: How vSAN handles Policy Changes between RAID1 Mirroring and RAID 5/6 Storage Policies.
Verify the storage policy in use. In rare cases, some objects previously migrated to the vSAN datastore may have the storage rule 'proportionalCapacity = 100' (thick) incorrectly assigned.
To identify such objects, run the following commands:
cmmds-tool find -f python | grep 'proportionalCapacity\\\": 100' -B9 | grep uuid | cut -d "\"" -f4 >> /tmp/uuidlist.txt
(Note: this command writes the UUIDs of all objects with the 'proportionalCapacity = 100' rule to the file /tmp/uuidlist.txt.)
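Because the command above appends to the file with >>, running it more than once can leave duplicate UUIDs in the list. The following is a minimal sketch for cleaning the list before using it; dedupe_uuid_list is a hypothetical helper name, and it assumes the standard grep, sort, mv, and wc utilities are available in the shell being used.

```shell
#!/bin/sh
# Hypothetical helper: remove blank lines and duplicate UUIDs from the list
# file in place, then print how many unique entries remain.
dedupe_uuid_list() {
    list="${1:-/tmp/uuidlist.txt}"
    [ -f "$list" ] || { echo "missing: $list" >&2; return 1; }
    # Write to a temporary file first so the original list is only
    # replaced if the sort succeeds.
    grep -v '^$' "$list" | sort -u > "$list.tmp" && mv "$list.tmp" "$list"
    wc -l < "$list"
}
```

Calling dedupe_uuid_list with no argument operates on /tmp/uuidlist.txt; a different path can be passed as the first argument.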
for i in $(cat /tmp/uuidlist.txt); do echo "*********************";echo; /usr/lib/vmware/osfs/bin/
(Note: this command outputs 'UUID <-> path' pairs based on the previously created /tmp/uuidlist.txt file.)
Based on the friendly names ('Object path') in the output, determine which objects (UUIDs) are good candidates for conversion to thin. Then (re)apply a Storage Policy in the UI that has the same characteristics the object(s) already have (Failures to Tolerate, etc.) but with the 'proportionalCapacity' rule set to '0'.