VMs residing on vSAN fail to start with error no more space for virtual disk (VDI Environment)

Article ID: 413885

Products

VMware vSAN

Issue/Introduction

When powering on VMs residing on vSAN, the following error is received:
There is no more space for virtual disk VMName.vmdk. You might be able to continue this session by freeing disk space on the relevant volume, and clicking Retry.
The vSAN datastore shows plenty of free space.
The VDI environment runs frequent recompose tasks that generate heavy writes and deletes.
A host was placed into maintenance mode with "Full Data Evacuation" during planned maintenance, such as a cluster upgrade.
Automatic rebalance is disabled.
vSAN Health Service reports that the host that was placed into maintenance mode with "Full Data Evacuation" is running out of logical capacity:
[root@esx6:~] esxcli vsan health cluster get -t "physdiskcapacity"
Disk capacity red
Disks with issues
Host            Disk                                   Physical used space                   Logical used space
-------------------------------------------------------------------------------------------------------------------------------------
192.###.###.37  Local ATA Disk (naa.500a0##########5)  green#21.55% (730.08GB of 3387.72GB)  red#99.98% (16382.08GB of 16383.99GB)
192.###.###.37  Local ATA Disk (naa.500a0##########c)  green#24.55% (831.95GB of 3387.72GB)  yellow#89.75% (14705.29GB of 16383.99GB)
192.###.###.37  Local ATA Disk (naa.500a0##########7)  green#21.55% (730.08GB of 3387.72GB)  red#96.43% (15800.32GB of 16383.99GB)
Running /usr/lib/vmware/vsan/bin/clom-tool stats | grep PendingDeletes | awk '{print $30, $31}' shows a large number of pending deletes.
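
The pattern in the health output above, low physical use with near-full logical use on the same disk, can be checked mechanically. The following is a minimal Python sketch, assuming rows laid out as in that output. The function names and the 85% / 50% thresholds are illustrative assumptions, not values defined by vSAN.

```python
import re

# One usage cell, e.g. "red#99.98% (16382.08GB of 16383.99GB)".
USAGE = re.compile(r"(green|yellow|red)#([\d.]+)% \(([\d.]+)GB of ([\d.]+)GB\)")
# The device name inside the Disk column, e.g. "(naa.500a0...)".
DISK = re.compile(r"\((naa\.[^)]+)\)")


def parse_disk_rows(text):
    """Return one dict per disk row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        usage = USAGE.findall(line)
        disk = DISK.search(line)
        if disk is None or len(usage) != 2:
            continue
        physical, logical = usage  # first cell is physical, second is logical
        rows.append({
            "disk": disk.group(1),
            "physical_pct": float(physical[1]),
            "logical_state": logical[0],
            "logical_pct": float(logical[1]),
        })
    return rows


def logical_space_suspects(rows, logical_min=85.0, physical_max=50.0):
    """Disks that are nearly full logically but not physically.

    Both thresholds are hypothetical defaults chosen for this sketch.
    """
    return [
        r["disk"] for r in rows
        if r["logical_pct"] >= logical_min and r["physical_pct"] <= physical_max
    ]


# Rows copied from the health output in this article (identifiers are masked).
SAMPLE = """\
192.###.###.37 Local ATA Disk (naa.500a0##########5) green#21.55% (730.08GB of 3387.72GB) red#99.98% (16382.08GB of 16383.99GB)
192.###.###.37 Local ATA Disk (naa.500a0##########c) green#24.55% (831.95GB of 3387.72GB) yellow#89.75% (14705.29GB of 16383.99GB)
192.###.###.37 Local ATA Disk (naa.500a0##########7) green#21.55% (730.08GB of 3387.72GB) red#96.43% (15800.32GB of 16383.99GB)
"""

rows = parse_disk_rows(SAMPLE)
suspects = logical_space_suspects(rows)
```

With the sample rows, all three disks are reported, because each is below 25% physical use while at or above 89% logical use.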

Environment

VMware vSAN OSA (All Versions)

VDI environment

Cause

The host was placed into maintenance mode with "Full Data Evacuation" while automatic rebalance was disabled, so the disk groups on the host were empty. While VDI was processing its recompose jobs, vSAN placed components heavily on this host's disks because they had the most free space.

Because the LSOM elevator could not keep up with de-staging data from the cache tier to the capacity tier and processing pending deletes, the logical space on the disks filled up.
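
The reasoning above can be illustrated with a toy model in Python: if deleted space is reclaimed more slowly than new data is written, logical use climbs even though the net amount of live data stays flat. This is only a sketch of the arithmetic, not a model of LSOM internals. The function name and the per-interval rates are hypothetical; only the 16383.99 GB logical capacity is taken from the health output in this article.

```python
def steps_until_full(capacity_gb, write_gb, delete_gb, reclaim_gb, max_steps=10_000):
    """Return the first interval at which logical space is full, or None.

    Per interval (all rates are hypothetical inputs):
      write_gb   - new data written to the disk
      delete_gb  - data marked for deletion and queued
      reclaim_gb - most queued deletes that can be processed
    Space held by queued deletes still counts as logically used.
    """
    used = 0.0
    pending = 0.0
    for step in range(1, max_steps + 1):
        used += write_gb
        pending += delete_gb
        reclaimed = min(pending, reclaim_gb)
        pending -= reclaimed
        used -= reclaimed
        if used >= capacity_gb:
            return step
    return None


LOGICAL_CAPACITY_GB = 16383.99  # from the health output in this article

# Reclaim keeps pace with churn: logical use never grows.
keeps_up = steps_until_full(LOGICAL_CAPACITY_GB, 500, 500, 500)

# Reclaim lags churn by 200 GB per interval: the disk eventually fills.
falls_behind = steps_until_full(LOGICAL_CAPACITY_GB, 500, 500, 300)
```

In the second case the backlog of pending deletes grows every interval, which corresponds to the large PendingDeletes value described in the symptoms.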

Resolution

Don't use "Full Data Evacuation" for temporary maintenance tasks in the environment. "Full Data Evacuation" should only be used when removing a host or disk group from the environment. For temporary maintenance tasks, use "Ensure Accessibility".
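
As a hedged illustration of this guidance, the Python sketch below maps the purpose of the maintenance to the vSAN data migration mode and builds the corresponding esxcli argument list. It assumes the `esxcli system maintenanceMode set` command with the `--vsanmode` values `ensureObjectAccessibility` and `evacuateAllData`; confirm these against the esxcli reference for your ESXi version before use. The function name and the purpose labels are hypothetical.

```python
# Purpose labels are hypothetical; the mode strings are assumed esxcli values.
VSAN_MODE_BY_PURPOSE = {
    "temporary": "ensureObjectAccessibility",  # upgrades, reboots, patching
    "permanent_removal": "evacuateAllData",    # host or disk group leaves the cluster
}


def maintenance_mode_command(purpose):
    """Build the esxcli argument list for entering maintenance mode.

    The command is only constructed here, not executed.
    """
    if purpose not in VSAN_MODE_BY_PURPOSE:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return [
        "esxcli", "system", "maintenanceMode", "set",
        "--enable", "true",
        "--vsanmode", VSAN_MODE_BY_PURPOSE[purpose],
    ]


upgrade_cmd = " ".join(maintenance_mode_command("temporary"))
```

Keeping the choice in one lookup table makes it harder for a maintenance script to request a full evacuation by accident during routine work.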

If the environment has already hit this issue, open a case with vSAN Support for assistance.
