A VM may crash when writing to a disk, or fail to power on with the error "no space left on device", even though the vSAN datastore clearly still has plenty of available space. In the vmkernel log, you may see messages similar to the following:
2020-01-30T23:49:23.758Z cpu0:2097446)FS3DM: 2868: status No space left on device zeroing 1 extents (1048576 each)
2020-01-30T23:49:23.758Z cpu0:2097446)FS3J: 3104: Aborting txn (0x430dbf9c1e70) callerID: 0xc1d00006 due to failure pre-committing: No space left on device
2020-01-30T23:49:54.543Z cpu9:2097446)FS3DM: 2868: status No space left on device zeroing 1 extents (1048576 each)
2020-01-30T23:49:54.543Z cpu9:2097446)FS3J: 3104: Aborting txn (0x430dbf9c1e70) callerID: 0xc1d00006 due to failure pre-committing: No space left on device
2020-01-30T23:50:26.109Z cpu9:2097446)FS3DM: 2868: status No space left on device zeroing 1 extents (1048576 each)
2020-01-30T23:50:26.109Z cpu9:2097446)FS3J: 3104: Aborting txn (0x430dbf9c1e70) callerID: 0xc1d00006 due to failure pre-committing: No space left on device
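To check a saved copy of the vmkernel log for this signature, a small script along the following lines may help. This is a minimal sketch, not a VMware tool: the function name find_no_space_errors and the regular expression are illustrative, and the pattern assumes the line layout shown in the excerpt above (timestamp, cpuN:worldID, module name, line number, message).

```python
import re

# Assumed layout, taken from the log excerpt above:
#   <timestamp>Z cpu<N>:<world id>)<MODULE>: <line no>: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+Z) cpu\d+:\d+\)(?P<module>\w+): \d+: (?P<msg>.*)$"
)

NO_SPACE = "No space left on device"


def find_no_space_errors(lines):
    """Return (timestamp, module, message) for each matching log line."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and NO_SPACE in match.group("msg"):
            hits.append(
                (match.group("ts"), match.group("module"), match.group("msg"))
            )
    return hits


# Two lines copied from the excerpt above, used as sample input.
SAMPLE = """\
2020-01-30T23:49:23.758Z cpu0:2097446)FS3DM: 2868: status No space left on device zeroing 1 extents (1048576 each)
2020-01-30T23:49:23.758Z cpu0:2097446)FS3J: 3104: Aborting txn (0x430dbf9c1e70) callerID: 0xc1d00006 due to failure pre-committing: No space left on device
"""

if __name__ == "__main__":
    for ts, module, msg in find_no_space_errors(SAMPLE.splitlines()):
        print(ts, module, msg)
```

To scan a real log, pass the lines of an open file instead of SAMPLE.splitlines(). Lines that do not follow the assumed layout are skipped.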
This happens when the VMDK resides on a vSAN disk that is at or above 99% utilization. When an individual disk is that full, VMs may be unable to write to VMDKs stored on it, even though the vSAN datastore as a whole still has ample space available.
To work around this, identify the disk that is full using the following command in RVC:
> vsan.disks_stats .
Alternatively, from an ESXi host running 6.5 U1 or later:
# esxcli vsan debug disk overview
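The check itself is simple arithmetic: a single disk can be at or above 99% while the datastore as a whole is far from full. The sketch below illustrates this with made-up numbers. It does not parse the output of either command above, since that output format is not shown here. The disk names, capacities, and helper functions are all hypothetical, and the per-disk values would have to be read from the command output by hand.

```python
FULL_THRESHOLD_PCT = 99.0  # the utilization threshold named in this article


def disks_at_or_above(disks, threshold_pct=FULL_THRESHOLD_PCT):
    """Return (name, percent used) for each disk at or above the threshold.

    disks is an iterable of (name, capacity_gib, used_gib) tuples.
    """
    full = []
    for name, capacity_gib, used_gib in disks:
        if capacity_gib <= 0:
            continue  # skip entries without a usable capacity value
        pct = 100.0 * used_gib / capacity_gib
        if pct >= threshold_pct:
            full.append((name, round(pct, 2)))
    return full


def datastore_utilization(disks):
    """Return overall percent used across all disks."""
    total = sum(capacity for _, capacity, _ in disks)
    used = sum(used for _, _, used in disks)
    return 100.0 * used / total


# Hypothetical values for illustration only, not real command output.
EXAMPLE_DISKS = [
    ("example-disk-1", 1000, 995),
    ("example-disk-2", 1000, 400),
    ("example-disk-3", 1000, 350),
]

if __name__ == "__main__":
    print("Full disks:", disks_at_or_above(EXAMPLE_DISKS))
    print("Datastore used: %.1f%%" % datastore_utilization(EXAMPLE_DISKS))
```

With these example values, one disk is 99.5% full while the datastore overall is only about 58% used, which is the situation this article describes.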
Once the disk at or above 99% has been identified:
Put the host into maintenance mode with Ensure Accessibility.
If Dedup is not enabled, remove the disk from the Disk Group with No Action. After the disk has been removed, re-add the disk to the original Disk Group.
If Dedup is enabled, destroy and recreate the disk group.
See "Remove Disk Groups or Devices from vSAN" and "Recreate a Disk Group".
This will force the objects to rebuild on other disks in the cluster. After this, the VM should be able to power on and write to the previously affected VMDK.