Q: What does the Physical Disk Health - Disk Capacity check do?
This health check is only applicable on capacity tier drives. It does not apply to the cache devices. If this health status is not green/OK, it indicates that this disk is low on free disk space.
Q: What does it mean when it is in an error state?
If usage on a physical disk is below 80%, a state of green (OK) health is displayed. If usage is between 80% and 95%, health is shown as yellow (warning). If usage is above 95%, a red (alert) health is displayed.
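The thresholds above can be sketched as a small classifier. This is a minimal illustration, not the health service's own code: the function name `disk_capacity_health` is hypothetical, and treating exactly 80% and exactly 95% as yellow is an assumption, since the text does not say which side the boundaries fall on.

```python
def disk_capacity_health(used_percent: float) -> str:
    """Map a capacity disk's usage percentage to a health colour.

    Assumption: the boundary values 80 and 95 are treated as yellow.
    """
    if not 0 <= used_percent <= 100:
        raise ValueError("used_percent must be between 0 and 100")
    if used_percent < 80:
        return "green"   # OK
    if used_percent <= 95:
        return "yellow"  # warning
    return "red"         # alert


if __name__ == "__main__":
    for usage in (42.0, 80.0, 95.0, 97.5):
        print(f"{usage:5.1f}% used -> {disk_capacity_health(usage)}")
```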
Q: How does one troubleshoot and fix the error state?
The first step is to ensure that all the storage is valid and that there are no missing capacity devices. If a capacity device fails, it will most likely entail a rebuild of components on the remaining disks in the cluster, possibly pushing disk usage above 80% on some devices. Disk status can be checked using the vSphere Web Client. Ensure that the vSAN datastore capacity is what you expect it to be.
vSAN attempts to balance the space usage of disks when they reach 80%. If one disk has reached 80%, vSAN will automatically remediate the situation. If all the physical disks are using more than 80% of their capacity, vSAN still tries to keep the amount of consumed capacity balanced. At this point, you should consider introducing additional capacity to the cluster. VMware recommends keeping slack space in the region of 30%.
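As a rough illustration of the reasoning above, the following sketch takes per-disk usage figures and reports which disks have reached the 80% mark and whether the cluster's overall free space is below the roughly 30% slack target. The function name, input layout, and disk names are all hypothetical; the figures would have to be gathered by the administrator, and this does not call any vSAN API.

```python
from typing import Dict, List, Tuple, Union

REBALANCE_THRESHOLD_PERCENT = 80.0  # per-disk usage at which vSAN starts balancing
SLACK_TARGET_PERCENT = 30.0         # approximate recommended free space


def capacity_summary(
    disks: Dict[str, Tuple[float, float]]
) -> Dict[str, Union[List[str], float, bool]]:
    """Summarise usage for disks given as name -> (used_gb, capacity_gb)."""
    total_used = sum(used for used, _ in disks.values())
    total_capacity = sum(capacity for _, capacity in disks.values())
    # Disks whose own usage has reached the balancing threshold.
    over = sorted(
        name
        for name, (used, capacity) in disks.items()
        if used / capacity * 100 >= REBALANCE_THRESHOLD_PERCENT
    )
    free_percent = (total_capacity - total_used) / total_capacity * 100
    return {
        "disks_at_or_over_threshold": over,
        "all_disks_over_threshold": len(over) == len(disks),
        "cluster_free_percent": free_percent,
        "below_slack_target": free_percent < SLACK_TARGET_PERCENT,
    }


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    example = {"disk-a": (850.0, 1000.0), "disk-b": (400.0, 1000.0)}
    print(capacity_summary(example))
```

If every disk is over the threshold, or free space is below the slack target, adding capacity is the likely remedy rather than waiting for rebalancing.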
Rebalancing activity can be monitored using the vSphere Web Client, and can also be monitored via the Ruby vSphere Console (RVC) using the vsan.resync_dashboard command.
If one physical disk is consistently showing close to full while other disks are not, this could indicate an issue with the vSAN balancing system. At this point, VMware Support should be engaged to determine why balancing is not occurring automatically. For more information, see How to file a Support Request in Customer Connect and via Cloud Services Portal.
When a physical disk gets close to being full, thin-provisioned virtual machines (object space reservation < 100%) that use this disk and need additional space to service I/O will be stunned. In this case, a question is posted to the administrator of the virtual machine, who can choose either to cancel or to retry the I/O. If some disk space has become available in the meantime, a retry will resume the virtual machine and the I/O will succeed.
This behavior is not unique to vSAN. This is the same behavior on traditional VMFS and NFS datastores when they become full.