In a vSphere environment using a Datastore Cluster with Storage DRS (SDRS) enabled and set to Fully Automated, the following symptoms may occur:
Datastores reach critical capacity (e.g., 99%) without triggering automatic Storage vMotion migrations.
Virtual machines may be suspended because the datastore has no free space left.
Attempts to place a datastore in Maintenance Mode fail or stall after partial evacuation.
SDRS does not provide recommendations, or recommendations are present but not applied automatically.
vCenter Server 8.x
This issue is typically caused by a logic deadlock in the SDRS algorithm: every candidate migration would violate at least one constraint, so SDRS makes no move at all. The primary drivers for this deadlock include:
Affinity Rule Conflicts: The default setting "Keep VMDKs together" (Intra-VM Affinity) is enabled. If a VM's total size exceeds the space available on every single destination datastore (while respecting the Space Utilization Threshold, 80% by default), SDRS will not split the VM's disks across multiple datastores and therefore will not move it.
Threshold Saturation: If multiple datastores in a cluster exceed the configured Space Utilization Threshold, SDRS cannot find a "valid" destination that wouldn't immediately violate the threshold rules upon receiving a new VM.
I/O Latency Guard: If "Enable I/O metric for SDRS recommendations" is active and the datastores are experiencing high latency (common during capacity crises), SDRS will block moves to prevent performance degradation.
Unmanaged Consumption: "Zombie files" or orphaned VMDKs consume physical space but are not part of the vCenter inventory. SDRS cannot move these files, which reduces the "mobile capacity" of the cluster.
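The first two causes can be illustrated with a little arithmetic. The sketch below is a simplified model and not the actual SDRS algorithm, which also weighs I/O load, growth, and other rules. The datastore names, sizes, and helper functions are hypothetical. It only checks whether a destination would stay at or below the space threshold after receiving a given amount of data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datastore:
    name: str
    capacity_gb: float
    used_gb: float


def fits(ds: Datastore, size_gb: float, threshold_pct: float) -> bool:
    """True if placing size_gb on ds keeps utilization at or below the threshold."""
    return (ds.used_gb + size_gb) / ds.capacity_gb * 100 <= threshold_pct


def valid_destinations(datastores, source, size_gb, threshold_pct=80.0):
    """Names of datastores other than the source that can accept size_gb."""
    return [
        ds.name
        for ds in datastores
        if ds.name != source and fits(ds, size_gb, threshold_pct)
    ]


# Hypothetical cluster: ds01 is the datastore that needs relief.
cluster = [
    Datastore("ds01", 1000, 990),
    Datastore("ds02", 1000, 660),
    Datastore("ds03", 1000, 650),
]
vm_disks_gb = [120, 120]  # hypothetical VM with two 120 GB VMDKs

# "Keep VMDKs together": the whole 240 GB VM must land on one datastore.
whole_vm_80 = valid_destinations(cluster, "ds01", sum(vm_disks_gb), 80.0)
# Affinity off: each 120 GB disk is evaluated on its own.
per_disk_80 = valid_destinations(cluster, "ds01", vm_disks_gb[0], 80.0)
# Affinity on, but the threshold is raised to 95%.
whole_vm_95 = valid_destinations(cluster, "ds01", sum(vm_disks_gb), 95.0)

print(whole_vm_80, per_disk_80, whole_vm_95)
```

In this model the whole VM has no valid destination at 80%, because either target would end up at 89% or 90%. A single disk fits on either target, and the whole VM fits once the threshold is 95%.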
To resolve the deadlock and restore automation, perform the following steps:
Adjust Runtime Thresholds:
Navigate to Datastore Cluster > Configure > Storage DRS > Edit.
Under Runtime Settings, temporarily increase the Space Threshold from 80% to 95%.
Under Advanced Options, change "Check imbalances every" from 8 hours to 4 hours to increase the frequency of re-evaluation.
Disable I/O Metrics (Temporary):
In Runtime Settings, uncheck "Enable I/O metric for SDRS recommendations" to ensure migrations are not blocked by storage latency during the cleanup phase.
Address Affinity Rules:
In Advanced Options, temporarily toggle OFF "Keep VMDKs together" if your backup software (e.g., Veeam) or environment allows for split-disk configurations. Note: Re-enable this after the crisis if required for backup consistency.
Refresh the Service:
Toggle the "Turn ON vSphere Storage DRS" switch OFF and then ON again to restart the SDRS scheduler.
Manual Cleanup:
Identify and delete "zombie" or orphaned files from the datastores to recover physical capacity that SDRS cannot manage.
Revert Settings:
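Once the datastores are below the desired utilization (e.g., 80%), revert every setting changed above (space threshold, imbalance check interval, I/O metric, and the "Keep VMDKs together" default if it was disabled) to its original production value.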
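Because several settings are changed temporarily, it helps to record the original values before touching anything. The following is a minimal sketch of that bookkeeping only. The setting keys are hypothetical labels that mirror the UI options named in the steps above, not vSphere API property names, and nothing here talks to vCenter.

```python
from copy import deepcopy

# Hypothetical keys mirroring the UI options changed in the steps above.
TEMPORARY_OVERRIDES = {
    "space_threshold_pct": 95,
    "check_imbalances_hours": 4,
    "io_metric_enabled": False,
    "keep_vmdks_together": False,
}


def apply_overrides(current: dict, overrides: dict):
    """Return (updated settings, snapshot of the original values).

    The input dictionary is not modified.
    """
    snapshot = {key: current[key] for key in overrides}
    updated = deepcopy(current)
    updated.update(overrides)
    return updated, snapshot


def ready_to_revert(utilization_pct: dict, target_pct: float = 80.0) -> bool:
    """True once every datastore is below the target utilization."""
    return all(pct < target_pct for pct in utilization_pct.values())


# Example production values, assumed for illustration.
production = {
    "space_threshold_pct": 80,
    "check_imbalances_hours": 8,
    "io_metric_enabled": True,
    "keep_vmdks_together": True,
}

crisis, snapshot = apply_overrides(production, TEMPORARY_OVERRIDES)

# Later, after cleanup and migrations (utilization figures are made up):
utilization = {"ds01": 74.0, "ds02": 78.5, "ds03": 71.2}
restored = {**crisis, **snapshot} if ready_to_revert(utilization) else crisis
```

Keeping the snapshot separate from the overrides means the revert step restores whatever the environment actually used before, rather than assuming the defaults.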