vSphere DRS functionality is impaired or stops working completely. vSphere Cluster Services (vCLS) VMs are missing from the cluster and fail to deploy. Toggling vCLS from "System Managed" to "Retreat Mode" and back to "System Managed" does not regenerate the vCLS VMs. Additionally, attempts to read ESXi host logs fail due to file locks held by other hosts.
VMware vCenter Server 8.0U3
VMware ESXi 8.0U3
Multiple ESXi hosts are configured to use the same directory on a shared datastore as their persistent scratch location (ScratchConfig.ConfiguredScratchLocation). This misconfiguration causes file-locking contention between the hosts. vCLS VM deployment depends on the host's scratch location being accessible. When file locks block that access, the ESX Agent Manager (EAM) cannot provision the vCLS VMs, and DRS fails as a result.
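To confirm this condition, it can help to compare the ScratchConfig.ConfiguredScratchLocation value across hosts. The following is a minimal Python sketch, not an official tool: it assumes you have already collected the value for each host (for example from Advanced System Settings in the vSphere Client) into a dictionary. The function name and the example host names and paths are hypothetical.

```python
from collections import defaultdict
import posixpath


def find_shared_scratch(scratch_by_host):
    """Return {path: [hosts]} for every scratch path configured on more than one host."""
    hosts_by_path = defaultdict(list)
    for host, path in scratch_by_host.items():
        if not path:
            # An empty value means no scratch location is configured on this host.
            continue
        # Normalize trailing slashes and duplicate separators before comparing.
        hosts_by_path[posixpath.normpath(path)].append(host)
    return {p: sorted(h) for p, h in hosts_by_path.items() if len(h) > 1}


# Hypothetical inventory: esx01 and esx02 point at the same directory.
example = {
    "esx01.example.com": "/vmfs/volumes/shared-ds/scratch",
    "esx02.example.com": "/vmfs/volumes/shared-ds/scratch/",
    "esx03.example.com": "/vmfs/volumes/shared-ds/scratch/esx03",
}

for path, hosts in find_shared_scratch(example).items():
    print(f"{path} is shared by: {', '.join(hosts)}")
```

The comparison is purely textual. A datastore referenced by name on one host and by UUID on another would not be detected as the same location by this sketch.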
Reconfigure the persistent scratch location on every affected ESXi host so that each host uses its own unique directory.
Create a unique directory for each ESXi host on the shared datastore (e.g., /vmfs/volumes/<DATASTORE_NAME>/scratch/<HOSTNAME>).
Log in to the vCenter Server using the vSphere Client.
Navigate to Hosts and Clusters and select the affected ESXi host.
Click Configure, then under the System menu, select Advanced System Settings.
Locate the setting ScratchConfig.ConfiguredScratchLocation.
Click Edit and enter the unique path created for this specific host.
Reboot the ESXi host for the configuration changes to take effect.
Repeat these steps for every host in the affected clusters, using a different directory for each host.
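For clusters with many hosts, the per-host paths can be generated rather than typed by hand. The sketch below is an illustration under stated assumptions, not part of the documented procedure: it derives one directory per host following the /vmfs/volumes/<DATASTORE_NAME>/scratch/<HOSTNAME> pattern and prints ESXi Shell commands as an alternative to the vSphere Client steps. The vim-cmd hostsvc/advopt/update syntax is assumed from the ESXi Shell and should be checked against the linked article before use. The function names, datastore name, and host names are hypothetical.

```python
import posixpath
import re
import shlex


def scratch_path(datastore, hostname, base="scratch"):
    """Build a per-host scratch directory path on the shared datastore."""
    # Replace anything that is not safe in a directory name.
    safe_host = re.sub(r"[^A-Za-z0-9._-]", "_", hostname)
    return posixpath.join("/vmfs/volumes", datastore, base, safe_host)


def plan_commands(datastore, hostnames):
    """Return {host: [commands]} and refuse to continue if two hosts collide."""
    paths = {h: scratch_path(datastore, h) for h in hostnames}
    if len(set(paths.values())) != len(paths):
        raise ValueError("two hosts map to the same scratch directory")
    plan = {}
    for host, path in paths.items():
        quoted = shlex.quote(path)  # protects datastore names containing spaces
        plan[host] = [
            f"mkdir -p {quoted}",
            # Assumed ESXi Shell syntax; verify before running on a host.
            "vim-cmd hostsvc/advopt/update "
            f"ScratchConfig.ConfiguredScratchLocation string {quoted}",
        ]
    return plan


# Hypothetical datastore and hosts.
plan = plan_commands("shared-ds", ["esx01.example.com", "esx02.example.com"])
for host, commands in plan.items():
    print(f"# run on {host}")
    for command in commands:
        print(command)
```

The script only prints commands and changes nothing itself. Each host still needs a reboot after the setting is updated, as described in the steps above.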
Once the scratch locations are unique and hosts are rebooted, vCLS VMs will automatically deploy, restoring DRS functionality.
Creating a persistent scratch location for ESXi: https://knowledge.broadcom.com/external/article/317689/creating-a-persistent-scratch-location-f.html