Error: "Can't provision VM for ClusterAgent due to lack of suitable datastore..." and vCLS deployment fails continuously

Article ID: 423594

Products

VMware vCenter Server

Issue/Introduction

You experience continuous event spam in the vSphere Client Event Viewer with "The object or item referred to could not be found" (ManagedObjectNotFound) errors. This occurs when ESX Agent Manager (EAM) repeatedly fails to deploy vSphere Cluster Services (vCLS) agent VMs to a cluster due to a datastore selection failure.

The EAM log (/var/log/vmware/eam/eam.log) shows the following error repeatedly:

JOB FAILED: DeployVmJob(ClusterAgent(ID: <agent-id>))
com.vmware.eam.job.DeployVmJob$DeployVmJobFailure: Can't provision VM for ClusterAgent due to lack of suitable datastore.

EAM retries the failed deployment continuously, generating an event with each attempt. This floods the Event Viewer and makes it difficult to monitor legitimate tasks and events. The issue can occur even when healthy shared datastores with adequate free space are available on the cluster, if EAM becomes stuck retrying a cached placement decision for a datastore that is no longer suitable (such as decommissioned or full storage).
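To gauge how often the failure repeats and which agents are affected, the log can be summarized with a short script. The following is a minimal sketch, not a supported tool: the pattern is derived only from the log excerpt above, the shape of the agent ID is an assumption, and the sample agent IDs (agent-1, agent-2) are hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the log excerpt in this article.
# Assumption: the agent ID contains no ")" character.
FAILURE_RE = re.compile(r"JOB FAILED: DeployVmJob\(ClusterAgent\(ID: ([^)]+)\)\)")


def count_deploy_failures(lines):
    """Return a Counter mapping agent ID -> number of failed DeployVmJob entries."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


# Hypothetical sample lines standing in for the contents of eam.log.
sample_lines = [
    "JOB FAILED: DeployVmJob(ClusterAgent(ID: agent-1))",
    "com.vmware.eam.job.DeployVmJob$DeployVmJobFailure: Can't provision VM "
    "for ClusterAgent due to lack of suitable datastore.",
    "JOB FAILED: DeployVmJob(ClusterAgent(ID: agent-1))",
    "JOB FAILED: DeployVmJob(ClusterAgent(ID: agent-2))",
]

for agent_id, failures in count_deploy_failures(sample_lines).most_common():
    print(f"{agent_id}: {failures} failed deployment(s)")
```

On a vCenter Server appliance, the same function can be given an open file handle for /var/log/vmware/eam/eam.log instead of the sample list. A count that keeps growing for the same agent ID is consistent with the retry loop described in this article.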

Additional symptoms reported:

  • The Event Viewer is flooded with repeated error messages
  • Messages reference KB 398128 ("The object or item referred to could not be found")
  • Normal management operations are difficult to carry out
  • Tasks and events are hard to review

Environment

  • vCenter Server 7.0 or later with vSphere Cluster Services (vCLS) enabled

Cause

ESX Agent Manager (EAM) uses a datastore selection policy to determine where to deploy vSphere Cluster Services (vCLS) agent VMs. When EAM selects a datastore and attempts deployment, it caches that placement decision. If the deployment fails, EAM retries the same placement rather than re-evaluating which datastores are currently available.

This causes a problem when the originally selected datastore is no longer suitable, for example because it has been decommissioned, has entered an alert state, has run out of space, or has become inaccessible. Even though other healthy shared datastores may be available on the cluster, EAM continues retrying deployment to the original (now unsuitable) datastore. Each failed retry generates an event, causing the continuous event spam visible in the vSphere Client.

The datastore selection policy setting (such as "useLeastUtilized") only applies during the initial selection. Once EAM has made a placement decision and entered a retry loop, it does not automatically re-evaluate available datastores.
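The difference between retrying a cached placement and re-evaluating datastores can be illustrated with a toy model. This is a conceptual sketch only and does not reproduce EAM's actual code: the Datastore class, the 2 GB space requirement, and the reading of "least utilized" as "most free space" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    name: str
    accessible: bool
    free_gb: float


def is_suitable(ds, required_gb=2.0):
    """A datastore is usable if it is reachable and has enough free space.
    The 2 GB default is a hypothetical value."""
    return ds.accessible and ds.free_gb >= required_gb


def pick_least_utilized(datastores, required_gb=2.0):
    """Fresh evaluation: pick the suitable datastore with the most free space."""
    candidates = [ds for ds in datastores if is_suitable(ds, required_gb)]
    return max(candidates, key=lambda ds: ds.free_gb) if candidates else None


def retry_cached(cached, attempts):
    """Retry loop: only the cached datastore is checked, never the others."""
    return ["ok" if is_suitable(cached) else "failed" for _ in range(attempts)]


old = Datastore("ds-old", accessible=True, free_gb=500.0)
new = Datastore("ds-new", accessible=True, free_gb=300.0)

cached = pick_least_utilized([old, new])   # initial selection picks ds-old
old.accessible = False                     # ds-old is later decommissioned

print(retry_cached(cached, 3))             # every retry fails
print(pick_least_utilized([old, new]).name)  # a fresh evaluation finds ds-new
```

In this model, the retry loop keeps failing even though ds-new is healthy, and only a fresh evaluation selects it. That is the effect the Retreat and re-enable procedure in the Resolution section is intended to trigger.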

Resolution

Step 1: Identify the failing EAM agency

  1. Open a browser and navigate to the EAM Managed Object Browser: https://<vcenter-fqdn>/eam/mob
  2. Log in with vCenter SSO credentials.
  3. Click EsxAgentManager.
  4. Expand the agency property to view all agency IDs.
  5. Click on each agency ID and examine the cluster property to find which agency is associated with the affected cluster.
  6. Once you identify the agency for the affected cluster, note the following properties:
    • solutionId – identifies what type of agent this is
    • status – current state (red indicates failure)
    • goalState – should be "enabled" if EAM is actively trying to deploy the agent
  7. Click on config and note the systemId and agencyName values.
  8. Click on runtime to view the issue property for specific error details.

Step 2: Determine if this is a Supervisor cluster

This step is critical. The resolution differs significantly depending on whether the affected cluster has Workload Management (vSphere with Tanzu/Kubernetes) enabled.

From the vSphere Client:

  1. Navigate to Menu > Workload Management (vCenter 8) or Menu > Supervisor Management (vCenter 9).
  2. Check if any Supervisors are listed and whether the affected cluster is among them.

From the EAM MOB:

  • If the solutionId contains "wcp" or the systemId is related to Workload Management, this is a Supervisor cluster.
  • If systemId is "vCLS" and solutionId is a vpxd-extension, this is standard vSphere Cluster Services (not Supervisor).
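The two MOB checks above can be written as a small decision function, which is useful when several agencies need to be reviewed. This is a minimal sketch under stated assumptions: the function name and return labels are hypothetical, the comparison is made case-insensitive as a precaution, and the looser criterion ("systemId is related to Workload Management") is not encoded, so anything that does not match an explicit rule is returned as "unknown" for manual review.

```python
def classify_agency(solution_id, system_id):
    """Classify an EAM agency from the solutionId and systemId noted in the MOB.

    Returns "supervisor", "standard-vcls", or "unknown" (hypothetical labels).
    """
    sol = (solution_id or "").lower()
    sys_id = (system_id or "").lower()

    # Rule 1: a solutionId containing "wcp" indicates a Supervisor cluster.
    if "wcp" in sol:
        return "supervisor"

    # Rule 2: systemId "vCLS" with a vpxd-extension solutionId is standard vCLS.
    if sys_id == "vcls" and "vpxd-extension" in sol:
        return "standard-vcls"

    # Anything else needs a manual check in the vSphere Client.
    return "unknown"


# Hypothetical values for illustration; real IDs will differ.
print(classify_agency("vpxd-extension-example", "vCLS"))
print(classify_agency("example-wcp-solution", "example-system"))
```

The Supervisor check runs first on purpose: treating a Supervisor cluster as standard vCLS is the more damaging mistake, so an "unknown" result should be confirmed in the vSphere Client before any Retreat mode change.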

If this is NOT a Supervisor cluster (standard vCLS)

Proceed with the following steps to reset vCLS:

  1. In the vSphere Client, navigate to the affected cluster in the Hosts and Clusters view.
  2. Select the cluster and go to Configure > vSphere Cluster Services > General.
  3. Change the vCLS mode to Retreat.
  4. Wait for the existing vCLS agent VMs to be removed from the cluster. Monitor the Recent Tasks pane for completion.
  5. Once retreat is complete, change the vCLS mode back to Enabled.
  6. Allow EAM to redeploy the vCLS agent VMs. This triggers a fresh datastore evaluation using currently available healthy shared storage.
  7. Navigate to Monitor > Events and verify the error spam has stopped.

Note: While vCLS is in Retreat mode, DRS fully automated migrations and certain vSphere HA functions are temporarily unavailable. Perform this procedure during a maintenance window or a period when brief degradation of these features is acceptable.


If this IS a Supervisor cluster (Workload Management enabled)

WARNING: Do not use Retreat mode or destroy EAM agencies on Supervisor clusters. Supervisor control plane VMs are managed by EAM. Destroying these agencies deletes the control plane VMs, which can render the entire Kubernetes environment unrecoverable.

For Supervisor clusters experiencing datastore placement failures:

  1. Do not delete or destroy EAM agencies.
  2. Do not manually delete Supervisor control plane VMs.
  3. Identify and resolve the underlying datastore issue:
    • Verify the storage policy configured for the Supervisor (Workload Management > Supervisors > [Supervisor] > Configure > Storage)
    • Confirm that compatible datastores for the storage policy are accessible and have adequate free space
    • If a datastore has been decommissioned or is no longer available, follow the datastore migration procedure referenced in step 4
  4. If the Supervisor control plane VMs must be migrated to different storage, follow KB 95945: Migrating VMs/Volumes across Datastores in vSphere with Tanzu / vSphere Kubernetes Supervisor
  5. For additional Supervisor troubleshooting guidance, see KB 323407: Troubleshooting vSphere Supervisor Control Plane VMs

Once the underlying datastore issue is resolved, EAM automatically retries the deployment, which should then succeed.

Additional Information

For more information on vSphere Cluster Services and datastore configuration, see Introduction to the vSphere Clustering Service (vCLS).

For more information on the ESX Agent Manager and agency management, see vSphere ESX Agent Manager.
