"vSphere DRS functionality was impacted due to unhealthy state vSphere Cluster Services", vCLS virtual machines are not getting deployed after VCSA upgrade to 7.0

Article ID: 318191

Updated On: 02-04-2025

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
  • After upgrading vCenter Server Appliance (VCSA) to 7.0 Update 1 or later, vSphere Cluster Services (vCLS) virtual machines are not deployed.
  • The vSphere Client shows the warning "vSphere DRS functionality was impacted due to unhealthy state vSphere Cluster Services caused by the unavailability of vSphere Cluster Service VMs. vSphere Cluster Service VMs are required to maintain the health of vSphere DRS"
  • You may see the following errors or warnings:
      • Can’t provision VM for Cluster Agent due to lack of suitable datastore
      • Couldn’t acquire token due to: Signature validation failed
  • You may see the following error snippets in /var/log/vmware/eam/eam.log:
YYYY-MM-DDTHH:MM:SS |  INFO | cluster-agent-1 | AgentBase.java | 229 | [checkGoal:ClusterAgent(ID: 'Agent:5a001f25-7de6-483a-8c0f-1eb5199515dd:null')] task in progress.
YYYY-MM-DDTHH:MM:SS |  INFO | cluster-agent-1 | VcEventManager.java | 792 | [EventIndex: 141046] Posting event.
YYYY-MM-DDTHH:MM:SS | ERROR | cluster-agent-1 | AuditedJob.java | 106 | JOB FAILED: [#1878814658] DeployVmJob(ClusterAgent(ID: 'Agent:5a001f25-7de6-483a-8c0f-1eb5199515dd:null'))
com.vmware.eam.job.DeployVmJob$DeployVmJobFailure: Can't provision VM for ClusterAgent(ID: 'Agent:5a001f25-7de6-483a-8c0f-1eb5199515dd:null') due to lack of suitable datastore.

YYYY-MM-DDTHH:MM:SS |  INFO | sts-0 | Workflow.java | 121 | [CreateSAMLToken:577f77cb515aed10] FAILED
com.vmware.eam.sso.exception.TokenNotAcquired: Couldn't acquire token due to: Signature validation failed
Caused by: com.vmware.vim.sso.client.exception.MalformedTokenException: Signature validation failed


Note: The preceding log excerpts are only examples. Date, time, and environment-specific values may vary depending on your environment.
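To check whether a given eam.log contains the messages shown above, a small script can count the matching lines. The following is a minimal sketch, not a supported tool: the function names and signature labels are hypothetical, and the match strings are copied from the example excerpts in this article, so they may need adjusting for your environment.

```python
from collections import Counter

# Substrings taken from the example log excerpts in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES = {
    "no_suitable_datastore": "due to lack of suitable datastore",
    "token_signature_failed": "Signature validation failed",
}


def scan_eam_log(lines):
    """Count the lines that contain each known error signature."""
    counts = Counter()
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts


def scan_eam_log_file(path="/var/log/vmware/eam/eam.log"):
    """Scan a log file on disk; the default path is the one named above."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_eam_log(handle)
```

Calling scan_eam_log_file() on the appliance, or passing it the path of a copy of the log, returns a per-signature count. A non-zero count for both signatures is consistent with the symptoms described in this article, but it does not by itself confirm the cause.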


Environment

VMware vCenter Server 7.0.x

Cause

As part of the vCLS deployment workflow, the ESX Agent Manager (EAM) service identifies a suitable datastore on which to place the vCLS VMs. This workflow fails because the EAM service is unable to validate the STS certificate in the token.

Resolution

This issue is resolved in VMware vCenter Server 7.0 Update 3. To download it, see Download Broadcom products and software.

Workaround:
To work around this issue, reset the STS certificate by following the KB article at https://knowledge.broadcom.com/external/article/316619

After the STS certificate is reset, the vCLS VM(s) should be deployed successfully.



Additional Information

Related article: "Signing certificate is not valid" error in VCSA 6.5.x/6.7.x and vCenter Server 7.0.x
For more information on vCLS, see vSphere Cluster Services (vCLS) in vSphere 7.0 Update 1
For more information on STS certificates, see Security Token Service STS

Impact/Risks:

Warning:

The STS certificate reset script from the KB article referenced in the workaround interacts with the VMDIR database. Before running the script, take offline snapshots of all vCenter Servers in the SSO domain at the same time. Failing to do so may result in an unrecoverable error that requires redeploying vCenter Server.

Once the script is complete, restart services on all vCenter Servers in the SSO domain. This fix therefore requires an outage for all vCenter Servers in the SSO domain.