DRS functionality impacted by unhealthy state of the vSphere Cluster Services (vCLS)

Article ID: 326366

Products

VMware vCenter Server
VMware vCenter Server 7.0
VMware vCenter Server 8.0

Issue/Introduction

Symptoms:

The following symptoms are seen on the affected clusters in the vSphere UI:

  • DRS is not functioning on the affected clusters.
  • More than three vCLS VMs exist on the same cluster.
  • Retreat mode has no effect on the affected clusters: the vCLS VMs are neither removed nor re-created.
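
The second symptom can be checked quickly against an exported VM list. The following is a minimal sketch, not a VMware tool: the function name, the (cluster, VM name) input format, and the assumption that vCLS VM names begin with "vCLS" are all illustrative.

```python
from collections import Counter

# Assumption: vCLS VM names start with "vCLS"; adjust if your inventory differs.
VCLS_PREFIX = "vCLS"
# The symptom described above is more than three vCLS VMs on one cluster.
MAX_EXPECTED = 3


def clusters_with_extra_vcls(vms, limit=MAX_EXPECTED):
    """Return {cluster: vCLS VM count} for clusters exceeding the limit.

    vms is an iterable of (cluster_name, vm_name) pairs, for example
    taken from an inventory export (hypothetical input format).
    """
    counts = Counter(
        cluster for cluster, name in vms if name.startswith(VCLS_PREFIX)
    )
    return {cluster: n for cluster, n in counts.items() if n > limit}


if __name__ == "__main__":
    sample = [
        ("Cluster-A", "vCLS-1"),
        ("Cluster-A", "vCLS-2"),
        ("Cluster-A", "vCLS-3"),
        ("Cluster-A", "vCLS-4"),
        ("Cluster-B", "vCLS-5"),
    ]
    print(clusters_with_extra_vcls(sample))
```

Any cluster returned by this check is a candidate for the workaround described later in this article.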

The EAM logs contain errors similar to the following:

YYYY-MM-DDTHH:MM:SS | ERROR | cluster-agent-3 | AuditedJob.java | 106 | JOB FAILED: [#1074974798] InstallClusterAgentJob(ClusterAgent(ID: 'Agent:c516a731-fcc0-459d-a286-fe2b5f48a590:null'))
java.lang.IllegalStateException: Duplicate key VirtualMachine:vm-621463
        at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133) ~[?:1.8.0_291]
        at java.util.HashMap.merge(HashMap.java:1254) ~[?:1.8.0_291]
        at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320) ~[?:1.8.0_291]
        at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[?:1.8.0_291]
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_291]
        at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[?:1.8.0_291]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_291]
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_291]
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_291]
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_291]
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_291]
        at com.vmware.eam.agency.vm.impl.LoadAgentVMsJob.call(LoadAgentVMsJob.java:130) ~[eam-server.jar:?]
        at com.vmware.eam.agency.vm.impl.LoadAgentVMsJob.call(LoadAgentVMsJob.java:47) ~[eam-server.jar:?]
        at com.vmware.eam.async.impl.AuditedJob.call(AuditedJob.java:58) [eam-server.jar:?]
        at com.vmware.eam.async.impl.FutureRunnable.run(FutureRunnable.java:55) [eam-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_291]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_291]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_291]
YYYY-MM-DDTHH:MM:SS |  WARN | cluster-agent-3 | VcEventManager.java | 422 | Failed to post agent status changed from yellow to red because the agent is not fully initialize
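
To find which VM identifiers are reported as duplicates, the EAM log can be scanned for the "Duplicate key" exception shown above. This is a minimal sketch: the function name is illustrative, and the pattern is based only on the log excerpt in this article, so other builds may word the message differently.

```python
import re
from collections import Counter

# Pattern taken from the exception line shown in this article.
DUPLICATE_KEY = re.compile(
    r"IllegalStateException: Duplicate key VirtualMachine:(vm-\d+)"
)


def duplicate_vm_ids(log_lines):
    """Count how often each VM identifier appears in a duplicate-key error."""
    hits = Counter()
    for line in log_lines:
        match = DUPLICATE_KEY.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


if __name__ == "__main__":
    sample = [
        "java.lang.IllegalStateException: Duplicate key VirtualMachine:vm-621463",
        "        at java.util.HashMap.merge(HashMap.java:1254) ~[?:1.8.0_291]",
    ]
    for vm_id, count in duplicate_vm_ids(sample).items():
        print(vm_id, count)
```

To run it against a real log, pass an open file object instead of the sample list; the function only needs an iterable of lines.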

Environment

vCenter Server 7.x

vCenter Server 8.x

Cause

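Stale vCLS VMs in the vCenter Server inventory result in duplicate VirtualMachine keys. EAM then fails to load the agent VMs for the cluster, as shown by the "Duplicate key" exception in the EAM logs.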

Resolution

Currently there is no resolution.


Workaround:

Note: Take a snapshot or backup of the vCenter Server before performing the following steps.

1. Disable vCLS on the cluster using retreat mode:

    • Log in to the vSphere Client.
    • Navigate to the cluster on which vCLS should be disabled and copy the cluster domain ID from the browser URL. It has the form 'domain-c<number>'.
    • Navigate to the vCenter Server and open the Configure tab.
    • Open the Advanced Settings section and click the Edit Settings button.
    • Add a new entry with name = config.vcls.clusters.domain-c<number>.enabled and value = False.
    • Click Save.
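
The setting name in the bullets above can be derived from the cluster URL to avoid typing mistakes. This is a minimal sketch: the helper name is hypothetical and the URL in the example is illustrative only; the code looks for nothing more than the 'domain-c<number>' token.

```python
import re

# Matches the cluster domain ID described above, e.g. "domain-c8".
CLUSTER_ID = re.compile(r"domain-c\d+")


def retreat_mode_setting(url_or_id, enabled):
    """Build the (name, value) pair for the vCenter advanced setting.

    url_or_id may be the browser URL of the cluster page or the bare
    domain ID. enabled=False disables vCLS on the cluster (retreat
    mode); enabled=True re-enables it.
    """
    match = CLUSTER_ID.search(url_or_id)
    if match is None:
        raise ValueError("no 'domain-c<number>' ID found in input")
    name = f"config.vcls.clusters.{match.group(0)}.enabled"
    value = "True" if enabled else "False"
    return name, value


if __name__ == "__main__":
    # Illustrative URL; only the domain-c token matters.
    url = ("https://vcenter.example.com/ui/app/cluster;nav=h/"
           "urn:vmomi:ClusterComputeResource:domain-c8:example/summary")
    print(retreat_mode_setting(url, enabled=False))
```

The same helper with enabled=True yields the value used later to turn vCLS back on for that cluster.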

2. Power off all the vCLS VMs on the cluster, then delete each one from disk: right-click the VM --> Delete from Disk.

3. Re-enable vCLS on the cluster by changing the value of config.vcls.clusters.domain-c<number>.enabled to True.

4. The vCLS VMs should be re-created automatically within a few minutes.

5. Repeat these steps for any other clusters showing the same symptoms.



Additional Information

For more information about retreat mode: How to Disable vCLS on a Cluster via Retreat Mode (316514)

Impact/Risks:
DRS is not functioning.