Symptoms:
The following error is displayed on the cluster in the vSphere UI:
The EAM logs show errors similar to the following:
YYYY-MM-DDTHH:MM:SS | ERROR | cluster-agent-3 | AuditedJob.java | 106 | JOB FAILED: [#1074974798] InstallClusterAgentJob(ClusterAgent(ID: 'Agent:c516a731-fcc0-459d-a286-fe2b5f48a590:null'))
java.lang.IllegalStateException: Duplicate key VirtualMachine:vm-621463
at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133) ~[?:1.8.0_291]
at java.util.HashMap.merge(HashMap.java:1254) ~[?:1.8.0_291]
at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320) ~[?:1.8.0_291]
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169) ~[?:1.8.0_291]
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_291]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[?:1.8.0_291]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_291]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_291]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_291]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_291]
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_291]
at com.vmware.eam.agency.vm.impl.LoadAgentVMsJob.call(LoadAgentVMsJob.java:130) ~[eam-server.jar:?]
at com.vmware.eam.agency.vm.impl.LoadAgentVMsJob.call(LoadAgentVMsJob.java:47) ~[eam-server.jar:?]
at com.vmware.eam.async.impl.AuditedJob.call(AuditedJob.java:58) [eam-server.jar:?]
at com.vmware.eam.async.impl.FutureRunnable.run(FutureRunnable.java:55) [eam-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_291]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_291]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_291]
YYYY-MM-DDTHH:MM:SS | WARN | cluster-agent-3 | VcEventManager.java | 422 | Failed to post agent status changed from yellow to red because the agent is not fully initialize
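The stack trace is consistent with the two-argument form of Collectors.toMap, which throws IllegalStateException as soon as two stream elements produce the same map key. That is what the throwingMerger and HashMap.merge frames indicate. The Java sketch below reproduces that failure outside EAM. It assumes, without access to EAM's source, that LoadAgentVMsJob builds a map of agent VMs keyed by managed object reference. AgentVm, indexStrict and indexKeepFirst are hypothetical names used only for illustration. On Java 8, which is the runtime shown in the trace (1.8.0_291), the text after "Duplicate key" is the value already stored in the map, not the key. Java 9 and later print the key and both conflicting values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class DuplicateKeyDemo {

    // Hypothetical stand-in for an agent VM record; EAM's internal types are not public.
    static final class AgentVm {
        final String moRef;   // managed object reference, e.g. "VirtualMachine:vm-621463"
        final String agentId; // hypothetical agent identifier

        AgentVm(String moRef, String agentId) {
            this.moRef = moRef;
            this.agentId = agentId;
        }

        @Override
        public String toString() {
            return agentId + "@" + moRef;
        }
    }

    // Two-argument toMap: throws IllegalStateException if two elements share a key.
    static Map<String, AgentVm> indexStrict(List<AgentVm> vms) {
        return vms.stream().collect(Collectors.toMap(vm -> vm.moRef, Function.identity()));
    }

    // Three-argument toMap: the merge function keeps the first entry instead of failing.
    static Map<String, AgentVm> indexKeepFirst(List<AgentVm> vms) {
        return vms.stream().collect(
                Collectors.toMap(vm -> vm.moRef, Function.identity(), (first, second) -> first));
    }

    public static void main(String[] args) {
        // Two hypothetical records that share the same VM reference.
        List<AgentVm> vms = Arrays.asList(
                new AgentVm("VirtualMachine:vm-621463", "agent-a"),
                new AgentVm("VirtualMachine:vm-621463", "agent-b"));

        try {
            indexStrict(vms);
        } catch (IllegalStateException e) {
            // Java 8 reports the existing value; Java 9+ reports the key and both values.
            System.out.println("toMap failed: " + e.getMessage());
        }

        System.out.println("merged size: " + indexKeepFirst(vms).size());
    }
}
```

The merge-function variant only illustrates how such code can tolerate duplicates. It is not a change that can be applied to EAM, which is why the workaround below removes and re-creates the vCLS VMs instead.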
Environment:
vCenter Server 7.x
vCenter Server 8.x
Resolution:
Currently there is no resolution.
Workaround:
Note: Take a snapshot or backup of the vCenter Server before performing the following steps.
1. Disable vCLS on the cluster using Retreat Mode, which sets config.vcls.clusters.domain-c<number>.enabled to False.
2. Power off all vCLS VMs on the cluster, then delete each one by right-clicking the VM and selecting Delete from Disk.
3. Re-enable vCLS on the cluster by changing the value of the advanced setting config.vcls.clusters.domain-c<number>.enabled to True.
4. The vCLS VMs should be re-created automatically within a few minutes.
5. Repeat these steps for any other clusters showing the same symptoms.
For more information about Retreat Mode, see How to Disable vCLS on a Cluster via Retreat Mode (316514).
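Steps 1 and 3 toggle the same vCenter advanced setting, whose name embeds the cluster's managed object ID. The Java sketch below builds the setting name and the value to enter for each step, assuming the cluster ID has the form domain-c<number>. VclsRetreatMode, settingKey and settingValue are hypothetical helper names, and "domain-c8" is a placeholder cluster ID. The setting itself is still changed in the vSphere Client as described in the steps.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VclsRetreatMode {

    // Cluster managed object IDs are assumed to look like "domain-c8".
    private static final Pattern CLUSTER_ID = Pattern.compile("domain-c(\\d+)");

    // Builds the advanced setting name config.vcls.clusters.domain-c<number>.enabled.
    static String settingKey(String clusterMoId) {
        Matcher m = CLUSTER_ID.matcher(clusterMoId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a cluster ID of the form domain-c<number>: " + clusterMoId);
        }
        return "config.vcls.clusters.domain-c" + m.group(1) + ".enabled";
    }

    // Step 1 (Retreat Mode) uses False; step 3 sets the value back to True.
    static String settingValue(boolean vclsEnabled) {
        return vclsEnabled ? "True" : "False";
    }

    public static void main(String[] args) {
        String cluster = "domain-c8"; // hypothetical placeholder; substitute the affected cluster's ID
        System.out.println("Step 1: " + settingKey(cluster) + " = " + settingValue(false));
        System.out.println("Step 3: " + settingKey(cluster) + " = " + settingValue(true));
    }
}
```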
Impact/Risks:
DRS is not functioning on the affected cluster.