OpenSearch node count is only 2 instead of 3

Article ID: 378840

Products

VMware Aria Suite

Issue/Introduction

Symptoms:

  • The VMware Identity Manager System Diagnostics Dashboard shows an OpenSearch cluster node count of 2 instead of 3.
  • The OpenSearch log /opt/vmware/opensearch/logs/horizon.log contains one of the following messages:

[..TIMESTAMP..][WARN ][o.o.c.c.Coordinator] [vidm2.hostname] failed to validate incoming join request from node [{vidm1.hostname}{Random22characters-1}{Random22characters-2}{xxx.xxx.x.x}{xxx.xxx.x.x:9300}{dimr}{shard_indexing_pressure_enabled=true}]
org.opensearch.transport.RemoteTransportException: [vidm1.hostname][xxx.xxx.x.x:9300][internal:cluster/coordination/join/validate]
Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid CLUSTERID1 than local cluster uuid CLUSTERID2, rejecting
        at org.opensearch.cluster.coordination.JoinHelper.lambda$new$5(JoinHelper.java:213) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:91) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:443) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) ~[opensearch-1.3.5.jar:1.3.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_352]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_352]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_352]

or

[..TIMESTAMP..][WARN ][o.o.c.c.JoinHelper       ] [vidm1.hostname] last failed join attempt was 335ms ago, failed to join {vidm2.hostname}{Random22characters-3}{Random22characters-4}{xxx.xxx.x.x}{xxx.xxx.x.x:9300}{dimr}{shard_indexing_pressure_enabled=true} with JoinRequest{sourceNode={vidm1.hostname}{Random22characters-1}{Random22characters-5}{xxx.xxx.x.x}{xxx.xxx.x.x:9300}{dimr}{shard_indexing_pressure_enabled=true}, minimumTerm=10, optionalJoin=Optional.empty}
org.opensearch.transport.RemoteTransportException: [vidm2.hostname][xxx.xxx.x.x:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
        at org.opensearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:631) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:72) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1357) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.transport.InboundHandler.lambda$handleException$3(InboundHandler.java:415) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:733) ~[opensearch-1.3.5.jar:1.3.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_352]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_352]
        at java.lang.Thread.run(Thread.java:750) [?:1.8.0_352]
Caused by: org.opensearch.transport.RemoteTransportException: [vidm1.hostname][xxx.xxx.x.x:9300][internal:cluster/coordination/join/validate]
Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid CLUSTERID1 than local cluster uuid CLUSTERID2, rejecting
        at org.opensearch.cluster.coordination.JoinHelper.lambda$new$5(JoinHelper.java:213) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:91) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:443) ~[opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:792) [opensearch-1.3.5.jar:1.3.5]
        at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:50) [opensearch-1.3.5.jar:1.3.5]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
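
To check quickly whether a node is logging this rejection, the log can be searched for the exception text. The following is a minimal sketch, not part of the official procedure; the function name count_uuid_rejections is a hypothetical helper introduced here for illustration, and the log path is the one given above.

```shell
# Hypothetical helper: count cluster-UUID join rejections in an OpenSearch log file.
# Usage: count_uuid_rejections /opt/vmware/opensearch/logs/horizon.log
count_uuid_rejections() {
    # grep -c prints the number of matching lines.
    # It exits non-zero when the count is 0, so "|| true" keeps callers from failing.
    grep -c 'CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid' "$1" || true
}
```

A count greater than 0 on a node suggests that this node is the one involved in the failed join described above.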

Environment

VMware Identity Manager 3.3.7

Cause

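The OpenSearch cluster UUID on the affected node differs from the cluster UUID of the other nodes, so its join requests are rejected. The cluster UUID can change for a number of reasons.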

Resolution

  1. Take a snapshot of the vIDM environment from vCenter.
  2. Open an SSH session to the node that has not joined the cluster and whose log reports:
    ...
    Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid CLUSTERID1 than local cluster uuid CLUSTERID2, rejecting

    Note: Make sure the correct node is selected, because the following steps delete the OpenSearch data on this node.

  3. Check the OpenSearch cluster UUID (the cluster_uuid field in the response) on each node: curl <hostname>:9200
  4. Stop OpenSearch: /etc/init.d/opensearch stop
  5. Delete all OpenSearch data on this node (the data is synchronised again once the node joins the existing cluster): rm -rf /db/opensearch/horizon/nodes/*
  6. Start OpenSearch: /etc/init.d/opensearch start
  7. Wait for the node to start, then confirm that it has joined the cluster and reports the same cluster UUID as the other nodes: curl <hostname>:9200
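
The checks in steps 3 and 7 can be scripted so that the values from all nodes are easy to compare. The sketch below is an illustration under stated assumptions, not part of the official procedure. The function names are hypothetical, and the parsing uses sed on the assumption that jq is not installed on the appliance. It reads the cluster_uuid field returned by curl <hostname>:9200. It also reads the number_of_nodes field of the standard OpenSearch _cluster/health endpoint, which this article does not otherwise use.

```shell
# Hypothetical helper: extract cluster_uuid from the JSON printed by `curl <hostname>:9200`.
extract_cluster_uuid() {
    sed -n 's/.*"cluster_uuid"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: extract number_of_nodes from the JSON printed by
# `curl <hostname>:9200/_cluster/health`.
extract_number_of_nodes() {
    sed -n 's/.*"number_of_nodes"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print one line per host: hostname, cluster UUID, node count as seen by that host.
# Hostnames are passed as arguments; unreachable hosts show "unknown".
print_cluster_summary() {
    for host in "$@"; do
        uuid=$(curl -s "$host:9200" | extract_cluster_uuid)
        nodes=$(curl -s "$host:9200/_cluster/health" | extract_number_of_nodes)
        printf '%s\t%s\t%s\n' "$host" "${uuid:-unknown}" "${nodes:-unknown}"
    done
}
```

For example, print_cluster_summary vidm1.hostname vidm2.hostname vidm3.hostname (placeholder hostnames) should show the same UUID on every line and a node count of 3 once the affected node has rejoined.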