One of the worker nodes in Aria Operations for Logs fails to report its status in Cassandra due to token collision issue

Article ID: 369764

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • When you run the nodetool-no-pass status command, one of the worker nodes is missing (hidden) from the output.
  • However, when you run the nodetool-no-pass describecluster command, all nodes in the cluster are listed.
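The comparison between the two commands can be sketched in Python. This is a minimal illustration, not a supported tool: it assumes you have captured the text output of both commands yourself and that nodes appear in that output as IPv4 addresses. The function name hidden_nodes is hypothetical.

```python
import re

# Matches dotted-quad IPv4 addresses anywhere in the text (assumption:
# nodes are identified by IPv4 address in both command outputs).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hidden_nodes(status_output, describecluster_output):
    """Return addresses seen by describecluster but absent from status."""
    in_status = set(IPV4.findall(status_output))
    in_cluster = set(IPV4.findall(describecluster_output))
    return sorted(in_cluster - in_status)
```

An empty result means both commands report the same set of addresses. Any address returned is a candidate for the hidden node and should be confirmed against cassandra.log.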

  • A token collision causes one of the nodes to be 'hidden' by the other. The Cassandra service keeps running normally on the 'hidden' node, but the node is not visible to the rest of the nodes. In practice, the hidden node does not participate in read/write operations and is in a stale state. Of the two worker nodes holding the same token ranges, the node that started earlier is the one that gets hidden: if you restart the nodes, or the Cassandra service on the nodes, whichever node or service comes up later becomes the active one and the other becomes 'hidden'.
  • Currently, Aria Operations for Logs encounters token collision errors when multiple nodes are added to the cluster simultaneously, without waiting for each node to reach "Startup complete". In that case, the following stack trace appears in /var/log/loginsight/cassandra.log:
    ERROR [main] 2024-06-12T10:48:17,780 CassandraDaemon.java:898 - Exception encountered during startup
    java.lang.RuntimeException: Bootstrap Token collision between /##.##.##.42:7000 and /##.##.##.43:7000 (token ###############
            at org.apache.cassandra.locator.TokenMetadata.addBootstrapTokens(TokenMetadata.java:378) ~[apache-cassandra-4.1.0.jar:4.1.0]
            at org.apache.cassandra.locator.TokenMetadata.addBootstrapTokens(TokenMetadata.java:360) ~[apache-cassandra-4.1.0.jar:4.1.0]
            at org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:2798) ~[apache-cassandra-4.1.0.jar:4.1.0]
            at org.apache.cassandra.service.StorageService.onChange(StorageService.java:2496) ~[apache-cassandra-4.1.0.jar:4.1.0]
            at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1659) ~[apache-cassandra-4.1.0.jar:4.1.0]
            at org.apache.cassandra.gms.Gossiper.addLocalApplicationStateInternal(Gossiper.java:2057) ~[apache-cassandra-4.1.0.jar:4.1.0]
  • In the same cassandra.log file, you will also see log entries similar to:

    INFO  [main] 2024-06-12T06:04:50,598 StorageService.java:3015 - Nodes /##.##.##.##:7000 and /##.##.##.##:7000 have the same token -###################. /##.##.##.##:7000 is the new owner
    INFO  [main] 2024-06-12T06:04:50,599 StorageService.java:3015 - Nodes /##.##.##.##:7000 and /##.##.##.##:7000 have the same token -###################. /##.##.##.##:7000 is the new owner
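To check a log file for these entries, a short Python sketch such as the following may help. It is an illustration only: the regular expressions are modeled on the two message formats quoted above, and the function name find_token_collisions and the "displaced" field are hypothetical. Treating the node that is not named as the new owner as the displaced (hidden) one is an assumption based on the behavior described in this article.

```python
import re

# Patterns modeled on the log lines quoted in this article.
SAME_TOKEN = re.compile(
    r"Nodes /(?P<a>\S+?):\d+ and /(?P<b>\S+?):\d+ have the same token "
    r"(?P<token>-?\d+)\. /(?P<owner>\S+?):\d+ is the new owner"
)
BOOTSTRAP = re.compile(
    r"Bootstrap Token collision between /(?P<a>\S+?):\d+ and /(?P<b>\S+?):\d+"
)


def find_token_collisions(lines):
    """Collect token collision events from an iterable of log lines."""
    events = []
    for line in lines:
        m = SAME_TOKEN.search(line)
        if m:
            # Assumption: the node that is not the new owner is the hidden one.
            displaced = m["b"] if m["owner"] == m["a"] else m["a"]
            events.append({
                "kind": "same_token",
                "nodes": (m["a"], m["b"]),
                "token": m["token"],
                "owner": m["owner"],
                "displaced": displaced,
            })
            continue
        m = BOOTSTRAP.search(line)
        if m:
            events.append({
                "kind": "bootstrap_collision",
                "nodes": (m["a"], m["b"]),
            })
    return events
```

For example, the function could be called on an open file handle for /var/log/loginsight/cassandra.log, since it only needs an iterable of lines. It reads the log and makes no changes to the cluster.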

Environment

vRealize Log Insight 8.x

VMware Aria Operations for Logs 8.x

Cause

Token collision errors in Cassandra occur when two or more nodes in a Cassandra cluster are assigned the same token range.
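As a conceptual illustration of this condition, the following Python sketch reports every token claimed by more than one node. It assumes the per-node token lists have already been gathered into a dictionary by some other means. The function name find_shared_tokens and the input structure are hypothetical and are not part of Cassandra or Aria Operations for Logs.

```python
from collections import defaultdict


def find_shared_tokens(tokens_by_node):
    """Map each token owned by more than one node to the sorted node list."""
    owners = defaultdict(list)
    for node, tokens in tokens_by_node.items():
        # set() guards against a node listing the same token twice.
        for token in set(tokens):
            owners[token].append(node)
    return {token: sorted(nodes)
            for token, nodes in owners.items() if len(nodes) > 1}
```

An empty dictionary means no two nodes share a token. Any entry returned identifies a pair (or group) of nodes in the colliding state described above.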

Resolution

Please raise a ticket with Broadcom support referencing this article.