Problems with one node of an Aria Operations for Logs cluster.

Article ID: 399087

Updated On:

Products

VMware Aria Suite

Issue/Introduction

  • Cannot access the node via IP or FQDN.
  • In 'System Monitor', when the node in question is selected, the error 'Failed to load resources' is presented.
  • In 'Explore Logs', the error 'Failed to load resources, results may be inaccurate because not all nodes successfully returned results.' is presented.
  • The command 'nodetool-no-pass status' shows the node in question with the status 'DN'.

Environment

Aria Operations 8.18.x

Cause

When a node becomes disconnected from an Operations for Logs cluster, its configuration becomes desynchronized from that of the nodes that remain connected, which can cause the problems described above.

Resolution

Before taking any of the steps below, please ensure snapshots of all cluster nodes are taken.

Not all of the steps below may be required. Take each step in order and, after each one, test whether the node has reconnected by running the 'nodetool' command below. If the status of the node is still 'DN', move to the next step.

  • nodetool-no-pass status
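
The check after each step can be scripted. The following is a minimal sketch, not part of the product: the helper name `down_nodes` is hypothetical, and it assumes the usual 'nodetool status' layout in which each node line starts with a two-letter status followed by the node address.

```shell
#!/bin/sh
# Hypothetical helper: print the address of every node reported as 'DN'.
# Reads 'nodetool status' style output on stdin; it assumes the status code
# is the first column and the node address is the second column.
down_nodes() {
    awk '$1 == "DN" { print $2 }'
}

# Example usage on a cluster node (assumes nodetool-no-pass is on the PATH):
#   nodetool-no-pass status | down_nodes
```

If the helper prints nothing, no node is reported as 'DN' and the remaining steps can be skipped.
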
  1. Confirm that the 'loginsight-config.xml#xxx' file on the disconnected node matches the one on the connected nodes. The KB below outlines this process.
  2. Confirm that the internal certificates have not expired by running the following command:
    • echo "" | keytool -list -keystore /usr/lib/loginsight/application/etc/3rd_config/keystore -rfc 2> /dev/null | openssl x509 -noout -enddate
    • If an error is returned, or the date returned is in the past, follow the KB below:
    • Replace internal certificates

  3. Check the Cassandra logs for corrupted hints files and remove any that are found, as per the KB below.
  4. Repair the Cassandra cluster, following the steps in the KB below.
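
For step 2, the expiry date does not have to be compared by eye. The sketch below is an assumption-labelled illustration rather than an official procedure: the helper name `cert_is_valid` is hypothetical, and it relies on the `-checkend` option of `openssl x509`, which exits non-zero when the certificate expires within the given number of seconds. Like the command in step 2, it only inspects the first certificate in the input.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the first PEM certificate on stdin
# can be read and has not yet expired ('-checkend 0' means "expires within
# 0 seconds", i.e. already expired). Unreadable input also fails, which
# matches the "if an error is returned" case in step 2.
cert_is_valid() {
    openssl x509 -noout -checkend 0 >/dev/null 2>&1
}

# Example usage, replacing the last stage of the command from step 2:
#   echo "" | keytool -list -keystore /usr/lib/loginsight/application/etc/3rd_config/keystore -rfc 2> /dev/null \
#     | cert_is_valid && echo "certificate in date" || echo "expired or unreadable"
```

A failure from this helper corresponds to the condition under which step 2 directs you to the 'Replace internal certificates' KB.
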

If all of the above steps have failed, and the node has not reconnected, please raise a case with Broadcom support.