Aria Operations for Logs UI is inaccessible as cassandra is down

Article ID: 374952

Products

VMware Aria Suite

Issue/Introduction

Error messages similar to the following appear in the /storage/core/loginsight/var/runtime.log file when starting the Log Insight daemon service:

apache-cassandra-3.11.2/bin/cassandra, -f, -R]] java.io.FileNotFoundException: /storage/core/loginsight/cidata/cassandra/data/logdb/alerts-03eedbd481f632a4ab0c04e3c44c041b/.enable_index/mc-37-big-CompressionInfo.db (Too many open files)] 

You may also see a node shown as disconnected or flapping in the UI. The Cassandra logs may also contain a large number of entries similar to Opening /storage/core/loginsight/cidata/cassandra/data/logdb/******************/nb-******-big

Environment

Aria Operations for Logs 8.12 and later 

Cause

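Cassandra cannot start normally and needs additional resources to start. As the Too many open files error shows, it exceeds the current open file limit while opening its data files.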

Resolution

1. Take a snapshot of all nodes in the cluster, without memory and without quiescing.

2. Open an SSH session as root to all nodes in the cluster.

3. Stop the Log Insight daemon service by running:

systemctl stop loginsight

4. Make sure the watchdog process is not running:

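ps aux | grep loginsight-watchdog

If the only line returned is the grep command itself, as in the following example, the watchdog is not running:

root 3677 0.0 0.0 4420 732 pts/0 S+ 05:33 0:00 grep --color=auto loginsight-watchdog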

If the watchdog is still running, stop it with:

killall -9 loginsight-watchdog

Alternatively, kill the process by its process ID:

kill -9 <processid>
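A plain ps aux | grep always prints at least one line, the grep process itself, which is why output appears even when the watchdog is stopped. A common shell idiom avoids this by wrapping the first character of the pattern in brackets, so the regular expression still matches the target but no longer matches grep's own arguments. The sketch below wraps that idiom in a hypothetical helper, process_running, which is not part of Aria Operations for Logs; it assumes bash and the standard procps ps, and treats the name as a basic regular expression.

```shell
# Hypothetical helper (not shipped with Aria Operations for Logs):
# succeeds if any running process's command line matches the given name.
process_running() {
  local name="$1"
  # Bracket the first character, e.g. "sleep" becomes "[s]leep". The regex
  # still matches "sleep" but not the literal text "[s]leep" that appears in
  # grep's own command line, so grep never reports itself.
  local pattern="[${name:0:1}]${name:1}"
  ps aux | grep -q -- "$pattern"
}
```

For example, process_running loginsight-watchdog && killall -9 loginsight-watchdog kills the watchdog only if it is still running, and a second call to process_running afterwards should fail once it is gone.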

5. Increase the open file limit for the current shell session:

ulimit -n 100000

6. Force-start Cassandra from the same shell session:

/usr/lib/loginsight/application/sbin/li-cassandra.sh --startnow --force
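A limit set with ulimit applies only to the shell that set it and to the processes started from that shell, so steps 5 and 6 presumably need to run in the same SSH session. The sketch below is a hypothetical helper, raise_nofile_limit, that raises the limit only when the current value is lower. It assumes bash, and assumes that li-cassandra.sh starts Cassandra as a child of the invoking shell, which this article does not confirm.

```shell
# Hypothetical helper: make sure the open-file limit of the current shell is
# at least the requested value. Call it in the shell that will start
# Cassandra, because child processes inherit the limit from this shell.
raise_nofile_limit() {
  local want="$1"
  local have
  have="$(ulimit -n)"   # current soft limit for open files
  if [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]; then
    return 0            # already high enough, leave it unchanged
  fi
  # Without -S or -H, bash sets both the soft and the hard limit.
  # Raising the hard limit requires root, as in the session from step 2.
  ulimit -n "$want"
}
```

Usage, in the same shell: raise_nofile_limit 100000 && /usr/lib/loginsight/application/sbin/li-cassandra.sh --startnow --force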

7. Check the status of Cassandra (replace x in apache-cassandra-x with the Cassandra version installed on the node):

/usr/lib/loginsight/application/lib/apache-cassandra-x/bin/nodetool-no-pass status

8. If all nodes report Cassandra as up (UN status), run a flush and a repair on all nodes. If any node is in DN (down) state, the repair will not complete successfully. Make sure each node is up and running before proceeding to the next one:

/usr/lib/loginsight/application/lib/apache-cassandra-x/bin/nodetool-no-pass flush

/usr/lib/loginsight/application/lib/apache-cassandra-x/bin/nodetool-no-pass repair
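Step 8 only makes sense when every node reports UN. In nodetool status output, each node line starts with a two-letter code: U or D (up or down) followed by N, L, J or M (normal, leaving, joining, moving). The sketch below uses a hypothetical function, all_nodes_up, that reads that output on standard input and succeeds only if at least one node is listed and every node line is UN; a second hypothetical function, flush_and_repair, runs the two commands from this step only in that case. NODETOOL keeps the apache-cassandra-x placeholder from the article, so it must be set to the actual path on the node.

```shell
# Path from step 7; replace apache-cassandra-x with the directory on the node.
NODETOOL=/usr/lib/loginsight/application/lib/apache-cassandra-x/bin/nodetool-no-pass

# Hypothetical helper: read "nodetool status" output on stdin and succeed
# only if at least one node is listed and every node line starts with UN.
all_nodes_up() {
  awk '
    /^[UD][NLJM][ \t]/ { nodes++; if ($1 != "UN") bad++ }
    END { exit !(nodes > 0 && bad == 0) }
  '
}

# Hypothetical wrapper around step 8: flush and repair only when all nodes are UN.
flush_and_repair() {
  if ! "$NODETOOL" status | all_nodes_up; then
    echo "Not every node is UN; skipping flush and repair" >&2
    return 1
  fi
  "$NODETOOL" flush && "$NODETOOL" repair
}
```

Calling flush_and_repair on a node prints a message and returns non-zero instead of starting a repair that is expected to fail because a node is down.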

9. Once the repair has finished, stop Cassandra and start the vRLI daemon on each node:

/usr/lib/loginsight/application/sbin/li-cassandra.sh --stopnow --force

systemctl start loginsight
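Step 8 asks that each node be up and running before moving on to the next one. Instead of re-checking by hand, a small polling loop can wait for the service. The sketch below is a hypothetical helper, wait_until, that retries a command once per second until it succeeds or a timeout in seconds expires. Note that systemctl is-active only confirms that the loginsight systemd unit is active, not that the node has rejoined the cluster, so the node's state in the UI is still worth checking.

```shell
# Hypothetical helper: run a command once per second until it succeeds.
# Returns 0 as soon as the command succeeds, or 1 after roughly
# <timeout> seconds of failed attempts.
wait_until() {
  local timeout="$1"
  shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example: systemctl start loginsight && wait_until 300 systemctl is-active --quiet loginsight (the 300-second timeout is an arbitrary example value).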

10. Repeat the process for each node in the cluster.