Log Insight Daemon startup failed: Failed to start Cassandra Server: Cassandra failed to start

Article ID: 337464


Products

VMware Aria Suite

Issue/Introduction

Symptoms:

  • Unable to start the loginsight service.
  • The node shows a Disconnected status on the Management > Cluster page.
  • In the /var/log/loginsight/runtime.log file, you see entries similar to:
    [2017-06-19 16:30:48.489+0000] ["main"/IP_Address INFO] [com.vmware.loginsight.daemon.LogInsightDaemon] [Failed to start Cassandra Server: Cassandra failed to start.]
    [2017-06-19 16:30:48.489+0000] ["main"/IP_Address INFO] [com.vmware.loginsight.daemon.LogInsightDaemon] [Exception during start cassandra database]
    [2017-06-19 16:30:48.489+0000] ["main"/IP_Address INFO] [com.vmware.loginsight.daemon.LogInsightDaemon] [Ended start cassandra database at 18396 ms after launch, took 11575 ms]
    [2017-06-19 16:30:48.490+0000] ["main"/IP_Address FATAL] [com.vmware.loginsight.daemon.LogInsightDaemon] [Error starting services]
    com.vmware.loginsight.daemon.LogInsightDaemon$StartupFailedException: Daemon startup failed: Failed to start Cassandra Server: Cassandra failed to start..
    ...
    Caused by: java.lang.Exception: Cassandra failed to start.
    
  • In the /storage/var/loginsight/cassandra.log file, you see entries similar to:
    ERROR [main] 2017-06-19 16:30:47,802 CassandraDaemon.java:638 - Unable to verify sstable files on disk
    java.nio.file.FileSystemException: /storage/core/loginsight/cidata/cassandra/data/logdb/vimevent_context/la-80366-big-CompressionInfo.db: Input/output error
    
    Note: Copy the path of the corrupted file for use in the resolution as it will vary.
  • When you list the directory reported in the /storage/var/loginsight/cassandra.log file, you see output similar to:
    ls -lthr /storage/core/loginsight/cidata/cassandra/data/logdb/vimevent_context
    
    ls: cannot access la-80366-big-CompressionInfo.db: Input/output error
    -????????? ? ? ? ? ? la-80366-big-CompressionInfo.db

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
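
The corrupted paths can also be pulled out of the log rather than copied by hand. The following is a minimal sketch, assuming the log lines have the same shape as the excerpt above (a java.nio.file.FileSystemException followed by an absolute path and "Input/output error"). The function names list_corrupt_paths and list_corrupt_dirs are hypothetical helpers, not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: print each unique file path that the given log
# reports with "Input/output error" in a FileSystemException line.
list_corrupt_paths() {
    log_file="$1"
    # Keep only the absolute path between "FileSystemException: " and
    # ": Input/output error"; lines that do not match print nothing.
    sed -n 's|^.*FileSystemException: \(/.*\): Input/output error.*$|\1|p' "$log_file" | sort -u
}

# Hypothetical helper: print each unique directory that holds a reported file.
list_corrupt_dirs() {
    list_corrupt_paths "$1" | while IFS= read -r path; do
        dirname "$path"
    done | sort -u
}

# Example call on the appliance (path taken from this article):
#   list_corrupt_dirs /storage/var/loginsight/cassandra.log
```

If more than one directory is printed, each one is a separate location to handle in the resolution steps.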

Environment

VMware Aria Operations for Logs 8.x (previously vRealize Log Insight)

Cause

  • File system corruption within the appliance VM.
  • Frequently caused by running out of free disk space or unexpected power off of the appliance VM.

Resolution

To resolve this issue, move the directory containing the corrupted files out of the way and recreate it:

  1. Stop the loginsight services:
    systemctl stop loginsight
  2. Determine which directory contains the corrupt files from the ERROR message in the /storage/var/loginsight/cassandra.log file. For example:
    ERROR [main] 2017-06-19 16:30:47,802 CassandraDaemon.java:638 - Unable to verify sstable files on disk
    java.nio.file.FileSystemException: /storage/core/loginsight/cidata/cassandra/data/logdb/vimevent_context/la-80366-big-CompressionInfo.db: Input/output error
    In this case, the corruption is in the /storage/core/loginsight/cidata/cassandra/data/logdb/vimevent_context location.
  3. Move the corrupted directory from the current location to a new location.
    mkdir /storage/core/loginsight/corrupted_files
    mv [location] /storage/core/loginsight/corrupted_files
    Note: Replace [location] with the location identified in Step 2. Example:
    mv /storage/core/loginsight/cidata/cassandra/data/logdb/vimevent_context /storage/core/loginsight/corrupted_files
    • If you are not confident using the mv command, use the cp command to copy the directory, then remove the corrupted directory with the rm command.
  4. Recreate the directory.
    mkdir [location]
    Note: Replace [location] with the location identified in Step 2. Example:
    mkdir /storage/core/loginsight/cidata/cassandra/data/logdb/vimevent_context
  5. Start the loginsight service:
    systemctl start loginsight
    Note: This starts the Cassandra service first and then the Log Insight daemon.
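
Steps 3 and 4 above can be sketched as one guarded function. This is an illustration only, assuming the loginsight service is already stopped; the function name quarantine_and_recreate and its two arguments are hypothetical. It refuses to run if a directory of the same name was already moved aside, so an earlier copy is not overwritten.

```shell
#!/bin/sh
# Hypothetical helper: move a corrupted directory into a holding directory,
# then recreate an empty directory at the original location.
#   $1 = location identified in Step 2
#   $2 = holding directory, e.g. /storage/core/loginsight/corrupted_files
quarantine_and_recreate() {
    location="$1"
    quarantine="$2"

    if [ ! -d "$location" ]; then
        echo "Not a directory: $location" >&2
        return 1
    fi

    name=$(basename "$location")
    if [ -e "$quarantine/$name" ]; then
        echo "Already present in holding directory: $quarantine/$name" >&2
        return 1
    fi

    mkdir -p "$quarantine" || return 1
    mv "$location" "$quarantine"/ || return 1   # Step 3
    mkdir "$location"                           # Step 4
}

# Example call with the paths used in this article:
#   quarantine_and_recreate \
#       /storage/core/loginsight/cidata/cassandra/data/logdb/vimevent_context \
#       /storage/core/loginsight/corrupted_files
```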

To fix the file system corruption, perform these steps:

  1. Stop the loginsight services:
    systemctl stop loginsight
  2. Unmount the /storage/core partition:
    umount /storage/core
  3. Check and repair the filesystem:
    /sbin/fsck /dev/mapper/data-core
    Note: Scanning and repairing the file system may take hours for a large disk. Plan the downtime accordingly.
  4. After the file system repair has completed, remount the /storage/core partition:
    mount /storage/core
  5. Start the loginsight service:
    systemctl start loginsight
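
Because umount can fail when the partition is still in use, and running fsck against a mounted file system risks further damage, it can help to confirm that /storage/core is really unmounted between steps 2 and 3. The following is a minimal sketch, assuming a Linux appliance where /proc/mounts lists the active mounts; the function name is_mounted is hypothetical, and its optional second argument exists only so the check can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given mount point appears in
# the mounts table, fail (exit 1) otherwise.
#   $1 = mount point, e.g. /storage/core
#   $2 = mounts table to read (optional, defaults to /proc/mounts)
is_mounted() {
    # The second whitespace-separated field of each line is the mount point.
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example guard around step 3 (device name taken from this article):
#   if is_mounted /storage/core; then
#       echo "/storage/core is still mounted; not running fsck" >&2
#   else
#       /sbin/fsck /dev/mapper/data-core
#   fi
```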