Error "Failed to dispatch hints file" "file is corrupted" in cassandra.log

Article ID: 315961


Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite) VCF Operations

Issue/Introduction

  • Cluster nodes are not responding.
  • The UI may be inaccessible.
  • A banner on the Management > Cluster page reads "The latest cluster data is not currently available, most likely because the primary node is not accessible".
  • The daily automatic health check that runs on SDDC Manager in a VCF environment fails for all Aria Operations for Logs cluster nodes.
  • The Cassandra service crashes, logging one of the following messages in /var/log/loginsight/cassandra.log:
    ERROR [HintsDispatcher:1] ####-##-## 07:00:24,378 HintsDispatchExecutor.java:243 - Failed to dispatch hints file ########-####-####-####-############-#############-#.hints: file is corrupted ({})
    OR
    ERROR <DATE> [HintsDispatcher:1]  JVMStabilityInspector.java:68 - Exception in thread Thread[HintsDispatcher:1,1,HintsDispatcher] org.apache.cassandra.io.FSReadError: java.io.IOException: Corrupt hint file found

The following error may also be seen in runtime.log:

  • [com.datastax.oss.driver.internal.core.control.ControlConnection] [[s68833] Error connecting to Node(endPoint=xx.xx.xx.xx:9042, hostId=null, hashCode=xxxxxxxx), trying next node (AnnotatedConnectException: Connection refused: /xx.xx.xx.xx:9042)]
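To confirm that a node is affected before following the resolution, its logs can be searched for the messages above. The following is a minimal sketch: the cassandra.log path comes from this article, but the /var/log/loginsight/runtime.log location is an assumption, and the helper names count_hint_errors and count_refused_9042 are hypothetical.

```shell
#!/bin/bash
# Sketch: count corrupt-hints symptoms in the Log Insight logs on this node.
# The cassandra.log path is from the article; the runtime.log path is an ASSUMPTION.
CASSANDRA_LOG=/var/log/loginsight/cassandra.log
RUNTIME_LOG=/var/log/loginsight/runtime.log   # assumed location

# Print the number of hint-corruption errors in a Cassandra log (0 if unreadable).
count_hint_errors() {
  local log="$1"
  [ -r "$log" ] || { echo 0; return 0; }
  grep -cE 'Failed to dispatch hints file|Corrupt hint file found' "$log"
}

# Print the number of refused connections to the Cassandra native port 9042 (0 if unreadable).
count_refused_9042() {
  local log="$1"
  [ -r "$log" ] || { echo 0; return 0; }
  grep -c 'Connection refused: /.*:9042' "$log"
}

echo "hint corruption errors: $(count_hint_errors "$CASSANDRA_LOG")"
echo "refused connections to port 9042: $(count_refused_9042 "$RUNTIME_LOG")"
```

Non-zero counts on both lines are consistent with the issue described here; run the script on every cluster node.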




Environment

VCF Operations for Logs 9.0.x
VMware Aria Operations for Logs 8.x
VMware vRealize Log Insight 8.x

Cause

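The hints directory of the internal Cassandra database, /usr/lib/loginsight/application/lib/apache-cassandra-*/data/hints, contains corrupt hint files. Hints are writes that Cassandra stores locally for a replica that was unreachable and replays to it later. When the hints dispatcher reads a corrupt file, the Cassandra service crashes, and Log Insight can no longer connect to it on port 9042 (the "Connection refused" error in runtime.log).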
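Before anything is removed, it may help to record how many hint files each node holds. The sketch below counts the *.hints files under the directory named above. The helper name count_hint_files is hypothetical, and it only counts files; it does not detect which file is corrupt.

```shell
#!/bin/bash
# Sketch: count Cassandra hint files under the Log Insight install directory.
# Default base path is from the article; count_hint_files is a hypothetical helper name.
count_hint_files() {
  local base="${1:-/usr/lib/loginsight/application/lib}"
  local total=0 dir n
  # The glob matches the versioned apache-cassandra-* directory.
  for dir in "$base"/apache-cassandra-*/data/hints; do
    [ -d "$dir" ] || continue   # skip the unexpanded pattern when nothing matches
    n=$(find "$dir" -maxdepth 1 -type f -name '*.hints' | wc -l)
    echo "$dir: $n hint file(s)" >&2   # per-directory detail on stderr
    total=$((total + n))
  done
  echo "$total"                        # total count on stdout
}

echo "total hint files: $(count_hint_files)"
```

Cassandra also keeps companion checksum files (*.crc32) next to each hint file; the sketch counts only the *.hints files themselves.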

Resolution

Note: Take a snapshot of all nodes in the cluster before proceeding. For instructions, see How to take a Snapshot of VMware Aria Operations for Logs.

  1. Stop the Log Insight service on all nodes
    service loginsight stop
  2. Remove the hints files on all nodes
    rm -rf /usr/lib/loginsight/application/lib/apache-cassandra-*/data/hints/*
  3. Force start the Cassandra service on all nodes
    /usr/lib/loginsight/application/sbin/li-cassandra.sh --startnow --force
  4. On each node, verify that the Cassandra status of every node is UN (Up/Normal).
    nodetool-no-pass status

    Note: In the output, each cluster node appears on its own line, and every node line should begin with UN.

  5. Force stop the Cassandra service on all nodes
    /usr/lib/loginsight/application/sbin/li-cassandra.sh --stopnow --force
  6. Start the Log Insight service on all nodes
    service loginsight start
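Step 4 relies on reading the nodetool output by eye. The following sketch performs the same check in a script, assuming the standard Cassandra `nodetool status` layout in which each node line starts with a two-letter status/state code (UN, DN, UJ and so on). The helper name all_nodes_un is hypothetical; it exits non-zero if any node is not UN or if no node lines are found.

```shell
#!/bin/bash
# Sketch: read `nodetool status` output on stdin and succeed only if every node is UN.
# Assumes the standard layout where node lines start with a status/state code
# such as UN, DN, UJ or UL; all_nodes_un is a hypothetical helper name.
all_nodes_un() {
  awk '
    /^[UD][NLJM] / {
      total++
      if ($1 != "UN") { bad++; print "not UN: " $2 " (" $1 ")" }
    }
    END {
      if (total == 0) { print "no node lines found"; exit 1 }
      if (bad > 0) exit 1
      print total " node(s) are UN"
    }'
}

# Run the check only where the article's nodetool wrapper exists.
if command -v nodetool-no-pass >/dev/null 2>&1; then
  nodetool-no-pass status | all_nodes_un
fi
```

A non-zero exit status indicates that step 5 should not be run yet on that node.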

Allow a few minutes for the UI to become accessible.
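Instead of checking the UI manually, its availability can be polled. This is a minimal sketch that assumes curl is installed on the machine running it and that the UI is served over HTTPS with a certificate curl may not trust (hence -k). The helper name wait_for_ui and its default timeout and interval are hypothetical, and any 2xx or 3xx response is treated as "UI is up".

```shell
#!/bin/bash
# Sketch: poll a URL until it answers with HTTP 2xx/3xx or a timeout expires.
# wait_for_ui URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]  (hypothetical helper)
wait_for_ui() {
  local url="$1" timeout="${2:-600}" interval="${3:-15}"
  local deadline=$(( $(date +%s) + timeout ))
  local code=""
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # -k: accept a self-signed certificate; -s: quiet; -o: discard the body;
    # -w: print only the HTTP status code ("000" when no connection is made).
    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
    case "$code" in
      2??|3??) echo "UI responded with HTTP $code"; return 0 ;;
    esac
    sleep "$interval"
  done
  echo "UI did not respond within ${timeout}s (last HTTP code: ${code:-none})" >&2
  return 1
}
```

For example, `wait_for_ui https://<node-fqdn>/ 600 15`, where `<node-fqdn>` stands for the address normally used to reach the UI.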