Aria Operations for Logs cluster inaccessible

Article ID: 414722

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • The cluster is down
  • The UI is inaccessible
  • The Cassandra service restarts repeatedly

You may see the following error repeatedly in /storage/var/loginsight/cassandra.log:

554 StorageProxy.java:708 - Failed Paxos prepared locally
java.lang.IllegalArgumentException: Out of range
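A quick way to confirm how often the node is hitting this error is to count matching lines in the log. The helper below is an illustrative sketch, not a product tool; the log path is the one referenced in this article.

```shell
# count_paxos_errors LOGFILE
# Prints how many lines in LOGFILE contain the Paxos prepare failure
# quoted above. Illustrative helper, not part of the product.
count_paxos_errors() {
  # grep -c exits non-zero when there are no matches; "|| true" keeps
  # the helper safe to call under "set -e".
  grep -c "Failed Paxos prepare" "$1" 2>/dev/null || true
}

# On the appliance (log path from this article):
count_paxos_errors /storage/var/loginsight/cassandra.log
```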

Environment

Aria Operations for Logs 8.x

Cause

Potential causes include file corruption or a timestamp/sequence mismatch in the Cassandra Paxos metadata files under the system keyspace (the paxos files). Either condition can invalidate the node's local Paxos state and trigger the error and instability described above.

Resolution

Important: Ensure you have appropriate backups and permissions before proceeding.

Prechecks

  1. Ensure you have root access to the node.
  2. Verify cluster health and which node(s) are affected (do not proceed on multiple nodes simultaneously unless directed by Broadcom Support).
  3. Create a snapshot/backup of the affected node(s) or VM so you can revert if needed.
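
Because the remediation moves Cassandra data files into /tmp, it is also worth confirming the destination has room first. The helper below is a hypothetical precheck (the function name and the 1 GB threshold are our illustration, not from this article):

```shell
# check_dest_space DIR MIN_KB
# Returns 0 if the filesystem holding DIR reports at least MIN_KB
# kilobytes available. Hypothetical precheck helper.
check_dest_space() {
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge "$2" ]
}

# Example: require roughly 1 GB free in /tmp before moving the paxos files.
check_dest_space /tmp 1048576 || echo "Not enough space in /tmp" >&2
```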

Remediation steps (run on the problematic node only):

  1. SSH to the problematic node as root.
  2. Move the Paxos metadata files to a temporary location:
    • mv /storage/core/loginsight/cidata/cassandra/data/system/paxos* /tmp/
    • mv /storage/core/loginsight/cidata/cassandra/data/system/_paxos* /tmp/
    • Note: These files belong to the Cassandra system keyspace and contain local Paxos state; removing/moving them forces Cassandra to regenerate local Paxos metadata on startup.
  3. Restart the Log Insight/Cassandra service: systemctl restart loginsight
  4. Monitor logs and service status:
    1. Check Cassandra/loginsight logs for recurring errors: tail -f /storage/var/loginsight/cassandra.log
    2. Verify service is active: systemctl status loginsight
  5. Validate cluster health from another healthy node or using your monitoring tools. Confirm UI accessibility and that the node rejoins the cluster cleanly.
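
Steps 2 and 3 above can be sketched as a small script. The paths and service name are the ones used in this article; `move_paxos_files` is an illustrative name, not a supported tool, so treat this as a template.

```shell
# move_paxos_files DATA_DIR DEST
# Moves the Cassandra Paxos metadata (paxos* and _paxos*) out of
# DATA_DIR into DEST so Cassandra regenerates local Paxos state on
# the next start.
move_paxos_files() {
  data_dir=$1
  dest=$2
  mv "$data_dir"/paxos* "$dest"/ 2>/dev/null || true
  mv "$data_dir"/_paxos* "$dest"/ 2>/dev/null || true
}

# On the problematic node (paths and service name from this article):
# move_paxos_files /storage/core/loginsight/cidata/cassandra/data/system /tmp
# systemctl restart loginsight
# tail -f /storage/var/loginsight/cassandra.log
```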