Node disconnecting and reconnecting

Article ID: 432649

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • Node status under Cluster Management shows a node disconnecting and reconnecting.

  • Error seen in /storage/core/loginsight/var/cassandra.log:
    ERROR [main] YYYY-MM-DDTHH:MM:SS,232 LogTransaction.java:559 - Unexpected disk state: failed to read transaction log [nb_txn_compaction_86da3db0-0c07-11f1-854b-719d036c64d5.log in /storage/core/loginsight/cidata/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377]
    Files and contents follow:
    /storage/core/loginsight/cidata/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb_txn_compaction_86da3db0-0c07-11f1-854b-719d036c64d5.log
            ADD:[/storage/core/loginsight/cidata/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-5383-big-,0,8][2607439752]
            REMOVE:[/storage/core/loginsight/cidata/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-5380-big-,1771336173134,8][895341009]
                    ***Incomplete fileset detected for sstable [nb-5380-big-]: number of files [0] should have been [8].
            REMOVE:[/storage/core/loginsight/cidata/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-5379-big-,1742398305763,8][851148336]
            REMOVE:[/storage/core/loginsight/cidata/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-5382-big-,1771336179167,8][4005966268]
            REMOVE:[/storage/core/loginsight/cidata/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb-5381-big-,1771336173230,8][2387490077]
            ^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
                    ***Failed to parse [^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@]
    
    ERROR [main] YYYY-MM-DDTHH:MM:SS,235 CassandraDaemon.java:900 - Cannot remove temporary or obsoleted files for system.local due to a problem with transaction log files. Please check records with problems in the log messages above and fix them. Refer to the 3.0 upgrading instructions in NEWS.txt for a description of transaction log files.
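To confirm that a node is showing this specific failure, the log can be scanned for the messages quoted above. The following is a minimal sketch, not a supported tool: it assumes Python 3 is available wherever the log file is examined, and the function name `find_signatures` is hypothetical. The log path and the search strings are copied from this article.

```python
from pathlib import Path

# Log path as given in this article; adjust if the log was copied elsewhere.
CASSANDRA_LOG = Path("/storage/core/loginsight/var/cassandra.log")

# Substrings copied from the error output shown in this article.
SIGNATURES = (
    "Unexpected disk state: failed to read transaction log",
    "Incomplete fileset detected for sstable",
    "Failed to parse",
    "Cannot remove temporary or obsoleted files",
)


def find_signatures(lines):
    """Return (line_number, signature) pairs for every matching log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for signature in SIGNATURES:
            if signature in line:
                hits.append((number, signature))
    return hits


if __name__ == "__main__":
    if CASSANDRA_LOG.exists():
        # errors="replace" keeps the scan going if the log contains bad bytes.
        with CASSANDRA_LOG.open(errors="replace") as handle:
            for number, signature in find_signatures(handle):
                print(f"{CASSANDRA_LOG}:{number}: {signature}")
    else:
        print(f"{CASSANDRA_LOG} not found")
```

A node that reports none of these signatures is probably affected by a different problem, and the procedure in this article may not apply to it.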

Environment

Aria Operations for Logs 8.18.x

Cause

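The Cassandra transaction log for the system.local table references an incomplete SSTable fileset or contains records that cannot be parsed, which indicates a corrupted on-disk state on the node.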
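The records in the log excerpt follow the pattern TYPE:[file prefix,timestamp,file count][checksum]. As an illustration of what "incomplete fileset" means, the sketch below parses one such record and counts how many files with that prefix exist on disk. The field interpretation is inferred from the excerpt and its "number of files [0] should have been [8]" message, and the names `parse_record` and `count_files` are hypothetical. It only reads the file system and is not a repair tool.

```python
import re
from pathlib import Path

# Record layout inferred from the log excerpt in this article:
#   TYPE:[<sstable file prefix>,<timestamp>,<file count>][<checksum>]
RECORD = re.compile(
    r"^\s*(?P<type>ADD|REMOVE|COMMIT|ABORT):"
    r"\[(?P<prefix>[^,\]]*),(?P<updated>\d+),(?P<count>\d+)\]"
    r"\[(?P<checksum>\d+)\]\s*$"
)


def parse_record(line):
    """Parse one transaction log record; return None if it does not match."""
    match = RECORD.match(line)
    if match is None:
        return None
    return {
        "type": match["type"],
        "prefix": match["prefix"],
        "updated": int(match["updated"]),
        "expected_files": int(match["count"]),
        "checksum": int(match["checksum"]),
    }


def count_files(prefix):
    """Count the files in the prefix's directory whose names start with it."""
    path = Path(prefix)
    if not prefix or not path.parent.is_dir():
        return 0
    return sum(
        1 for entry in path.parent.iterdir()
        if entry.name.startswith(path.name)
    )
```

For a REMOVE record, a mismatch between `expected_files` and the value returned by `count_files` corresponds to the "Incomplete fileset detected" message, and a line for which `parse_record` returns `None` corresponds to "Failed to parse".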

Resolution

Prerequisites:

  1. Shut down all nodes, leaving the Primary node for last.

  2. Create a snapshot or backup copy of the VMware Aria Operations for Logs virtual appliances.


File System Check Procedure:

  1. In the vSphere Client, open the console of the desired node.

  2. With the console open, restart or power on the virtual machine.

  3. When the GRUB loader menu appears with the Photon splash screen, immediately press the letter 'e' to launch the GNU GRUB edit menu.

  4. Navigate to the end of the line that starts with linux.

    • Note: If VMware Aria Operations for Logs was upgraded from a previous release and displays the SUSE Linux splash screen, use the up and down arrow keys to select the Photon OS Latest option. The cursor appears at the end of a line of boot options near the bottom of the display. Even if the Photon OS Latest option appears to already be selected, use the arrow keys. Otherwise, the machine continues to boot, and the appliance must be rebooted to start the process over.

    • Note: If you cannot reach the boot menu before it disappears, enable Force BIOS setup in the Virtual Machine's Settings > VM Options > Boot Options and reboot.

  5. At the end of the line, add a space, then type fsck.mode=force fsck.repair=yes

  6. Press F10 or CTRL+X to boot the appliance.
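Because the options are typed by hand in the GRUB editor, a typo is easy to miss. On Linux, /proc/cmdline shows the kernel command line used for the current boot, so it can be checked once the appliance is up. The sketch below assumes shell access to the node and an available Python 3 interpreter. The function name `missing_options` is hypothetical. The check is only meaningful during the boot in which the options were added, since the GRUB edit is not persistent.

```python
from pathlib import Path

# Options added in step 5 of the procedure above.
REQUIRED = ("fsck.mode=force", "fsck.repair=yes")


def missing_options(cmdline):
    """Return the required options that are absent from a kernel command line."""
    present = cmdline.split()
    return [option for option in REQUIRED if option not in present]


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        missing = missing_options(cmdline_file.read_text())
        if missing:
            print("Not on the kernel command line:", " ".join(missing))
        else:
            print("Both fsck options were passed to the kernel for this boot.")
    else:
        print("/proc/cmdline not found; run this on the appliance itself.")
```

If an option is reported as missing, the file system check may not have been forced, and the procedure should be repeated from step 2.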

 

Workaround:

If the File System Check fails to correct the issue, use the documentation below to remove the worker node and redeploy a new one.

  1. Remove a Worker Node from a VMware Aria Operations for Logs Cluster
  2. Deploy the VMware Aria Operations for Logs Virtual Appliance
  3. Join an Existing Deployment