Inaccessible User Interface and Node Removal during SSL Certificate Expiration in VMware Aria Operations for Logs

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

Accessing the VMware Aria Operations for Logs user interface (UI) using local admin, vIDM, or Active Directory (AD) credentials fails.

To verify the certificate expiration, run the following command on the affected node:

echo "" | keytool -list -keystore /usr/lib/loginsight/application/etc/3rd_config/keystore -rfc 2> /dev/null | openssl x509 -noout -enddate
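
The command above prints a single line of the form `notAfter=<date>`. As a minimal sketch (not part of the original procedure), the helper below turns that line into a count of whole days remaining. The function name `days_until_expiry` is hypothetical, and the sketch assumes GNU `date`, which accepts the `-d` option.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the product): converts the
# "notAfter=..." line printed by `openssl x509 -noout -enddate` into
# whole days remaining. Assumes GNU date (-d option).
# Usage: days_until_expiry "notAfter=<date>" [now_epoch]
days_until_expiry() {
    local enddate="${1#notAfter=}"     # strip the "notAfter=" prefix
    local now="${2:-$(date +%s)}"      # optional fixed "now", useful for testing
    local end
    end=$(date -u -d "$enddate" +%s) || return 1
    # Zero or a negative number means the certificate has expired
    # or has less than one day left.
    echo $(( (end - now) / 86400 ))
}
```

To use it, pass the output of the verification command as the first argument. The second argument can be omitted, in which case the current time is used.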

Login attempts result in error messages such as Invalid credentials or account locked.
Log in to the primary node as the root user via SSH and check the admin user status by running the following command:
```
/usr/lib/loginsight/application/sbin/li-reset-admin-passwd.sh --checkAdminStatus
```
The following error occurs:
```
FAILED: Unable to get user data. Possible Cassandra is down
```
The /storage/core/loginsight/var/cassandra.log file contains the following error: Received fatal alert: certificate_expired
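
As a rough sketch, the log check can be wrapped in a small function that counts how many times the alert appears. The function name `count_cert_expired` is hypothetical. The default path is the log file named in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts lines containing "certificate_expired".
# The default path is the Cassandra log named in this article.
# Usage: count_cert_expired [logfile]
count_cert_expired() {
    local logfile="${1:-/storage/core/loginsight/var/cassandra.log}"
    # grep -c prints the number of matching lines (0 when there are none)
    grep -c 'certificate_expired' "$logfile"
}
```

A count greater than zero is consistent with the symptom described here.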

Environment

VMware Aria Operations for Logs 8.18.x

Cause

This issue occurs when internal or external SSL certificates expire, which prevents the Cassandra database from authenticating connections between nodes. In this specific scenario, a new node was added to the cluster while the certificates were expired, so cluster synchronization failed.

Resolution

Part 1: Restore Cluster Health and Remove Faulty Nodes

Create virtual machine snapshots for all nodes in the cluster before proceeding. See How to take a Snapshot of Operations for Logs.
Run the following command on all nodes to force the database service to start:
```
/usr/lib/loginsight/application/sbin/li-cassandra.sh --startnow --force
```
Identify and Remove Faulty Node:
1. Check node status:
```
nodetool-no-pass status
```
2. Remove the new node using its Host ID from the status output:
```
nodetool-no-pass removenode [Host_ID-shown-in-nodetool-no-pass-status]
```
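When the status output is long, it can help to filter it down to the nodes that are not Up/Normal before choosing a Host ID. The following is a minimal sketch under stated assumptions: the function name `list_unhealthy_nodes` is hypothetical, and the parsing assumes the usual Cassandra status layout (a two-letter status code, then the address, then a UUID-shaped Host ID later on the line). It only lists candidates. Confirm that the address belongs to the newly added node before running the removal command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nodetool-no-pass status` output on stdin and
# prints "<status> <address> <host-id>" for every node that is not UN.
# Assumes the standard Cassandra status table layout.
list_unhealthy_nodes() {
    awk '
        $1 ~ /^[UD][NLJM]$/ && $1 != "UN" {
            for (i = 3; i <= NF; i++) {
                # The Host ID is the only field made of five hex groups joined by dashes
                if ($i ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/) {
                    print $1, $2, $i
                    break
                }
            }
        }
    '
}
```

Typical use would be to pipe the status command into the function, for example `nodetool-no-pass status | list_unhealthy_nodes`.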
On all remaining nodes, restart the Log Insight service:
```
/etc/init.d/loginsight restart
```
Fix the certificate:
1. Check whether the certificate currently served on port 443 is valid for SSL client use:
```
echo | openssl s_client -connect localhost:443 2>/dev/null | openssl x509 -noout -purpose | grep 'SSL client :'
```
2. If the output is No, generate a custom CA-signed certificate by following the steps in the KB Install a Custom SSL Certificate.
3. If the output is Yes, follow the steps in the KB Install a self-signed certificate in VMware Aria Operations for Logs 8.12 and Later.
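
The Yes/No decision above can be sketched as a small function that reads the `openssl x509 -noout -purpose` output and reports which of the two procedures applies. The function name `choose_cert_procedure` and its output labels (`self-signed`, `custom-ca`, `unknown`) are hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `openssl x509 -noout -purpose` output on stdin
# and prints which of the two KB procedures named in this section applies.
choose_cert_procedure() {
    local answer
    # Match the "SSL client" line exactly, so "SSL client CA" is ignored
    answer=$(awk -F' : ' '$1 == "SSL client" { print $2; exit }')
    case "$answer" in
        Yes) echo "self-signed" ;;   # follow the self-signed certificate KB
        No)  echo "custom-ca" ;;     # follow the custom SSL certificate KB
        *)   echo "unknown"; return 1 ;;
    esac
}
```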
Log in to the Aria Operations for Logs user interface and verify that the node removed earlier with nodetool-no-pass removenode no longer appears. If it is still listed, delete it from the user interface by clicking the X button.

Part 2: Migrate Log Repository to a New Node

To replace a node while preserving data, follow these steps to migrate logs.

Step 1: Deploy New Node

Create virtual machine snapshots for all nodes in the cluster before proceeding. See How to take a Snapshot of Operations for Logs.
Deploy a new VMware Aria Operations for Logs worker node. See Add a Worker Node to a VMware Aria Operations for Logs Cluster.
Ensure the storage capacity matches the existing cluster nodes.

Step 2: Move the log repository to the new node. Two options are available:

Option 1: If deploying a new node with a new IP address, follow the steps from KB Replace a node in an Aria Operations for Logs cluster.

Or
Option 2: If the IP address of the old node needs to be reused, first copy the log repository to external storage, such as an NFS server.
Copy the log repository to the new node: Large repositories can take several hours to transfer. To prevent the transfer from failing if your SSH session disconnects, run the scp command in the background.
1. On the NFS server, initiate the copy:
```
nohup scp -r /storage/core/loginsight/cidata/store/* root@[New-node-IP_ADDRESS]:/storage/core/loginsight/cidata/store/ > nohup.out 2>&1
```
2. Enter the root password when prompted.
3. Press Ctrl + Z to temporarily suspend the command. The shell displays:
```
[1]+ Stopped nohup scp -r ...
```
4. Type bg and press Enter to move the process to the background:
```
[1]+ nohup scp -r ... &
```
5. To verify the process is still running, type:
```
jobs
```
  The output confirms the status as Running.
Run the Importer Script: After the data is copied, you must index the buckets. Because this process can take hours, use a background script to ensure completion.
1. Create a script named importer.sh and add the following lines:
```
#!/usr/bin/env bash
# -E enables extended regex so that | acts as alternation and all three entries are skipped
for bucket in $(ls /storage/core/loginsight/cidata/store | grep -Ev 'generation|buckets|strata_write.lock'); do
   echo y | /usr/lib/loginsight/application/sbin/bucket-index add "$bucket" --statuses archived;
done
```
2. Make the script executable:
```
chmod +x importer.sh
```
3. Run the script in background mode:
```
nohup ./importer.sh &
```
4. Monitor the status of the script:
```
ps aux | grep '[i]mporter'
```
5. Once the ps command no longer shows the importer.sh process, the script is complete. Proceed to start the Log Insight service.
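
Before starting a multi-hour import, it may be useful to preview which buckets the loop would process. The following dry run is a sketch only: the function name `preview_import` is hypothetical, and it assumes the three skipped entries are named exactly `generation`, `buckets`, and `strata_write.lock`. It prints each `bucket-index` command without executing it.

```shell
#!/usr/bin/env bash
# Hypothetical dry run: prints the bucket-index command for each entry in the
# store directory without executing anything. Skips the same three entries
# as importer.sh, matched here by exact name (an assumption).
# Usage: preview_import [store_dir]
preview_import() {
    local store="${1:-/storage/core/loginsight/cidata/store}"
    local path name
    for path in "$store"/*; do
        [ -e "$path" ] || continue       # empty directory: the glob did not expand
        name=${path##*/}                 # keep only the final path component
        case "$name" in
            generation|buckets|strata_write.lock) continue ;;
        esac
        echo "/usr/lib/loginsight/application/sbin/bucket-index add $name --statuses archived"
    done
}
```

Using a shell glob instead of parsing `ls` output keeps names with unusual characters intact. The number of printed lines gives a rough idea of how many buckets the real import will index.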