VCF Operations for Logs UI is not available after upgrade to 9.0.2

Article ID: 428924

Products

VCF Operations

Issue/Introduction

After you upgrade to VCF Operations for Logs 9.0.2, the user interface (UI) is inaccessible. The Cassandra service fails to start on one or more nodes, which leaves the cluster in a degraded state.

  • "Page Not Found" or "Service Unavailable" errors occur when you access the UI.
  • systemctl status reports a degraded state:
    State: degraded
  • nodetool-no-pass status shows Cassandra on one or more nodes is down (DN):
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens  Owns (effective)  Host ID  Rack
    UN  ##.##.##.##  18.32 MiB  256     100.0%            [UUID]   rack1
    DN  ##.##.##.##  ?          256     100.0%            [UUID]   rack1
    DN  ##.##.##.##  ?          256     100.0%            [UUID]   rack1
    or
    Cassandra is not running
  • Inventory Sync for VCF Operations for Logs through Fleet Manager fails with one of the following errors:

    Error Code: LCMVRLICONFIG40100 or Error Code: LCMVRLISYSTEM45034
    Operations-logs host is unreachable. Either the host name is incorrect or the virtual machine is not reachable.
    Unable to connect to host. Check host details and retry.
  • Exceptions similar to the following appear in /storage/var/loginsight/cassandra.log:

     ERROR [Messaging-EventLoop-#-#] ####-##-##T##:##,OutboundConnectionInitiator.java:### - Failed to handshake with peer /<VCFOperationsForLogs_WorkerIp>:7000(/<VCFOperationsForLogs_WorkerIp>:7000)
    at io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: certificate_unknown

    or 

    ERROR [Messaging-EventLoop-3-3] ####-##-##T##:##:##, InboundConnectionInitiator.java:### - Failed to properly handshake with peer /##.###.##.##:39412. Closing the channel.
    io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: Path does not chain with any of the trust anchors
  • When you run systemctl status loginsight, the output may include the following:
    JENTROPY-ERROR: OSSL_provider_init(): 610
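
As a quick way to check for the handshake symptom listed above, the following is a minimal sketch that counts matching lines in the Cassandra log. The function name count_handshake_errors is hypothetical, and the pattern only covers the two exception messages quoted in this article, so other TLS failures would not be counted.

```shell
# Count Cassandra TLS handshake failures in a log file.
# Defaults to the log path named in this article; pass another path to override.
count_handshake_errors() {
    che_log="${1:-/storage/var/loginsight/cassandra.log}"
    # -c prints the number of matching lines, -E enables the alternation.
    grep -c -E 'SSLHandshakeException: (Received fatal alert: certificate_unknown|PKIX path validation failed)' "$che_log"
}
```

Running count_handshake_errors with no argument on each node gives a rough indication of which nodes are rejecting peer certificates. A non-zero count is consistent with this issue but does not prove it.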

Environment

VCF Operations for Logs 9.0.2

Cause

This issue is caused by a keystore and truststore mismatch between the primary and worker nodes, which prevents secure communication between the Cassandra instances.
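
To illustrate what such a mismatch looks like, the sketch below compares the certificate fingerprints found in keytool -list output that has been saved to a file on each node. The function names and the file names in the usage note are hypothetical. The sketch assumes keytool prints one "Certificate fingerprint (...)" line per entry, as current JDKs do.

```shell
# Print the sorted certificate fingerprints found in saved `keytool -list` output.
list_fingerprints() {
    # Keep only the fingerprint lines, then strip the "Certificate fingerprint (ALG): " label.
    grep 'Certificate fingerprint' "$1" | sed 's/^.*): *//' | sort
}

# Return 0 if both saved listings contain the same set of fingerprints, 1 otherwise.
same_fingerprints() {
    [ "$(list_fingerprints "$1")" = "$(list_fingerprints "$2")" ]
}
```

For example, after saving the listing from the primary node as primary-keystore.txt and from a worker as worker-keystore.txt, same_fingerprints primary-keystore.txt worker-keystore.txt || echo "mismatch" reports whether the two stores differ. Sorting makes the comparison independent of entry order.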

 

Resolution

To resolve this issue, you must synchronize the certificates across the cluster nodes:

  1. Log in to the primary node via SSH as root.
  2. Determine if FIPS is enabled by running:
    /usr/lib/loginsight/application/sbin/fips.sh --all --status
  3. Follow the steps below that match the FIPS status of the cluster.

For FIPS Enabled Clusters

  1. Run the following command on the primary node and on each worker node to get the keystore password:
    pw=$(grep 'syslog-ssl-keystore-password' $(ls -1 /storage/core/loginsight/config/loginsight-config* | tail -n 1) | cut -d\" -f2)
  2. Run the following commands on each node and compare the keystore and truststore listings between nodes to confirm the mismatch:
    keytool -list -storetype bcfks -providerpath /usr/lib/loginsight/application/lib/lib/bc-fips-*.jar -provider org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider -storepass $pw -keystore /usr/lib/loginsight/application/etc/3rd_config/keystore.bcfks
    keytool -list -storetype bcfks -providerpath /usr/lib/loginsight/application/lib/lib/bc-fips-*.jar -provider org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider -storepass $pw -keystore /usr/lib/loginsight/application/etc/truststore.bcfks
  3. Copy the following certificate files from the primary node to each worker node, replacing the existing files:
    /usr/lib/loginsight/application/etc/3rd_config/keystore.bcfks

    /usr/lib/loginsight/application/etc/truststore.bcfks

    /storage/core/loginsight/cidata/cassandra/config/cacert.pem
  4. Restart the Log Insight service on all nodes:
    systemctl restart loginsight
  5. Run nodetool-no-pass status and verify all nodes show UN for the status in the first column.
  6. Verify that the UI is accessible and check the cluster status at Management > Cluster.
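
The status check in step 5 can be scripted. The following is a minimal sketch, assuming the nodetool status layout shown in the symptoms above, where each node line starts with a two-letter status/state code. The function name nodes_not_un is hypothetical.

```shell
# Read `nodetool status` output on stdin.
# Prints the address of every node whose code is not UN.
# Returns 1 if any such node exists or if no node lines were found at all
# (for example when the output is "Cassandra is not running"), otherwise 0.
nodes_not_un() {
    awk '
        $1 ~ /^[UD][NLJM]$/ { seen = 1; if ($1 != "UN") { print $2; bad = 1 } }
        END { if (bad || !seen) exit 1 }
    '
}
```

A possible invocation is nodetool-no-pass status | nodes_not_un && echo "all nodes UN". Treating empty output as a failure avoids reporting success when Cassandra is not running on the local node.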

For Non-FIPS Enabled Clusters

  1. Run the following command on the primary node and on each worker node to get the keystore password:
    pw=$(grep 'syslog-ssl-keystore-password' $(ls -1 /storage/core/loginsight/config/loginsight-config* | tail -n 1) | cut -d\" -f2)
  2. Run the following commands on each node and compare the keystore and truststore listings between nodes to confirm the mismatch:
    keytool -list -storepass $pw -keystore /usr/lib/loginsight/application/etc/3rd_config/keystore
    keytool -list -storepass $pw -keystore /usr/lib/loginsight/application/etc/truststore
  3. Copy the following certificate files from the primary node to each worker node, replacing the existing files:
    /usr/lib/loginsight/application/etc/3rd_config/keystore

    /usr/lib/loginsight/application/etc/truststore

    /storage/core/loginsight/cidata/cassandra/config/cacert.pem
  4. Restart the Log Insight service on all nodes:
    systemctl restart loginsight
  5. Run nodetool-no-pass status and verify all nodes show UN for the status in the first column.
  6. Verify that the UI is accessible and check the cluster status at Management > Cluster.
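
The copy in step 3 of either procedure can be planned with a small helper before anything is overwritten. The sketch below only prints scp commands for review and does not run them. The function names cert_files and print_copy_plan and the worker host names are hypothetical, and the use of scp as root between nodes is an assumption, not something this article prescribes. Backing up the existing files on each worker first is advisable.

```shell
# Print the files to synchronize for the given mode: "fips" or "nonfips".
# The paths are the ones listed in this article.
cert_files() {
    cf_etc=/usr/lib/loginsight/application/etc
    case "$1" in
        fips)    printf '%s\n' "$cf_etc/3rd_config/keystore.bcfks" "$cf_etc/truststore.bcfks" ;;
        nonfips) printf '%s\n' "$cf_etc/3rd_config/keystore" "$cf_etc/truststore" ;;
        *)       echo "usage: cert_files fips|nonfips" >&2; return 2 ;;
    esac
    printf '%s\n' /storage/core/loginsight/cidata/cassandra/config/cacert.pem
}

# Print (do not run) one scp command per file and worker.
# Usage: print_copy_plan fips|nonfips WORKER...
print_copy_plan() {
    pcp_mode="$1"
    cert_files "$pcp_mode" >/dev/null || return 2
    shift
    for pcp_worker in "$@"; do
        cert_files "$pcp_mode" | while read -r pcp_file; do
            # -p preserves modification times and modes of the source file.
            echo "scp -p $pcp_file root@$pcp_worker:$pcp_file"
        done
    done
}
```

For example, print_copy_plan nonfips worker1.example.com worker2.example.com prints six commands that can be reviewed and then run from the primary node.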

Additional Information

Replace a corrupted truststore in VCF/Aria Operations for Logs