Primary node crashes intermittently - VCF Operations for Logs 9.0.2

Article ID: 439198

Products

VCF Operations

Issue/Introduction

  • The /storage/core/loginsight/var/runtime.log file on the primary node contains errors similar to the following:
    ["DaemonCommands-thread-#"/##.##.##.## ERROR] [org.apache.thrift.server.TThreadPoolServer] [Thrift Error occurred during processing of message.]
    org.apache.thrift.transport.TTransportException: org.bouncycastle.tls.TlsNoCloseNotifyException: No close_notify alert received before connection closed
  • The runtime.log file on the primary node (/storage/core/loginsight/var) may also contain the following errors, even if the VCF Operations for Logs truststore is healthy:
    java.security.KeyStoreException: Failed to load default trust store
            at org.bouncycastle.jsse.provider.ProvTrustManagerFactorySpi.engineInit(ProvTrustManagerFactorySpi.java:182) ~[bctls-fips-2.0.19.jar:2.0.19]
            at javax.net.ssl.TrustManagerFactory.init(Unknown Source) ~[?:?]
            at com.vmware.loginsight.commons.security.BCX509ExtendedTrustManagerImpl.<init>(BCX509ExtendedTrustManagerImpl.java:28) ~[commons-lib.jar:?]
            at com.vmware.loginsight.commons.security.UrlConnectionManager.makeTrustManager(UrlConnectionManager.java:568) ~[commons-lib.jar:?]
            at com.vmware.loginsight.commons.security.UrlConnectionManager.initSocketFactory(UrlConnectionManager.java:577) ~[commons-lib.jar:?]
            at com.vmware.loginsight.commons.security.UrlConnectionManager.getSocketFactory(UrlConnectionManager.java:503) ~[commons-lib.jar:?]
  • The VCF Operations for Logs UI may be unavailable.
  • CPU usage on the primary node may spike to 100%.
  • The Cluster status page may show service restarts on the primary node.
  • When the node status shows "Unknown," a cluster reboot may be required to restore services.
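To check whether a node matches the log symptoms above, runtime.log can be scanned for the two error signatures. The following Python sketch is illustrative only and is not a Broadcom-provided tool. It assumes Python 3 is available wherever the file is read (on the appliance or on a copied-off file). The signature strings come from the log excerpts above. The script name and the signature labels are hypothetical.

```python
import sys
from collections import Counter
from pathlib import Path

# Default path taken from this article; pass a different path as argv[1] if needed.
DEFAULT_LOG = Path("/storage/core/loginsight/var/runtime.log")

# Hypothetical labels mapped to substrings from the error excerpts in this article.
SIGNATURES = {
    "tls_no_close_notify": "TlsNoCloseNotifyException",
    "default_truststore_load_failure": "Failed to load default trust store",
}


def count_signatures(lines):
    """Count how many lines contain each known error signature."""
    # Start every signature at zero so absent signatures still appear in the report.
    counts = Counter({name: 0 for name in SIGNATURES})
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts


def main(argv):
    path = Path(argv[1]) if len(argv) > 1 else DEFAULT_LOG
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with path.open(encoding="utf-8", errors="replace") as fh:
        counts = count_signatures(fh)
    for name, n in counts.items():
        print(f"{name}: {n}")
    # Exit status 1 signals that at least one signature was found.
    return 1 if any(counts.values()) else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, `python3 scan_runtime_log.py /path/to/runtime.log` prints one count per signature. A non-zero count only shows that the node's log matches the symptoms described here. It does not confirm the root cause, because the truststore error can appear even when the truststore is healthy.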

Environment

VCF Operations for Logs 9.0.2

Cause

This behavior occurs because a large number of TLS connections are opened to the primary node and are not closed correctly. The accumulated connections eventually cause OutOfMemory errors.

Resolution

Broadcom is aware of this issue in VCF Operations for Logs 9.0.2 and is currently developing a permanent fix. Subscribe to this article to receive notifications as updates become available.

If the VCF Operations for Logs 9.0.2 UI is unavailable, reboot all of the appliance VMs in the cluster.