Intermittent Active Directory logon failures in Aria Operations for Logs due to cluster node inconsistency

Article ID: 419799

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

Users experience intermittent failures when attempting to log in using Active Directory (AD) credentials. Local admin accounts function correctly.

The following symptoms are observed:

  • One node in the cluster has a significantly different uptime compared to other nodes.

  • One node may appear to be rebooting repeatedly, or may have recently recovered from a crash.

  • Errors such as unknown_ca or authentication timeouts appear in the runtime.log on specific nodes:

[2025-06-12 09:15:57.626+0000] ["netty-event-loop-67"/##.###.###.# ERROR] [play.core.server.netty.PlayRequestHandler] [Exception caught in Netty] io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Received fatal alert: unknown_ca

  • The cassandra.log may show:

2025-06-12T09:28:40,252 AbstractChannelHandlerContext.java:311 - An exception 'java.lang.NullPointerException' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: java.lang.Unknown_ca
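
To check which node is producing these errors, you can scan a copy of each node's runtime.log or cassandra.log for the TLS failures quoted above. The following is a minimal sketch, assuming you have already copied the log files off the nodes. The function names and pattern labels are hypothetical, and the patterns only cover the unknown_ca and SSLHandshakeException messages shown here. Authentication timeouts are not matched because their exact log text is not quoted in this article.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical pattern set built from the log excerpts above.
# IGNORECASE is used because cassandra.log spells it "Unknown_ca".
TLS_ERROR_PATTERNS = {
    "unknown_ca": re.compile(r"unknown_ca", re.IGNORECASE),
    "ssl_handshake": re.compile(r"SSLHandshakeException"),
}


def count_tls_errors(lines: Iterable[str]) -> Counter:
    """Count how many lines match each known TLS error pattern."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in TLS_ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_log_file(path: str) -> Counter:
    """Scan a local copy of a log file (e.g. runtime.log) line by line."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_tls_errors(handle)
```

Running scan_log_file against the copy from each node should make the affected node stand out as the one with non-zero counts.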

Environment

Aria Operations for Logs 8.x

Cause

The cluster is in an inconsistent state. This occurs when one node reboots or falls out of sync while the other nodes remain up for an extended period. Because an authentication request can be processed by any node, requests handled by the out-of-sync node fail while requests handled by the healthy nodes succeed. As a result, AD logins fail intermittently rather than consistently.
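
The uptime mismatch described above can be checked mechanically once you have gathered each node's uptime in seconds, for example from the vSphere VM summary or from each appliance's console. How you collect the values is up to you. The following is a minimal sketch: the function name and the 0.5 threshold are hypothetical choices, not values defined by Aria Operations for Logs. It flags nodes whose uptime is far below the cluster median, which matches the "one node recently rebooted" pattern.

```python
from statistics import median
from typing import Dict, List


def find_uptime_outliers(uptimes: Dict[str, float], ratio: float = 0.5) -> List[str]:
    """Return nodes whose uptime is below `ratio` times the cluster median.

    `uptimes` maps node name -> uptime in seconds. The 0.5 default is an
    arbitrary assumption; tune it to what "significantly different"
    means in your environment.
    """
    if len(uptimes) < 2:
        return []  # nothing to compare against
    typical = median(uptimes.values())
    return sorted(node for node, up in uptimes.items() if up < ratio * typical)


# Example: two nodes up for about a month, one rebooted an hour ago.
print(find_uptime_outliers({"node1": 3_000_000, "node2": 2_900_000, "node3": 3_600}))
```

The median is used instead of the mean so that a single recently rebooted node does not drag the reference value down.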

Resolution

Perform a controlled rolling reboot of the cluster nodes to restore consistency.

  1. Verify Cluster Health

    1. Log in to the Primary node UI as a local admin user.

    2. Navigate to Administration > Cluster.

    3. Note the IP addresses/FQDNs of all nodes (Primary and Workers).

  2. Power off Worker Nodes

    • From vSphere, go to the Virtual Machine for each Worker node and click Shut Down Guest OS. Repeat until all Worker nodes are powered off.

  3. Power off Primary Node

    • From vSphere, go to the Primary Virtual Machine and click Shut Down Guest OS.

  4. Power on Primary Node

    • From vSphere, go to the Primary Virtual Machine and click Power On. Wait for it to finish initializing before powering on any Worker node.

  5. Power on Worker Nodes

    • From vSphere, power on the Worker nodes one by one.

  6. Verify Resolution

    • Once the cluster is fully online, attempt an Active Directory login.

    • Confirm that runtime.log on each node no longer shows unknown_ca or other authentication errors.
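
If you script or double-check this procedure, the ordering constraints in steps 2 to 5 can be captured as a plan. The following is a minimal sketch: the action names are hypothetical labels and not vSphere API calls. Wiring them to real operations, for example through pyVmomi, is out of scope. Waiting for each Worker to initialize before starting the next is one interpretation of "one by one", not a documented requirement.

```python
from typing import List, Tuple

Action = Tuple[str, str]  # (hypothetical action label, node name)


def rolling_reboot_plan(primary: str, workers: List[str]) -> List[Action]:
    """Return the ordered actions for the controlled rolling reboot."""
    if primary in workers:
        raise ValueError("primary node must not also be listed as a worker")
    plan: List[Action] = []
    # Step 2: shut down every Worker and make sure all of them are off.
    for worker in workers:
        plan.append(("shut_down_guest_os", worker))
    for worker in workers:
        plan.append(("wait_until_powered_off", worker))
    # Step 3: only then shut down the Primary.
    plan.append(("shut_down_guest_os", primary))
    plan.append(("wait_until_powered_off", primary))
    # Step 4: bring the Primary back first and let it initialize.
    plan.append(("power_on", primary))
    plan.append(("wait_until_initialized", primary))
    # Step 5: power on Workers one at a time (assumed: wait for each).
    for worker in workers:
        plan.append(("power_on", worker))
        plan.append(("wait_until_initialized", worker))
    return plan


for step in rolling_reboot_plan("primary", ["worker1", "worker2"]):
    print(step)
```

Generating the plan as data makes it easy to review the order before touching any virtual machine, or to feed it to whatever automation you already use.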

Additional Information