AWI: Loss of high availability

Article ID: 256784

Products

CA Automic Workload Automation - Automation Engine

Issue/Introduction

We have a problem with the high availability of the AWI.


Our environment has four servers: two AE engines (A and B), one Analytics server and one AWI instance.


The problem is that we can only access the AE through the AWI when one specific engine (A) is running. When that engine (A) is shut down and only the other one (B) is running, access through the AWI is not possible.

We have checked the Tomcat uc4config.xml file and it seems correct.

 

This has been happening since we updated to version 21.0.3. Previously, in version 12.6, it worked correctly.

 

Testing was done with the first AE (A) shut down.

When tracing the CP logs of AE B, no connection attempt to the CP is registered.

 

Subsequent testing of the AWI with tracing (xml=3 in uc4config.xml and <root level="DEBUG"> in logback.xml) shows that there is no valid certificate present for server B:

2022-12-21 11:36:53,735 pool-1-thread-1        [DEBUG] NOLOGIN/- F498433883525FF6B8D6877EBF663E6E-0  +1 [com.uc4.ecc.backends.impl.dataservice.connection.ConnectionService] - Connection to Automation Engine failed at 'serverB.mydomain.com:8443'.
java.util.concurrent.ExecutionException: java.io.IOException: Failed to connect to serverB.mydomain.com:8443
<...>
Caused by: java.io.IOException: Failed to connect to serverB.mydomain.com:8443
<...>
Caused by: java.util.concurrent.ExecutionException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
<...>
Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

 

Just above these messages, we see the loading of the certificates in the log.

Conclusion: The certificates of server A are found but no specific certificate for server B is present:

2022-12-21 11:36:53,291 pool-1-thread-1        [DEBUG] NOLOGIN/- F498433883525FF6B8D6877EBF663E6E-0  +1 [com.uc4.ecc.backends.impl.dataservice.connection.ConnectionService] - Attempting to connect to Automation Engine at 'serverB.mydomain.com:8443'...
2022-12-21 11:36:53,313 Thread-11              [TRACE] NOLOGIN/- NOUI   [com.uc4.ecc.framework.core.aetracing.AutomationEngineTraceListener] - Starting: Opening connection to serverB.mydomain.com/10.20.30.40:8443
2022-12-21 11:36:53,324 Thread-11              [TRACE] NOLOGIN/- NOUI   [com.uc4.ecc.framework.core.aetracing.AutomationEngineTraceListener] - Loading certificates from directory: '/opt/automic/certs'.
<...>
2022-12-21 11:36:53,344 Thread-11              [TRACE] NOLOGIN/- NOUI   [com.uc4.ecc.framework.core.aetracing.AutomationEngineTraceListener] - Certificate loaded from file '/opt/automic/certs/serverA.cer'.
2022-12-21 11:36:53,345 Thread-11              [TRACE] NOLOGIN/- NOUI   [com.uc4.ecc.framework.core.aetracing.AutomationEngineTraceListener] - Certificate loaded from file '/opt/automic/certs/serverA.mydomain.com.cer'.
2022-12-21 11:36:53,391 Thread-11              [TRACE] NOLOGIN/- NOUI   [com.uc4.ecc.framework.core.aetracing.AutomationEngineTraceListener] - adding 131 certificates from the default trust manager
2022-12-21 11:36:53,511 Thread-11              [TRACE] NOLOGIN/- NOUI   [com.uc4.ecc.framework.core.aetracing.AutomationEngineTraceListener] - WebSocketClient started.
2022-12-21 11:36:53,722 Thread-11              [TRACE] NOLOGIN/- NOUI   [com.uc4.ecc.framework.core.aetracing.AutomationEngineTraceListener] - WebSocketClient has no more active sessions and will shut down.
2022-12-21 11:36:53,732 Thread-11              [TRACE] NOLOGIN/- NOUI   [com.uc4.ecc.framework.core.aetracing.AutomationEngineTraceListener] - WebSocketClient stopped.
2022-12-21 11:36:53,732 pool-1-thread-1        [DEBUG] NOLOGIN/- F498433883525FF6B8D6877EBF663E6E-0  +1 [com.uc4.ecc.backends.impl.dataservice.connection.ConnectionService] - Attempt to close connection made, but parameter is null.
2022-12-21 11:36:53,735 pool-1-thread-1        [DEBUG] NOLOGIN/- F498433883525FF6B8D6877EBF663E6E-0  +1 [com.uc4.ecc.backends.impl.dataservice.connection.ConnectionService] - Connection to Automation Engine failed at 'serverB.mydomain.com:8443'.
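The comparison made in the excerpts above (which certificate files were loaded versus which host refused the connection) can be repeated on a longer trace with a small script. The following is only a sketch in Python: the two regular expressions are derived from the message texts shown in this article, and the rule "a host is covered if its short name appears in a loaded file name" is a hypothetical heuristic, since a certificate file can have any name.

```python
import re

# Patterns based on the message texts shown in the log excerpts of this article.
CERT_LOADED = re.compile(r"Certificate loaded from file '([^']+)'")
CONN_FAILED = re.compile(r"Connection to Automation Engine failed at '([^':]+):(\d+)'")


def summarize_awi_log(lines):
    """Return (loaded certificate files, hosts whose connection failed)."""
    loaded, failed = [], []
    for line in lines:
        match = CERT_LOADED.search(line)
        if match:
            loaded.append(match.group(1))
        match = CONN_FAILED.search(line)
        if match and match.group(1) not in failed:
            failed.append(match.group(1))
    return loaded, failed


def hosts_without_certificate(loaded, failed):
    """Hypothetical heuristic: a host counts as covered when its short name
    appears in the file name of at least one loaded certificate."""
    names = [path.rsplit("/", 1)[-1].lower() for path in loaded]
    missing = []
    for host in failed:
        short = host.split(".")[0].lower()
        if not any(short in name for name in names):
            missing.append(host)
    return missing


# Shortened copies of the log lines quoted in this article.
sample = [
    "... - Certificate loaded from file '/opt/automic/certs/serverA.cer'.",
    "... - Certificate loaded from file '/opt/automic/certs/serverA.mydomain.com.cer'.",
    "... - Connection to Automation Engine failed at 'serverB.mydomain.com:8443'.",
]
loaded, failed = summarize_awi_log(sample)
print(hosts_without_certificate(loaded, failed))
```

A host reported by this script is only a hint. The PKIX error in the trace remains the actual evidence that the certificate is not trusted.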

Environment

Release : 21.0.x

Cause

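When (self-signed) certificates are used that are specific to each server, the certificate of every server must be present on all systems that need to connect to it.

Here the certificate of one of the servers (server B) was not present on the client side, so the AWI could not build a valid certification path to it.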

Resolution

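Create the certificate for the specific server (here server B) on the client side, so that it is loaded together with the certificates of server A (in the trace above, the AWI loads them from '/opt/automic/certs').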
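One possible way to obtain the missing file is to read the certificate that server B presents on its TLS port and store it next to the existing ones. The sketch below uses Python's standard `ssl.get_server_certificate`. The function name `save_server_certificate`, the `.cer` file naming and the assumption that a PEM-encoded file is accepted by the AWI are illustrative and not taken from this article. A certificate fetched this way is not verified, so compare its fingerprint with the one on the server before trusting it.

```python
import ssl
from pathlib import Path


def save_server_certificate(host, port, cert_dir, fetch=ssl.get_server_certificate):
    """Fetch the certificate presented at host:port and store it as <host>.cer.

    The certificate is returned PEM-encoded and is NOT validated here.
    'fetch' can be replaced for testing without a network connection.
    """
    pem = fetch((host, port))
    target = Path(cert_dir) / f"{host}.cer"
    target.write_text(pem)
    return target


# Example call with the values from the trace in this article (adjust as needed):
# save_server_certificate("serverB.mydomain.com", 8443, "/opt/automic/certs")
```

After the file is in place, repeat the test with AE (A) shut down and check the trace for a matching "Certificate loaded from file" line for server B.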