JCP loops a reconnect if there are slowdowns on reconnecting

Article ID: 225530

Products

CA Automic One Automation

Issue/Introduction

This issue can show several symptoms:
Users cannot log in to the system, although the WPs show everything running.

The JCP logs will show something like:
20210824/154523.432 - 51 U00003524 UCUDB: ===> Time critical DB call! OPC: 'EXEC' time: '80115ms'
20210824/154523.433 - 51 U00003525 UCUDB: ===> 'UPDATE MQSRV SET MQSRV_LastUpdate = ? WHERE MQSRV_Name = ?'
After this, many threads begin to disconnect and reconnect, as shown by messages such as:
20210824/155144.740 - 57 U00003545 UCUDB: Opening database ...
The final message in the log before the customer restarted the JCP is:
20210824/165823.492 - 51 U00045014 Exception 'com.automic.kernel.osgi.NoServiceException: "No registered service for 'com.automic.network.api.NetworkConnections', filter:null"' at 'com.automic.kernel.osgi.OSGIRegistryFunctions.lookup():44'.
20210824/165823.493 - 51 U00003620 Routine 'com.automic.kernel.impl.DefaultExceptionHandler' forces trace because of error.
20210824/165823.496 - 51 U00003450 The TRACE file was opened with the switches '0000000000000000'.
20210824/165826.416 - 51 U00003449 Output to the TRACE file is finished.

These messages appear on multiple threads (the thread number follows the date/timestamp and precedes the U code).
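To check a JCP log for this pattern, the lines can be scanned for the message codes quoted above. The following is a minimal sketch, not an Automic tool: it assumes every log line follows the layout seen in the samples (date/timestamp, thread number, U code, text), and the function name `summarize` and the regular expressions are illustrative assumptions.

```python
import re
from collections import defaultdict

# Layout assumed from the sample lines in this article:
# <YYYYMMDD>/<HHMMSS.mmm> - <thread> <U code> <message text>
LINE_RE = re.compile(r"^(\d{8}/\d{6}\.\d{3}) - (\d+)\s+(U\d{8})\s+(.*)$")
# Duration reported by the "Time critical DB call!" message (U00003524).
SLOW_CALL_RE = re.compile(r"time: '(\d+)ms'")


def summarize(lines):
    """Collect slow DB calls and count 'Opening database' messages per thread."""
    slow_calls = []                      # (timestamp, thread, duration in ms)
    reconnects = defaultdict(int)        # thread -> number of U00003545 messages
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue                     # skip lines that do not fit the assumed layout
        timestamp, thread, code, text = match.groups()
        if code == "U00003524":
            duration = SLOW_CALL_RE.search(text)
            if duration:
                slow_calls.append((timestamp, int(thread), int(duration.group(1))))
        elif code == "U00003545":
            reconnects[int(thread)] += 1
    return slow_calls, dict(reconnects)


# Two of the sample lines from this article.
sample = [
    "20210824/154523.432 - 51 U00003524 UCUDB: ===> Time critical DB call! OPC: 'EXEC' time: '80115ms'",
    "20210824/155144.740 - 57 U00003545 UCUDB: Opening database ...",
]

slow_calls, reconnects = summarize(sample)
print("Slow DB calls:", slow_calls)
print("Reconnects per thread:", reconnects)
```

A long-running DB call followed by "Opening database" messages on a growing number of threads would match the symptom described here; the number of threads or messages that counts as a loop is a judgment call and is not defined in this article.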

A JCP restart resolves the issue and allows for logins.

Cause

The root cause is a slowdown, either on the network between the JCP and the database or on the database itself. The slowdown prevents the JCP from reconnecting in a timely manner; the delayed reconnect attempts compound and result in a loop.

Environment

Release : 12.3

Component : AUTOMATION ENGINE

Resolution

This looping is resolved in 12.3 as of Automation Engine 12.3.8. Note that updating the Automation Engine component also requires updating utilities, initialdata, and AWI.

IMPORTANT: The fix only stops Automic from reacting to the slowdown with a reconnect loop. The actual root cause must still be identified by the network, database, and other system administrators at the site where the slowdown occurs.

The workaround is to restart the affected JCP and, if necessary, the JWP.