data_engine stops processing QOS and queue goes yellow approx. every 10 minutes

Products

DX Unified Infrastructure Management (Nimsoft / UIM)
CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

We migrated the database for our UIM environment from Oracle 12c to Oracle 19c last night. Today we noticed that the data_engine queue turns yellow approximately every 10 minutes: QOS data queues up for a while, then the data_engine reconnects and the queue clears again.

Environment

DX UIM v20.4 CU2
UIM Database: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
data_engine v20.40, schema version: 8.47(0)
Database migrated last night from Oracle 12c to 19c

Cause

The Oracle Instant Client on the Primary hub was still at version 18 and had not been updated.

The following data_engine.log errors were logged during the time frame in which the data_engine stopped processing messages and the queue turned yellow:

May  5 12:33:39:291 [139921914328832] 0 de: data_engine [QoS] [QoSData] - InsertQosObjectOracle [QoSData] data_engine [QoS] status: -1 
    OCI_ERROR - ORA-00001: unique constraint (NIMSOFTSLM.UQ_S_QOS_DATA) violated 
    OCIEnv: 0x0x7f42a400c4a0 OCIAuthInfo: 0x0x7f42a401e2e0 OCISvcCtx: 0x0x7f42a402d1b8  
 - Error: InsertQosObjectOracle [QoSData] data_engine [QoS] status: -1 
May  5 12:33:39:291 [139921914328832] 3 de: allocateOCIErrorHandle OCIHandleAlloc errhp... 
May  5 12:33:39:291 [139921914328832] 4 de: OCIPrep - (StmtPrepare) preparing oracle statement for SQL(SELECT S_QOS_DATA_table_id_ASQ.CURRVAL FROM DUAL) 
May  5 12:33:39:291 [139921914328832] 4 de: OCIPrep - (OCIAttrGet) getting oracle statement type. 
May  5 12:33:39:292 [139921914328832] 3 de: GetCurrentTableID - current tableid: 6646301
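
To check whether the error recurs on roughly the 10-minute cycle described above, the log can be scanned for the ORA-00001 line and the hits grouped into time buckets. The following is a minimal Python sketch, not part of UIM: the function name, the 10-minute default bucket, and the `year` argument (the log timestamps carry no year) are assumptions made for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Timestamp prefix as seen in the excerpt, e.g. "May  5 12:33:39:291 "
TS_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2}):(\d{3}) ")
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}
ERROR_TEXT = "ORA-00001: unique constraint (NIMSOFTSLM.UQ_S_QOS_DATA) violated"


def count_constraint_errors(lines, year, bucket_minutes=10):
    """Count ORA-00001 hits per time bucket (hypothetical helper).

    Continuation lines without a timestamp inherit the last timestamp seen.
    The caller supplies the year because the log lines do not include one.
    """
    counts = Counter()
    current = None
    for line in lines:
        match = TS_RE.match(line)
        if match:
            mon, day, hh, mm, ss, _ms = match.groups()
            current = datetime(year, MONTHS[mon], int(day),
                               int(hh), int(mm), int(ss))
        if ERROR_TEXT in line and current is not None:
            # Round the timestamp down to the start of its bucket.
            start_minute = current.minute - current.minute % bucket_minutes
            counts[current.replace(minute=start_minute, second=0)] += 1
    return counts
```

Passing the lines of data_engine.log and the year the log was written returns a counter keyed by bucket start time. Hits spread across consecutive buckets would be consistent with the behavior described in this article.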

Resolution

After the Oracle Instant Client on the Primary hub was updated via yum (RHEL 8, Linux) from version 18 to version 19.9 (latest GA), the data_engine no longer stopped processing QOS data, the queue no longer turned yellow, and the log no longer showed any instances of the error:

OCI_ERROR - ORA-00001: unique constraint (NIMSOFTSLM.UQ_S_QOS_DATA) violated.
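
One way to spot the condition that applied here is to compare the major release of the installed client with the major release in the database banner. The Python sketch below is illustrative only: the function names are hypothetical, it assumes both strings contain a dotted version number, and it makes no statement about Oracle's client/server support matrix.

```python
import re


def major_version(version_text):
    """Return the major part of the first dotted version number, or None."""
    match = re.search(r"(\d+)\.\d+", version_text)
    return int(match.group(1)) if match else None


def client_lags_database(client_version, database_banner):
    """True when the client's major release is older than the database's."""
    client = major_version(client_version)
    database = major_version(database_banner)
    if client is None or database is None:
        raise ValueError("could not find a dotted version number")
    return client < database
```

With the banner from the Environment section, a client version of 19.9 yields False, while a client version string whose major release is 18 yields True. How the installed client version is obtained (for example, from the package manager) is left to the reader.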

Additional Information

When the data_engine was deactivated it would not release its port.

Changes made in the DX UIM PROD environment
-------------------------------------------------------------------------------

data_engine
---------------------
Via Raw Configure...

Increased the data_engine loglevel from 3 to 5 for troubleshooting purposes.

This can be set back down to 3.
Decrease the logsize, which was set to 500000; for example, set it to 50000.
Reduced hub_bulk_size from 10000 to 1750, which is normally the optimal value. Values above 2000 are untested and may have adverse side effects.
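
The data_engine notes above can be expressed as a small review function. This is a hedged Python sketch rather than a UIM tool: it assumes the values have already been read from Raw Configure into a plain dictionary, and the function and constant names are hypothetical. Only the keys loglevel and hub_bulk_size and the numbers 3, 1750, and 2000 come from this article.

```python
NORMAL_LOGLEVEL = 3              # level to return to after troubleshooting
TESTED_HUB_BULK_SIZE_MAX = 2000  # values above this are described as untested
SUGGESTED_HUB_BULK_SIZE = 1750   # value described as normally optimal


def review_data_engine_settings(settings):
    """Return a list of findings for a dict of data_engine settings."""
    findings = []
    loglevel = int(settings.get("loglevel", NORMAL_LOGLEVEL))
    if loglevel > NORMAL_LOGLEVEL:
        findings.append(
            f"loglevel is {loglevel}; set it back to {NORMAL_LOGLEVEL} "
            "once troubleshooting is finished")
    bulk = settings.get("hub_bulk_size")
    if bulk is not None and int(bulk) > TESTED_HUB_BULK_SIZE_MAX:
        findings.append(
            f"hub_bulk_size is {bulk}; values above "
            f"{TESTED_HUB_BULK_SIZE_MAX} are untested, consider "
            f"{SUGGESTED_HUB_BULK_SIZE}")
    return findings
```

For the values found in this environment, `review_data_engine_settings({"loglevel": "5", "hub_bulk_size": "10000"})` returns two findings, and the corrected values return an empty list.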

Hub
-------
Via Raw Configure, we deleted the bulk_size setting for the data_engine ATTACH queue since it was set to 25000 but it should be empty (default).

Increased postroute_reply_timeout from the default of 180 to 300.
postroute_reply_timeout determines how long the hub waits for a reply from any queue/subscriber after sending messages.

For more details, please refer to:

hub configuration - timeout, retry and other settings (explained)
https://knowledge.broadcom.com/external/article/97954

Don't set bulk_size on ATTACH queues manually; for greater message throughput, set the bulk size for GET queues only.
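
The queue guidance above can be checked mechanically. The Python sketch below is illustrative only: it assumes the hub queue definitions have been copied into a dictionary of queue name to settings, with a `type` entry and an optional `bulk_size` entry. That layout and the function name are assumptions, not the hub's actual configuration format.

```python
def attach_queues_with_bulk_size(queues):
    """Return the names of ATTACH queues that have a bulk_size value set."""
    flagged = []
    for name, cfg in queues.items():
        is_attach = str(cfg.get("type", "")).lower() == "attach"
        bulk = str(cfg.get("bulk_size", "")).strip()
        if is_attach and bulk:
            flagged.append(name)
    return sorted(flagged)
```

Any queue name returned is a candidate for having its bulk_size setting removed, as was done for the data_engine ATTACH queue in this case. GET queues are ignored because setting the bulk size there is the recommended approach.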