oi_connector probe is very unstable and not processing messages to AIOps

Article ID: 237365

Products

  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)
  • DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

The oi_connector probe is very unstable. At times it stops passing messages to AIOps, and we have to restart both oi_connector and apm_bridge to resolve the issue.

Environment

  • Release: 20.3 or higher
  • Component: UIM - OI_CONNECTOR
  • oi_connector v1.52
  • apm_bridge 1.09

Cause

  • May occur when one or more purestorage probes are enabled in the customer's environment.
  • The probe configurations need tuning (see Resolution).

Resolution

Summary of configuration changes

oi_connector
Increased java min/max memory to 12GB/14GB
task_count = 2000
payload_bulk_size = 1000
qos_bulk_size = 3000
java min/max set to 13G and 15G respectively
bulk_size = 500
 
data_engine
hub_bulk_size changed from 2000 to 1750
thread_count_insert changed from 12 to 24 (best practice)
 
After setting thread_count_insert to 24, average throughput at peak message count increased from 4-5k per second to ~20000 per second when it was needed.
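To put the quoted figures in proportion, the arithmetic can be sketched as below. The numbers are the ones stated above; the function name `speedup` is only illustrative and is not part of any UIM tooling.

```python
# Throughput figures quoted in this article (average at peak message count).
BEFORE_LOW, BEFORE_HIGH = 4_000, 5_000   # "4-5k per sec"
AFTER = 20_000                           # "~20000 per second"


def speedup(before: float, after: float) -> float:
    """Return how many times higher the new throughput is than the old one."""
    return after / before


# Lower bound uses the higher starting point, upper bound the lower one.
print(f"{speedup(BEFORE_HIGH, AFTER):.1f}x to {speedup(BEFORE_LOW, AFTER):.1f}x")
# prints: 4.0x to 5.0x
```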

qos_processor

Error in log:
[Qos Monitor Enricher #5, qos_processor] Failed to enqueue pending qos object, queue capacity exceeded. The update will be made when memory is released.
No data will be lost. S_QOS_DATA { qos_name: QOS_PROCESS_CPU, source: xxxxxxx, target: <target>, nim_origin: YOUR_ORIGIN, origin: YOUR_ORIGIN, modifier: nimsoft,
host: #.#.#.#, robot: <robot_name>, probe: processes, table_id: null, qos_def_id: null } 

To address this, we made the following changes:

Added the key database-update-queue-capacity and set it to 50000
java heap memory min/max set to 8GB/10GB respectively
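To judge whether the queue-capacity error shown above is still occurring after the change, the log can be scanned for it. The following is a minimal sketch, not a supported tool: the function name `count_queue_overflows` is hypothetical, and it assumes the log lines look like the excerpt in this article, where the S_QOS_DATA details may wrap onto the line after the error text.

```python
import re
from collections import Counter

# Error text exactly as quoted in this article.
OVERFLOW = re.compile(
    r"Failed to enqueue pending qos object, queue capacity exceeded"
)
# Captures the value following "qos_name:" in the S_QOS_DATA details.
QOS_NAME = re.compile(r"qos_name:\s*([^,\s]+)")


def count_queue_overflows(lines):
    """Count overflow errors per qos_name.

    The qos_name is looked up on the error line and on the line after it,
    because the S_QOS_DATA block can wrap onto the next line.
    """
    lines = list(lines)
    counts = Counter()
    for i, line in enumerate(lines):
        if OVERFLOW.search(line):
            following = lines[i + 1] if i + 1 < len(lines) else ""
            match = QOS_NAME.search(line + " " + following)
            counts[match.group(1) if match else "unknown"] += 1
    return counts


# Sample taken from the log excerpt above (second line shortened).
sample = [
    "[Qos Monitor Enricher #5, qos_processor] Failed to enqueue pending qos "
    "object, queue capacity exceeded. The update will be made when memory is released.",
    "No data will be lost. S_QOS_DATA { qos_name: QOS_PROCESS_CPU, source: xxxxxxx,",
]
print(dict(count_queue_overflows(sample)))
# prints: {'QOS_PROCESS_CPU': 1}
```

To run it against a real log, pass an open file object instead of `sample`; the log file location depends on the installation and is not specified in this article.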

oi_connector loglevel

After we set the oi_connector loglevel back down from 5 (debug) to 1 (fatal error messages only), the queue functioned more efficiently again and all messages were sent. The number of queued messages remained between 0 and 2000 at any given time, which is reasonable.

The QOS Message queue attached to OI Connector was then stable, and able to process the messages quickly and efficiently.
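The "0 to 2000 queued messages" range mentioned above can serve as a simple health check. The sketch below assumes the queue depth has already been sampled by some other means (how the samples are collected is not covered here), and the function name `queue_is_stable` is hypothetical.

```python
def queue_is_stable(depth_samples, max_depth=2000):
    """Return True if every sampled queue depth lies within 0..max_depth.

    max_depth defaults to 2000, the upper end of the range this article
    describes as reasonable for the oi_connector QOS message queue.
    """
    return all(0 <= depth <= max_depth for depth in depth_samples)


# Illustrative samples only, not measurements from a real system.
print(queue_is_stable([0, 150, 1999, 2000]))   # prints: True
print(queue_is_stable([500, 2500, 4000]))      # prints: False
```

An empty list of samples returns True, so a caller that needs to distinguish "no data" from "stable" should check for that case separately.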

Additional Information

The oi_connector probe axagateway.uimQos queue continues processing for several days or up to 2 weeks, but when the purestorage probe is enabled, the queue starts to back up and cannot keep up.

apm_bridge:

Every time topology is created or updated, a file is saved in the apm_bridge cache folder. If these files build up while CPU/memory usage slowly increases, the cache files can be deleted because they are only used temporarily:

1. Deactivate the apm_bridge probe
2. Rename or delete the cache and store folders within apm_bridge. In most deployments the cache folder holds a large number of 1 KB files, so deleting it takes a very long time; renaming the folder is the faster option.
3. Activate the probe
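Step 2 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the folder names `cache` and `store` are taken from the step description, the probe directory path must be supplied by the caller because it depends on the installation, and the helper names are hypothetical. The script does not deactivate or activate the probe, so steps 1 and 3 still have to be done separately.

```python
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def set_aside(folder: Path) -> Optional[Path]:
    """Rename folder to '<name>.old-<timestamp>' and return the new path.

    Returns None if the folder does not exist. Renaming is used instead of
    deleting because removing many small files can take a very long time.
    """
    if not folder.is_dir():
        return None
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = folder.with_name(f"{folder.name}.old-{stamp}")
    folder.rename(target)
    return target


def set_aside_cache_and_store(probe_dir: Path) -> List[Optional[Path]]:
    """Set aside the 'cache' and 'store' folders (names assumed) in probe_dir."""
    return [set_aside(probe_dir / name) for name in ("cache", "store")]
```

A call would look like `set_aside_cache_and_store(Path(probe_dir))`, where `probe_dir` is the apm_bridge probe directory on the system in question. The renamed folders can be deleted later, once the probe is running normally again.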