oi_connector probe is very unstable and not processing messages to AIOps

Article ID: 237365

Products

  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)
  • DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

The oi_connector probe is very unstable. At times it stops passing messages to AIOps, and we have to restart both oi_connector and apm_bridge to resolve the issue.

Environment

  • Release: 20.3 or higher
  • Component: UIM - OI_CONNECTOR
  • oi_connector v1.52 or higher
  • apm_bridge 1.09

Cause

  • May occur when one or more purestorage probes are enabled in the customer's environment.
  • The oi_connector, data_engine, and qos_processor configurations need tuning.

Resolution

Summary of configuration changes

oi_connector
Increased java min/max memory to 12GB/14GB
task_count = 2000
payload_bulk_size = 1000
qos_bulk_size set to 3000
java min/max set to 13G and 15G respectively
bulk_size = 500
 
data_engine
hub_bulk_size set from 2000 to 1750
thread_count_insert set from 12 to 24 (best practice).
 
After thread_count_insert was set to 24, average throughput at peak message count increased from 4-5k per second to ~20000 per second when it was needed.
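
The target values above can be checked against a probe's .cfg text with a short script. This is a minimal sketch, not a Broadcom tool. It assumes the usual UIM layout of key = value lines inside <section> blocks, and it matches keys in any section because this article does not say which section holds each key. The helper names (read_keys, find_mismatches) and the sample text are hypothetical. The Java memory settings are left out because they are normally passed as startup options rather than plain keys.

```python
import re

# Target values from the summary above. Section names are not given in the
# article, so keys are matched wherever they appear in the file (assumption).
EXPECTED = {
    "oi_connector": {
        "task_count": "2000",
        "payload_bulk_size": "1000",
        "qos_bulk_size": "3000",
        "bulk_size": "500",
    },
    "data_engine": {
        "hub_bulk_size": "1750",
        "thread_count_insert": "24",
    },
}

KEY_VALUE = re.compile(r"^\s*([A-Za-z0-9_\-]+)\s*=\s*(.*?)\s*$")


def read_keys(cfg_text):
    """Collect key = value pairs, skipping <section> and </section> lines.

    If a key appears in more than one section, the last occurrence wins.
    """
    found = {}
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("<"):
            continue
        match = KEY_VALUE.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def find_mismatches(cfg_text, expected):
    """Return {key: (actual_or_None, expected)} for keys that differ."""
    found = read_keys(cfg_text)
    return {
        key: (found.get(key), value)
        for key, value in expected.items()
        if found.get(key) != value
    }


if __name__ == "__main__":
    # Hypothetical data_engine.cfg fragment, for illustration only.
    sample = (
        "<setup>\n"
        "   loglevel = 1\n"
        "   hub_bulk_size = 2000\n"
        "   thread_count_insert = 24\n"
        "</setup>\n"
    )
    print(find_mismatches(sample, EXPECTED["data_engine"]))
```

In the sample, hub_bulk_size is reported because it still holds the old value of 2000, while thread_count_insert already matches.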

qos_processor

Error in log:
[Qos Monitor Enricher #5, qos_processor] Failed to enqueue pending qos object, queue capacity exceeded. The update will be made when memory is released.
No data will be lost. S_QOS_DATA { qos_name: QOS_PROCESS_CPU, source: xxxxxxx, target: <target>, nim_origin: YOUR_ORIGIN, origin: YOUR_ORIGIN, modifier: nimsoft,
host: #.#.#.#, robot: <robot_name>, probe: processes, table_id: null, qos_def_id: null } 

To address this, we made the following changes:

database-update-queue-capacity key added and set to 50000
java heap memory min/max set to 8GB/10GB respectively

oi_connector loglevel
After the oi_connector loglevel was set back down from 5 (debug) to 1 (fatal error messages only), the queue functioned more efficiently again and all messages were sent. The number of queued messages then remained between 0 and 2000 at any given time, which is reasonable.

The QOS Message queue attached to oi_connector was then stable, and able to process the messages quickly and efficiently.
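
The 0 to 2000 range described above can serve as a simple health check on queue depth readings. This is a minimal sketch that assumes the readings were collected separately, for example noted from the hub's queue status at intervals. The function name, the labels it returns, and the idea of treating a strictly rising series as "backing up" are illustrative assumptions, not part of the product.

```python
def classify_queue(samples, upper_bound=2000):
    """Classify a series of queued-message counts, oldest first.

    upper_bound defaults to 2000, the top of the range this article
    describes as reasonable. The labels are illustrative only.
    """
    if not samples:
        raise ValueError("at least one queue depth sample is required")

    over_range = [depth for depth in samples if depth > upper_bound]
    if not over_range:
        return "stable"

    # Strictly increasing depth suggests the consumer cannot keep up.
    rising = len(samples) > 1 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )
    return "backing up" if rising else "over range"


if __name__ == "__main__":
    print(classify_queue([0, 150, 1999, 40]))
    print(classify_queue([500, 2500, 6000]))
```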

Additional Information

The oi_connector probe's axagateway.uimQos queue continues processing for several days, or up to 2 weeks, but when the purestorage probe is enabled the queue starts to back up and cannot keep up.

apm_bridge:

Every time topology is created or updated, a file is saved in the apm_bridge cache folder. If these files build up while CPU/memory usage slowly increases, the cache files can be deleted because they are only used temporarily:

1. Deactivate the apm_bridge probe
2. Rename or delete the cache and store folders within apm_bridge. In most deployments the cache folder holds a large number of 1 KB files, so deleting it takes a very long time; renaming the folder is the faster option.
3. Activate the probe
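
Step 2 can be scripted so that the folders are renamed rather than deleted. This is a minimal sketch that assumes the folders are literally named cache and store inside the apm_bridge probe directory. The directory path, the function name, and the ".old.<timestamp>" naming are hypothetical. Run it only while the probe is deactivated (step 1), and activate the probe afterwards (step 3) through your usual UIM administration tool.

```python
import time
from pathlib import Path


def set_aside_folders(probe_dir, names=("cache", "store"), suffix=None):
    """Rename each existing folder to <name>.old.<suffix> and return the new paths.

    Renaming is a single filesystem operation, so it completes quickly even
    when the folder holds a very large number of small files.
    """
    suffix = suffix or time.strftime("%Y%m%d%H%M%S")
    renamed = []
    for name in names:
        source = Path(probe_dir) / name
        if source.is_dir():
            target = source.with_name(f"{name}.old.{suffix}")
            source.rename(target)
            renamed.append(target)
    return renamed


if __name__ == "__main__":
    # Hypothetical location; substitute the apm_bridge directory on your system.
    for path in set_aside_folders("/path/to/probes/apm_bridge"):
        print(f"renamed to {path}")
```

The renamed folders can be deleted later at a convenient time, once the probe is confirmed to be running normally.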