axagateway.uimQos queue build up | oi_connector queues growing yellow not processing

Article ID: 433643


Updated On:

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • DX Operational Intelligence
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

This article provides solutions for the following scenarios related to oi_connector queues: 

  • When integrating DX UIM with DXO2, the queues that synchronize data between UIM and O2 via oi_connector are stuck and not draining. 
  • The axagateway.uimQos and axagateway.alarms queues are yellow / stuck / not draining, or show disconnections / instability.
  • The oi_connector probe is unable to subscribe to the axagateway queues (Alarms and QoS), resulting in no data being published to DXO2.
  • The oi_connector queues are unstable and fail to connect.
  • The DX-OI - UIM connector is filling up Nimsoft queues.


The following errors may be seen in oi_connector.log:

Feb 24 12:46:47:600 [QUEUE_MONITOR_THREAD, oi_connector] QOS Subscription is either null or not Ok so Reconnecting to hub queue...
Feb 24 12:46:47:601 [QUEUE_MONITOR_THREAD, oi_connector] QOS subscription is unavailable on Primary Hub
Feb 24 12:46:47:602 [QUEUE_MONITOR_THREAD, oi_connector] Primary Hub is available: /<domain>/<hub>/<robot>/hub
Feb 24 12:46:47:602 [QUEUE_MONITOR_THREAD, oi_connector] subscribe to queue hub address is  /<domain>/<hub>/<robot>/hub
Feb 24 12:46:47:602 [QUEUE_MONITOR_THREAD, oi_connector] inside subscribe to queue hub address is  /<domain>/<hub>/<robot>/hub
Feb 24 12:46:48:602 [QUEUE_MONITOR_THREAD, oi_connector] Queue is not subscribe, Nass is not available : axagateway.uimQos
Feb 24 12:46:48:602 [QUEUE_MONITOR_THREAD, oi_connector] New subscriber object constructed for :  Queue[axagateway.uimQos].
Feb 24 12:46:48:657 [attach_clientsession, oi_connector] Retaining Data in the QoS Queue, as effectiveTaskCount reaches to max limit.

Additional information on the scenario: 

  • When the queue is flushed, reconnected, or deleted, it may never reconnect. 
  • When the queue is deleted and oi_connector is restarted, it might not recreate the queue automatically as it is supposed to.
  • The queues have issues even though Jarvis and NASS are reachable (HTTP 200 OK responses are confirmed in the logs), yet the probe does not process any alarms or QoS metrics.

Environment

  • DX UIM 23.4.*
  • oi_connector 2.0*
  • apm_bridge 2.0*

Cause

Possible Causes:

  • Probe version/currency: Newer versions of oi_connector contain several performance improvements and bug fixes. 

  • Configuration issues / configuration changes: A high loglevel setting can contribute to high load on the probes.

  • Database issues in large-scale environments: In environments managing over 1.2 million metric definitions, the database can experience performance bottlenecks. The INIT_THREAD becomes blocked for extended periods (e.g., 35 minutes) because the CI cache refresh interval is configured too frequently (every 30 minutes) to process the large volume of metadata residing in the cm_configuration_item and cm_configuration_item_definition tables.

Resolution

Possible Resolutions:

  • Probe versions: Ensure you are running the latest versions of oi_connector and apm_bridge, as described in the oi_connector and apm_bridge setup and best practices documentation.

  • Loglevel settings: A loglevel of 5 on both apm_bridge and oi_connector may cause instability. If it is not needed for troubleshooting, reduce it to 3 and verify; reduce it further to 1 if needed (a configuration sketch is shown further below).

  • Recent configuration changes: In some cases, queuing or queue instability can occur after a configuration change and/or an update to the probe. If a simple deactivate/activate does not get the queues going again, restart the entire robot/hub where the oi_connector probe is running (right-click the hub robot -> Restart in IM). This usually clears up the queues and reconnects them. 


  • Database issues in large scale environments:

    Update the following parameter:

    ci_cache_update_thread_interval_minutes: change from 30 to 1440

    This configuration change should help reduce DB load (a configuration sketch is shown further below).

  • System resources:

    Processing of the axagateway.uimQos queue can also benefit from additional virtual processors on the system in cases where the probe is having difficulty with QoS event processing and/or is throwing errors such as:

    [QOS_PROCESSOR_THREAD-337, oi_connector] Error while posting the qos data net.sf.ehcache.CacheException: Faulting from repository failed
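
  • Configuration sketch (loglevel): As an illustration only, the loglevel change above is normally applied through Raw Configure on the probe (or by editing oi_connector.cfg / apm_bridge.cfg). The exact section layout is an assumption here and can differ between probe versions, so treat the following as a sketch rather than the definitive file contents:

    <setup>
       loglevel = 3
    </setup>

    Restarting (or deactivating/activating) the probe afterwards is typically required for the change to take effect.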
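
  • Configuration sketch (CI cache refresh interval): Similarly, a hedged sketch of how the ci_cache_update_thread_interval_minutes key from the database-related resolution might look once updated via Raw Configure. Verify where the key already exists in your oi_connector configuration before editing, as its section can vary by release:

    ci_cache_update_thread_interval_minutes = 1440

    A restart of the oi_connector probe is typically required for the new interval to be picked up.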

Additional Information

Additional questions and answers about queue instability:

  • How can this database performance scenario be prevented? Although log level 5 was enabled on the probe, no specific details related to the issue were found in the logs. Is there any configuration or setting at the probe level that can help capture more detailed logs for better diagnosis?

    When this issue occurs (a queue disconnect, for example), alarms are generated for the event. However, in the case of a database issue caused by thread starvation, the alarm itself was not generated.

    Logging enhancements are expected in upcoming releases so that thread states and specific thread-related details can be printed more explicitly for better diagnostics.


  • Is my environment a "Large-Scale Environment"?

    An environment is considered large scale if the database contains a large volume of metadata (over 1.2 million metric definitions). These are not QoS messages but metadata entries from tables such as cm_configuration_item_definition and cm_configuration_item.


  • My thread dump points to a count of DB metric definitions of around 1.5 million. Is this normal?

    Counts of 1,500,000 entries are not abnormal. These are metadata definitions (not QoS data), sourced from:

    cm_configuration_item_definition
    cm_configuration_item

    These volumes reflect the size and complexity of the environment; example queries to check these counts are shown below.
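
    For reference, a minimal sketch of how these metadata volumes could be checked directly against the UIM backend database (plain SQL against the two tables named above; adjust the schema or owner prefix to your database platform):

    -- number of metric definition (metadata) rows
    SELECT COUNT(*) FROM cm_configuration_item_definition;

    -- number of configuration item (metadata) rows
    SELECT COUNT(*) FROM cm_configuration_item;

    Counts in the range of 1.2 to 1.5 million or more place the environment in the large-scale category described above.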

 



Related KBs/Documents: