oi_connector generates repeated "UIM QoS Queue is Down" alarms.

Article ID: 439894

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

In DX UIM environments integrated with DX Operational Intelligence (DXO2), the oi_connector probe may generate repeated "UIM QoS Queue is Down" alarms. Additionally, users may observe that "CLEAR" events for existing alarms fail to propagate to DXO2, leading to inconsistent alarm states between the two platforms.

Environment

  • DX UIM 23.4 or higher
  • oi_connector v2.03
  • apm_bridge v2.03
  • Large-scale environments (e.g., ~100k metadata records or >1.2M metric definitions)

Cause

This issue is typically caused by queue congestion and thread starvation under heavy load. In large environments, the default configuration item (CI) cache refresh interval is too short, so the cache is rebuilt more often than the probe can handle. When the INIT_THREAD is blocked processing a massive volume of metadata from the cm_configuration_item and cm_configuration_item_definition tables, the probe cannot maintain active queue subscriptions or process alarm updates efficiently.

Resolution

The following configuration adjustments can improve thread availability and ensure reliable alarm synchronization:

  • Optimize Cache Update Intervals:
    Modify the frequency of metadata updates to reduce the load on the database and the probe's internal threads.
    • Access the oi_connector configuration.
    • Locate ci_cache_update_thread_interval_minutes and set it to 1440.
    • Locate schedule_time_minutes and set it to 1440.

  • Adjust Metadata Retention:
    Limit the scope of metadata processing to active items to prevent the cache from becoming over-indexed.
    • Set get_ci_details_alive_time_days to 2.

  • Configure Diagnostic Logging:
    Enable specific diagnostics to monitor the health of the alarm synchronization process.
    • Set enable_alarm_diagnostics to true.

  • Tune Queue Monitoring Sensitivity:
    Adjust the monitor interval to avoid false "Queue is Down" alarms caused by transient network or processing delays.
    • Set uim_queue_monitor_interval_seconds to 30 or 60.

  • Restart the Probe:
    Deactivate and then reactivate the oi_connector probe to apply the new configuration. In some instances, restarting the robot hosting the probe is necessary to ensure all stale connections are cleared.
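
Before restarting, it can help to confirm that every key listed above holds the recommended value. The following Python sketch is one way to do that. It is not a supported tool. It assumes the configuration has been exported as plain text with "key = value" lines. The `<setup>` section name and the sample values in `SAMPLE` are hypothetical, and the names `parse_cfg` and `check_settings` are illustrative only. The parser ignores section nesting, so if the same key appears in more than one section, the last occurrence wins.

```python
import re

# Values recommended in the Resolution steps of this article.
# uim_queue_monitor_interval_seconds may be either 30 or 60.
RECOMMENDED = {
    "ci_cache_update_thread_interval_minutes": {"1440"},
    "schedule_time_minutes": {"1440"},
    "get_ci_details_alive_time_days": {"2"},
    "enable_alarm_diagnostics": {"true"},
    "uim_queue_monitor_interval_seconds": {"30", "60"},
}

# Matches "key = value" lines; section tags such as <setup> do not match.
KEY_VALUE = re.compile(r"^\s*([A-Za-z0-9_]+)\s*=\s*(.*?)\s*$")


def parse_cfg(text):
    """Collect key/value pairs from cfg text, ignoring section nesting."""
    values = {}
    for line in text.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def check_settings(text):
    """Return {key: (found, allowed)} for keys that are missing or differ."""
    found = parse_cfg(text)
    problems = {}
    for key, allowed in RECOMMENDED.items():
        value = found.get(key)
        if value is None or value.lower() not in allowed:
            problems[key] = (value, sorted(allowed))
    return problems


# Hypothetical excerpt for illustration; the section name and the
# non-recommended values below are assumptions, not product defaults.
SAMPLE = """
<setup>
   ci_cache_update_thread_interval_minutes = 60
   schedule_time_minutes = 1440
   get_ci_details_alive_time_days = 2
   enable_alarm_diagnostics = false
</setup>
"""

if __name__ == "__main__":
    for key, (value, allowed) in check_settings(SAMPLE).items():
        print(f"{key}: found {value!r}, expected one of {allowed}")
```

An empty result from `check_settings` means all five keys match the recommendations. A key reported with `None` is absent from the text that was checked.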

Additional Information

Related KB: oi_connector and apm_bridge setup and best practices