OI_connector doesn't work on failback using HA
search cancel

OI_connector doesn't work on failback using HA

book

Article ID: 436695

calendar_today

Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM) CA Unified Infrastructure Management On-Premise (Nimsoft / UIM) CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

We have configured a Secondary Hub with HA, including the setup of apm_bridge and oi_connector.
 
Failover to the secondary hub works as expected, and metrics and alarms are successfully sent to DXO2 without data loss. However, when we failback to the primary hub, the oi_connector encounters a cache issue and fails to find met_ids. As a result, it stops sending metrics to DXO2.
 
Currently, this is resolved by manually restarting the oi_connector. This behavior is unexpected, as the oi_connector should function correctly upon the primary hub coming back online.

Environment

  • DX UIM Version: 23.4.x
  • Component (Probe) version:
    • oi_connector is 2.02
    • apm_bridge is 2.01
    • ha probe 20.50

Cause

The problem was a race condition during HA failback where:

  • The probe doesn't fully restart – During failback, the probe may receive a "soft restart" signal rather than a full JVM restart
  • Static CacheManager singleton persists – The CacheManager is a static singleton that may not be recreated
  • Cache is not rebuilt – The clearCache() and buildCache() sequence may not execute properly
  • Queue processing starts immediately – Metrics/alarms start flowing before cache is ready
  • Evidence from Code-> Looking at CacheUtil.getCacheManager():

Resolution

oi_connector-probe-2.1.0-20260410.095441-23.zip

Please note that this build is based on OI Connector 2.10, so the minimum required UIM version is 23.4 CU5.

Apply this OI Connector test build on both the Primary Hub and the HA Hub?

After deploying the build, create two new keys in the <setup> section using Raw Configure:

  1. oi_connector_restart = true
    This enables the probe to automatically restart itself after a fallback or failover event. Please ensure this is set to true.

  2. oi_connector_restart_delay = 5
    This configures the delay before the restart occurs (in minutes). Please keep it at 5 minutes for now. If we determine later that the environment stabilizes faster, we can decrease this value.

 

Attachments

oi_connector-probe-2.1.0-20260410.095441-23.zip get_app