Basic troubleshooting of the OI connector probe when UIM Alarms are arriving late to DX OI


Article ID: 204744



Products

DX Operational Intelligence

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

This article describes basic troubleshooting steps for situations where UIM alarms reach DX OI with a delay or do not arrive at all.

Environment

UIM Release : 20.x, 23.x

Component : CA DOI OI CONNECTOR

 

Resolution

To troubleshoot the alarms propagated from UIM to DX OI, check the oi_connector probe log for the keywords listed below. They help determine which step in the process is causing the issue.

Some settings must first be applied in the oi_connector configuration.

1. Set the following parameters in oi_connector.cfg:

   loglevel = 3  (a higher value may log too much information, making it difficult to troubleshoot alarms and QoS specifically)

   logsize = 200000 (or any size large enough to capture the log output generated during alarm processing)

2. Make sure there are UIM alarms to be propagated to DX OI.

3. Wait for 10 minutes so that at least a couple of cycles are completed.

4. Open the oi_connector.log file under Nimsoft/probes/gateway/oi_connector

5. Look for the following keywords in the log file:

 

- CI_CACHE_THREAD

e.g.

   [CI_CACHE_THREAD, oi_connector] Start building CI Cache...

   ...

   [CI_CACHE_THREAD, oi_connector] Building CI Cache finished  Total time taken is 307788 milliseconds..Total size of cache is 1753534

==> When the oi_connector probe starts, it builds a cache of all the devices for use during alarm and QoS processing. These two messages mark the start and end of the cache build. In this example the build took 307788 ms, about 5 minutes.
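
As an illustration, the build duration and cache size can be pulled out of the "finished" message with a small script. This is a minimal Python sketch, not part of the product: the function name `parse_ci_cache_finished` is hypothetical, and the pattern assumes the message text matches the sample above. Any timestamp prefix on the real log line is ignored because the pattern is searched anywhere in the line.

```python
import re

# Pattern based on the sample "finished" message shown above (assumed format).
FINISHED = re.compile(
    r"Building CI Cache finished\s+Total time taken is (\d+) milliseconds"
    r"\.*\s*Total size of cache is (\d+)"
)

def parse_ci_cache_finished(line):
    """Return (duration_seconds, cache_size), or None if the line does not match."""
    m = FINISHED.search(line)
    if m is None:
        return None
    return int(m.group(1)) / 1000.0, int(m.group(2))

sample = (
    "[CI_CACHE_THREAD, oi_connector] Building CI Cache finished  "
    "Total time taken is 307788 milliseconds..Total size of cache is 1753534"
)
result = parse_ci_cache_finished(sample)
print(result)  # (307.788, 1753534)
```

Comparing this duration across probe restarts may help to spot a cache build that is taking unusually long.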

 

- STATS_THREAD

e.g.

[STATS_THREAD-1, oi_connector] Memory(MB) - Max: 10923, Allocated: 9822, Used: 2976
[STATS_THREAD-1, oi_connector] CI Cache size: 1733560
[STATS_THREAD-1, oi_connector] Nass Metric Id cache size: 240826
[STATS_THREAD-1, oi_connector] Total QoS Ingested: 269059
[STATS_THREAD-1, oi_connector] QoS Ingested in last 5 mintues: 269059

==> Summary of statistics, logged every 5 minutes.
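
To follow memory usage over time, the Memory(MB) line can be parsed into numbers. The following Python sketch is only an illustration: `parse_memory` is a hypothetical helper, and the pattern assumes the line format of the sample above.

```python
import re

# Pattern based on the sample STATS_THREAD memory line shown above (assumed format).
MEMORY = re.compile(r"Memory\(MB\) - Max: (\d+), Allocated: (\d+), Used: (\d+)")

def parse_memory(line):
    """Return the memory figures of a STATS_THREAD line, or None if it does not match."""
    m = MEMORY.search(line)
    if m is None:
        return None
    max_mb, allocated_mb, used_mb = (int(g) for g in m.groups())
    return {
        "max_mb": max_mb,
        "allocated_mb": allocated_mb,
        "used_mb": used_mb,
        # Share of the maximum that is currently in use, in percent.
        "used_pct_of_max": round(100.0 * used_mb / max_mb, 1),
    }

sample = "[STATS_THREAD-1, oi_connector] Memory(MB) - Max: 10923, Allocated: 9822, Used: 2976"
stats = parse_memory(sample)
print(stats["used_pct_of_max"])  # 27.2
```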

 

- "Total Time Taken by Thread in AlarmEventProcessor" (without quotes)

e.g.

 [ALARM_PROCESSOR_THREAD-2, oi_connector] Total Time Taken by Thread in AlarmEventProcessor :: 579 ms. [ Total Records :: 52 ]

==> Total time taken to process a batch of alarms (the batch size is shown in Total Records). A value of around one second (1000 ms) is expected. Much longer times may be a symptom of an issue.

 

- "Total Time Taken by Thread in QosEventProcessor" (without quotes)

e.g.

 [QOS_PROCESSOR_THREAD-15, oi_connector] Total Time Taken by Thread in QosEventProcessorNASS :: 418 ms. [ Total Records :: 2000 ]

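==> Total time taken to process a batch of QoS messages (the batch size is shown in Total Records). A value of around one second (1000 ms) is expected for a full bucket. Much longer times may be a symptom of an issue. Note that the keyword also matches longer processor names such as QosEventProcessorNASS, as in the example.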
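
The check on processing times for both alarms and QoS can be automated. The Python sketch below is a hedged example: the function `slow_batches` and its `threshold_ms` parameter are hypothetical, the pattern assumes the message format of the samples above, and the 1000 ms default only reflects the expected value mentioned in this article.

```python
import re

# Pattern based on the two sample timing messages shown above (assumed format).
TIMING = re.compile(
    r"Total Time Taken by Thread in (\w+) :: (\d+) ms\. "
    r"\[ Total Records :: (\d+) \]"
)

def slow_batches(lines, threshold_ms=1000):
    """Yield (processor, elapsed_ms, records) for batches slower than threshold_ms."""
    for line in lines:
        m = TIMING.search(line)
        if m and int(m.group(2)) > threshold_ms:
            yield m.group(1), int(m.group(2)), int(m.group(3))

samples = [
    "[ALARM_PROCESSOR_THREAD-2, oi_connector] Total Time Taken by Thread in "
    "AlarmEventProcessor :: 579 ms. [ Total Records :: 52 ]",
    "[QOS_PROCESSOR_THREAD-15, oi_connector] Total Time Taken by Thread in "
    "QosEventProcessorNASS :: 418 ms. [ Total Records :: 2000 ]",
]

# Both samples are below the default 1000 ms threshold, so nothing is flagged.
flagged_default = list(slow_batches(samples))
# A lower threshold is used here only to show what a flagged entry looks like.
flagged_500 = list(slow_batches(samples, threshold_ms=500))
print(flagged_default)  # []
print(flagged_500)      # [('AlarmEventProcessor', 579, 52)]
```

To run it against a real log, pass an open file object instead of `samples`, since the function accepts any iterable of lines.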

- ALARM_PROCESSOR_THREAD

==> Messages logged specifically during the processing of alarms.

 

- "Total no of Qos values" (without quotes; the search may need to be case-insensitive)

e.g.

  [QOS_PROCESSOR_THREAD-7, oi_connector] Total no of Qos values posted on NASS via post : 1688

==> number of QoS messages correctly sent to Jarvis (NASS)

 

- "Total no of Metrics Metadata" (without quotes; the search may need to be case-insensitive)

e.g.

  [ALARM_PROCESSOR_THREAD-41, oi_connector] Total no of Metrics Metadata posted on NASS via post : 12

==> number of Metrics Metadata correctly sent to Jarvis (NASS)

 

- "Total no of UIM Alarms" (without quotes; the search may need to be case-insensitive)

e.g.

  [ALARM_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Alarms posted on jarvis via post : 129

==> number of alarms correctly sent to Jarvis (NASS)
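
The three "Total no of ..." counters above can be summed per type to see how much was actually posted during the observation window. This Python sketch is an illustration only: `posted_totals` is a hypothetical helper, and the pattern assumes the message format of the samples above. The match is case-insensitive, so differences in capitalization do not matter.

```python
import re
from collections import Counter

# Pattern based on the three sample "posted" messages shown above (assumed format).
POSTED = re.compile(
    r"total no of (.+?) posted on (\w+) via post : (\d+)", re.IGNORECASE
)

def posted_totals(lines):
    """Sum the posted counts per item type (lowercased), across all matching lines."""
    totals = Counter()
    for line in lines:
        m = POSTED.search(line)
        if m:
            totals[m.group(1).lower()] += int(m.group(3))
    return totals

samples = [
    "[QOS_PROCESSOR_THREAD-7, oi_connector] Total no of Qos values posted on NASS via post : 1688",
    "[ALARM_PROCESSOR_THREAD-41, oi_connector] Total no of Metrics Metadata posted on NASS via post : 12",
    "[ALARM_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Alarms posted on jarvis via post : 129",
]
totals = posted_totals(samples)
print(dict(totals))  # {'qos values': 1688, 'metrics metadata': 12, 'uim alarms': 129}
```

If the "uim alarms" total stays at zero while UIM alarms exist, that suggests the alarms are not being posted at all and are not merely delayed.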

Additional Information

If any error related to ALARM_PROCESSOR_THREAD appears in the logs, open a case with Broadcom Support.