This article describes basic troubleshooting steps for situations where UIM alarms reach DX OI with a delay or do not arrive at all.
UIM Release : 20.x, 23.x
Component : CA DOI OI CONNECTOR
To troubleshoot alarms propagated from UIM to DX OI, check the probe logs for specific keywords that help determine which step in the process is causing the issue.
Note that some settings must first be set in the oi_connector configuration.
1. Set the following parameters in oi_connector.cfg:
loglevel = 3 (a higher value may log too much information, making it difficult to isolate the alarm and QoS messages)
logsize = 200000 (or any size that allows capturing enough information during the alarm processing)
2. Make sure there are UIM alarms to be propagated to DX OI.
3. Wait 10 minutes so that at least a couple of processing cycles complete.
4. Open the oi_connector.log file under Nimsoft/probes/gateway/oi_connector
5. Look for the following keywords in the log file:
- CI_CACHE_THREAD
e.g.
[CI_CACHE_THREAD, oi_connector] Start building CI Cache...
...
[CI_CACHE_THREAD, oi_connector] Building CI Cache finished Total time taken is 307788 milliseconds..Total size of cache is 1753534
==> When the oi_connector probe starts, it builds a cache of all devices, which is then used during alarm and QoS processing. These two messages mark the start and end of the cache build.
- STATS_THREAD
e.g.
[STATS_THREAD-1, oi_connector] Memory(MB) - Max: 10923, Allocated: 9822, Used: 2976
[STATS_THREAD-1, oi_connector] CI Cache size: 1733560
[STATS_THREAD-1, oi_connector] Nass Metric Id cache size: 240826
[STATS_THREAD-1, oi_connector] Total QoS Ingested: 269059
[STATS_THREAD-1, oi_connector] QoS Ingested in last 5 mintues: 269059
==> Summary of statistics gathered every 5 minutes
- "Total Time Taken by Thread in AlarmEventProcessor", without quotes
e.g.
[ALARM_PROCESSOR_THREAD-2, oi_connector] Total Time Taken by Thread in AlarmEventProcessor :: 579 ms. [ Total Records :: 52 ]
==> Total time taken to process the given number of alarms (Total Records). A value of around one second (1000 ms) is expected. Much longer times may be a symptom of an issue.
- "Total Time Taken by Thread in QosEventProcessor", without quotes
e.g.
[QOS_PROCESSOR_THREAD-15, oi_connector] Total Time Taken by Thread in QosEventProcessorNASS :: 418 ms. [ Total Records :: 2000 ]
==> Total time taken to process the given number of QoS messages (Total Records). A value of around one second (1000 ms) for a full bucket is expected. Much longer times may be a symptom of an issue.
- ALARM_PROCESSOR_THREAD
==> Specific messages logged during the processing of alarms
- "Total no of Qos values", without quotes
e.g.
[QOS_PROCESSOR_THREAD-7, oi_connector] Total no of Qos values posted on NASS via post : 1688
==> Number of QoS messages successfully sent to Jarvis (NASS)
- "Total no of Metrics Metadata", without quotes
e.g.
[ALARM_PROCESSOR_THREAD-41, oi_connector] Total no of Metrics Metadata posted on NASS via post : 12
==> Number of Metrics Metadata entries successfully sent to Jarvis (NASS)
- "Total no of UIM Alarms", without quotes
e.g.
[ALARM_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Alarms posted on jarvis via post : 129
==> Number of alarms successfully sent to Jarvis (NASS)
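The keyword checks above can also be scripted. The following is a minimal Python sketch, not a Broadcom tool: the regular expressions are derived only from the sample lines shown in this article, the function name summarize and the 1000 ms threshold constant are illustrative, and the second sample line uses a made-up timing (4180 ms) to show what a slow batch would look like. Real log lines may carry extra prefixes such as timestamps, so the patterns match anywhere in the line.

```python
import re

# Patterns built from the sample lines in this article (an assumption:
# the real log may contain variants that these do not cover).
TIMING_RE = re.compile(
    r"\[(?P<thread>[A-Z_]+-\d+), oi_connector\] Total Time Taken by Thread in "
    r"(?P<processor>\w+) :: (?P<ms>\d+) ms\. \[ Total Records :: (?P<records>\d+) \]"
)
POSTED_RE = re.compile(
    r"Total no of (?P<kind>Qos values|Metrics Metadata|UIM Alarms) "
    r"posted on \w+ via post : (?P<count>\d+)"
)

SLOW_MS = 1000  # "around a second" guidance from the article


def summarize(lines, slow_ms=SLOW_MS):
    """Return (slow_batches, posted_totals) for an iterable of log lines."""
    slow_batches = []
    posted_totals = {}
    for line in lines:
        timing = TIMING_RE.search(line)
        if timing and int(timing["ms"]) > slow_ms:
            slow_batches.append(
                (timing["thread"], timing["processor"],
                 int(timing["ms"]), int(timing["records"]))
            )
        posted = POSTED_RE.search(line)
        if posted:
            kind = posted["kind"]
            posted_totals[kind] = posted_totals.get(kind, 0) + int(posted["count"])
    return slow_batches, posted_totals


SAMPLE = [
    "[ALARM_PROCESSOR_THREAD-2, oi_connector] Total Time Taken by Thread in AlarmEventProcessor :: 579 ms. [ Total Records :: 52 ]",
    # Hypothetical slow batch: the 4180 ms value is invented for illustration.
    "[QOS_PROCESSOR_THREAD-15, oi_connector] Total Time Taken by Thread in QosEventProcessorNASS :: 4180 ms. [ Total Records :: 2000 ]",
    "[QOS_PROCESSOR_THREAD-7, oi_connector] Total no of Qos values posted on NASS via post : 1688",
    "[ALARM_PROCESSOR_THREAD-41, oi_connector] Total no of Metrics Metadata posted on NASS via post : 12",
    "[ALARM_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Alarms posted on jarvis via post : 129",
]

slow, totals = summarize(SAMPLE)
print("Slow batches:", slow)
print("Posted totals:", totals)
```

To run it against a real log, pass an open file instead of SAMPLE, for example summarize(open("oi_connector.log", errors="replace")). A "UIM Alarms" total that stays at zero while alarms exist in UIM, or repeated slow batches, narrows down which step to investigate.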
If any error related to ALARM_PROCESSOR_THREAD appears in the logs, open a case with Broadcom Support.
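When preparing a support case, it can help to extract the relevant lines first. The sketch below is a rough filter under stated assumptions: this article does not show what an oi_connector error line looks like, so the words matched by ERROR_RE (error, exception, fail) are a guess, and the second sample line is hypothetical. The function name alarm_thread_errors is illustrative.

```python
import re

# Thread tag format taken from the sample lines in this article.
THREAD_RE = re.compile(r"\[ALARM_PROCESSOR_THREAD-\d+, oi_connector\]")
# Assumption: error lines contain one of these words. The article does not
# document the actual error format, so adjust this to what the log shows.
ERROR_RE = re.compile(r"error|exception|fail", re.IGNORECASE)


def alarm_thread_errors(lines):
    """Return ALARM_PROCESSOR_THREAD lines that look like errors."""
    return [
        line.rstrip("\n")
        for line in lines
        if THREAD_RE.search(line) and ERROR_RE.search(line)
    ]


SAMPLE_LINES = [
    "[ALARM_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Alarms posted on jarvis via post : 129",
    # Hypothetical error line, invented for illustration only.
    "[ALARM_PROCESSOR_THREAD-3, oi_connector] Exception while posting alarms",
    # Hypothetical error on another thread; it should not be reported.
    "[QOS_PROCESSOR_THREAD-7, oi_connector] Error while posting QoS",
]

for hit in alarm_thread_errors(SAMPLE_LINES):
    print(hit)
```

The filter only narrows the log down; attach the full oi_connector.log to the case as well, since surrounding lines usually carry the context Support needs.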