DX OI integration with UIM - Troubleshooting


Article ID: 207711


Products

DX Operational Intelligence DX OI SaaS

Issue/Introduction

The following is a list of techniques and suggestions for troubleshooting DX OI to UIM integration issues:

A) Checklist

B) What files should I collect for Broadcom Support?

Environment

DX Operational Intelligence 2x, SaaS

Resolution

 
 

STEP # 2 : Check UIM Probes configuration

 


a) Go to "Setup", make sure the below properties are set:

ingest_data_to_nass = true 
ingest_uim_metrics_metadata_to_jarvis = false 


b) If you have upgraded the oi_connector probe to version 1.38 or later, set "subscribe_to_uim_inventory_ci=no" (if the property is available). Inventory ingestion must be disabled in oi_connector because inventory is now sent to TAS only, through apm_bridge.


c) If you have multiple Tenants:

Go to "Resource | Properties", set nassTassToken = <your tas_tenant_id>

 

How to obtain the "tas_tenant_id"?

Option 1: Log in to APM. The tas_tenant_id appears in the browser URL.

 

Option 2: If the setup doesn't have APM, obtain the tas_tenant_id from Cluster Manager: go to Tenant Services and move your mouse over the tenant name. The tas_tenant_id is the "Internal id". If you are using DX OI SaaS, contact Broadcom Support for assistance.
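
The property checks in a) and b) can be scripted. The following Python sketch assumes the probe configuration is available as text with plain "key = value" lines, as shown above. The names parse_key_values and check_setup are hypothetical helpers for illustration, not part of any UIM tooling.

```python
# Expected values taken from steps a) and b) above.
EXPECTED = {
    "ingest_data_to_nass": "true",
    "ingest_uim_metrics_metadata_to_jarvis": "false",
    "subscribe_to_uim_inventory_ci": "no",
}
# Step b) says this key may not exist in every probe version.
OPTIONAL = {"subscribe_to_uim_inventory_ci"}


def parse_key_values(cfg_text):
    """Collect 'key = value' pairs, ignoring <section> tags and blank lines."""
    values = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("<") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_setup(cfg_text):
    """Return a list of problems; an empty list means all checks passed."""
    found = parse_key_values(cfg_text)
    problems = []
    for key, expected in EXPECTED.items():
        if key not in found:
            if key not in OPTIONAL:
                problems.append(f"{key} is missing (expected {expected})")
        elif found[key].lower() != expected:
            problems.append(f"{key} = {found[key]} (expected {expected})")
    return problems
```

The parser is deliberately flat: if the same key appears in more than one section, the last occurrence wins, so review the file manually when that is a possibility.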

 

4) Review the oi_connector log

Default location : C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\oi_connector.log

a) Make sure there are no errors or exceptions. Some examples:

- Incorrect nginx endpoint (jarvis ingestion endpoint):

Feb 02 22:58:49:203 [INIT_THREAD, oi_connector] Exception in validateResponseFromJarvis : java.net.UnknownHostException: 
..
Feb 02 22:58:49:713 [GROUP_PROCESSOR_THREAD-1, oi_connector] Cannot post UIM Group data to Jarvis, either payload is blank or DOI Configurations are not valid.
..
Feb 02 23:18:17:971 [QUEUE_MONITOR_THREAD, oi_connector] Jarvis is unavailable. Alarm, Inventory and Group Queues status is not monitored

 

- Incorrect NASS endpoint (apmservices-gateway) or token: 

Feb 02 22:39:03:619 [DEVICE_COUNT_METRICS_PROCESSOR-1, oi_connector] Exception while executing doPost request to Jarvis url:  java.net.UnknownHostException: 
 at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
..

Feb 02 22:39:03:619 [DEVICE_COUNT_METRICS_PROCESSOR-1, oi_connector] Response is null for posting Metrics Metadata to NASS, please verify nass url and token.
..
Feb 02 22:40:00:107 [DOI_MONITOR_THREAD, oi_connector] Nass is Down. 0

Recommendation: Generate a New Token.

 

b) Search for common words to help you confirm that the OI Connector is working correctly:

"Nass is Available", "Jarvis is Available", "posted on jarvis", "posted on NASS"

Here are some examples:

Feb 02 22:28:51:926 [INIT_THREAD, oi_connector] Jarvis is Available. 202
..
Feb 02 22:28:52:552 [INIT_THREAD, oi_connector] Nass is Available. 200
..
Feb 02 22:28:56:253 [GROUP_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Groups posted on jarvis via post : 6
Feb 02 23:20:56:161 [ALARM_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Alarms posted on jarvis via post : 7
Feb 02 23:22:34:432 [QOS_PROCESSOR_THREAD-4, oi_connector] Total no of Qos values posted on NASS via post : 3
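
The searches in a) and b) can be combined into one pass over the log. The sketch below is Python and uses only the marker strings that appear in the log examples above. The function names are illustrative, and the list of problem markers is not exhaustive.

```python
# Marker strings copied from the log examples above.
HEALTHY = ("Nass is Available", "Jarvis is Available",
           "posted on jarvis", "posted on NASS")
PROBLEM = ("UnknownHostException", "Jarvis is unavailable",
           "Nass is Down", "Response is null", "Cannot post")


def summarize_log(lines):
    """Count how many lines contain each healthy or problem marker."""
    counts = {marker: 0 for marker in HEALTHY + PROBLEM}
    for line in lines:
        for marker in counts:
            if marker in line:
                counts[marker] += 1
    return counts


def has_problems(counts):
    """True when at least one problem marker was seen."""
    return any(counts[marker] > 0 for marker in PROBLEM)
```

To run it against the probe log, pass an open file, for example summarize_log(open(path, errors="replace")), where path points to oi_connector.log.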

 

c) Verify axagateway queues:

Option 1: Using Dr NimBus ("C:\Program Files (x86)\Nimsoft\bin\drnimbus.exe")

Check the “In Queue” column: a value that is constantly > 0 indicates a problem posting the data to OI.
Check the “Count” column: a value > 0 indicates that data has been posted correctly to OI.


Option 2: Using Hub Status

Open the hub probe and click the "Status" tab.

Check the “Queued” column: a value that is constantly > 0 indicates a problem posting the data.
Check the “Sent” column: a value > 0 indicates that data has been posted correctly to OI.
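
The two rules above can be written down as a small decision function. This Python sketch assumes you note the queued value several times at intervals and read the sent counter once. The function name and the returned messages are illustrative only.

```python
def classify_queue(queued_samples, sent):
    """Apply the two rules above to readings taken at intervals."""
    # "Constantly > 0": every reading of the queued column is above zero.
    if queued_samples and all(q > 0 for q in queued_samples):
        return "problem: queue never drains"
    # At least some data left the queue.
    if sent > 0:
        return "ok: data has been posted"
    return "unknown: nothing queued and nothing sent yet"
```

A single non-zero reading of the queued column is not treated as a problem, because a queue that drains between readings is behaving normally.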

 

d) For advanced troubleshooting, enable 5-TRACE logging: open "Raw Configure > Setup", set the log level to 5-TRACE, and temporarily change the logsize to 999999.

IMPORTANT: Once you complete the troubleshooting, reset the log level to 3-Info and the logsize to 10240 (or to its original value).

 

apm_bridge 

1) Make sure you are using a supported version of the apm_bridge and webservicesrest probes: 

https://support.broadcom.com/web/ecx/support-content-notification/-/external/content/release-announcements/DX-Operational-Intelligence-Interoperability/18129

 

NOTE: You can check the webservicesrest version from Infrastructure Manager (IM).


2) Ensure UIMAPI has been deployed and is available

If UMP: http(s)://<UMP_url>:<UMP_port>/uimapi/docs/index.html

If OC: http(s)://<OC>:<OC_port>/uimapi
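
The availability check can also be done from a script instead of a browser. The Python sketch below builds the two URLs listed above and tests whether the server answers. The names uimapi_url and is_reachable are hypothetical, and treating an HTTP 401 or 403 reply as "deployed but requires login" is an assumption, not documented behavior.

```python
import urllib.error
import urllib.request


def uimapi_url(kind, host, port, scheme="http"):
    """Build the UIMAPI URL for 'UMP' or 'OC' as listed above."""
    paths = {"UMP": "/uimapi/docs/index.html", "OC": "/uimapi"}
    if kind not in paths:
        raise ValueError("kind must be 'UMP' or 'OC'")
    return f"{scheme}://{host}:{port}{paths[kind]}"


def is_reachable(url, timeout=10):
    """Return True if the URL answers successfully or asks for credentials."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # Assumption: an authentication challenge means the app is deployed.
        return err.code in (401, 403)
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error.
        return False
```

For an https endpoint with a self-signed certificate, is_reachable returns False because certificate validation fails. Verify such endpoints in a browser instead.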




3) Verify apm_bridge and uimapi probe configuration is correct

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/ca-unified-infrastructure-management-probes/GA/monitoring/extensibility-and-integrations/apm-bridge-ca-apm-bridge/apm-bridge-configuration.html


4) Review the apm_bridge log:

Default location: C:\Program Files (x86)\Nimsoft\probes\service\apm_bridge\apm_bridge.log


a) Make sure there are no errors or exceptions. For example:

Feb 01 01:14:55:772 ERROR [ATS Client - 11019, apm_bridge] Could not POST to TAS NG. profile id: 0, host: 
org.apache.http.client.HttpResponseException: 

Recommendation: reinstall apm_bridge


b) Search for the following string to confirm that the configuration is correct and that data is being ingested into TAS (apmservices-gateway):

"Inventory sent to APM for profile"

Here are some examples:

Feb 01 09:21:40:521 INFO  [ForkJoinPool-57-worker-1, apm_bridge] Inventory sent to APM for profile 0 : CS vertices: 2 / CI vertices: 0 and edges: 0
Feb 01 09:21:40:521 INFO  [ForkJoinPool-57-worker-1, apm_bridge] Inventory sent to APM for profile 0 : CI vertices: 2 and edges: 0
Feb 01 09:21:40:525 INFO  [ForkJoinPool-57-worker-1, apm_bridge] Inventory sent to APM for profile 0 : APM vertices: 2 and edges: 0
Feb 01 09:21:40:525 INFO  [ForkJoinPool-57-worker-1, apm_bridge] Ending APM Inventory Update for profile 0 / took 2003 ms to process
Feb 01 09:21:40:525 INFO  [APMInventoryService RUNNING, apm_bridge] Done updating APM Inventory
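
To see how much inventory each run sends, the "Inventory sent to APM for profile" lines can be parsed. This Python sketch assumes the line layout matches the examples above exactly. The function name is illustrative.

```python
import re

SENT = re.compile(r"Inventory sent to APM for profile (\d+) : (.*)")
VERTICES = re.compile(r"(\w+) vertices: (\d+)")
EDGES = re.compile(r"edges: (\d+)")


def parse_inventory_line(line):
    """Return (profile, {kind: vertex_count}, edges) or None if no match."""
    match = SENT.search(line)
    if match is None:
        return None
    profile, detail = int(match.group(1)), match.group(2)
    # A line may report several vertex kinds, e.g. CS and CI together.
    vertices = {kind: int(n) for kind, n in VERTICES.findall(detail)}
    edges_match = EDGES.search(detail)
    edges = int(edges_match.group(1)) if edges_match else 0
    return profile, vertices, edges
```

Lines that never appear, or that always report zero vertices for a profile, suggest that the profile has nothing to send or that the probe configuration should be reviewed.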


5) Enable additional logging:

Edit the apm_bridge.cfg configuration file with a text editor, or use Raw Configure to update key/value pairs. Update or add the following key/value pairs to enable additional logging for troubleshooting:


<topology_service>      
save_incoming_graphs = 1    
log_outgoing_request = 1      
publish_config_items = 0  
</topology_service>  

After the apm_bridge probe restarts, a new "debug" directory is created. It contains subfolders, each holding JSON files with the inventory and topology data.

 

STEP # 3 : Check DX OI

a) Check Alarms in UI (stored in ElasticSearch)

Go to DX OI > Alarms

 

b) Check for Metrics (stored in NASS)

Go to DX OI > Performance
Click Done
Click Metrics
Select some metrics to display



c) Check for Inventory (stored in TAS)

Go to DX OI > Services > Create a new Service

From Add Elements, select Infrastructure > UIM Group or Hostname. You should be able to see your UIM servers and groups.



d) Check for data using DX Dashboards

On Premise:

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-platform-on-premise/21-3/Self-Service-Dashboards/Video-Resources-for-DX-Dashboards.html

SaaS:

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-dashboards/SaaS.html

 

 

** The options below are valid for On Premise ONLY. If you are using DX OI SaaS, contact Broadcom Support for assistance **


e) Check for Alarms using REST API

Verify that data (metrics, alerts, inventory) has been ingested into the respective product indices:

1) List all the UIM product indices:

http://es.<servername>/_cat/indices/*uim*?v

For example:

http://es.munqa001493.bpc.broadcom.net/_cat/indices/*uim*?v

Check that the docs.count and store.size column values increase over time.
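
Comparing two snapshots of the index list makes the "increases over time" check explicit. The Python sketch below parses the plain-text output of the _cat/indices query (with the ?v header row) and reports indices whose docs.count did not grow. It assumes whitespace-separated columns with the standard "index" and "docs.count" headers. The function names are illustrative.

```python
def parse_cat_indices(text):
    """Map index name -> docs.count from '_cat/indices?v' output."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    name_col, count_col = header.index("index"), header.index("docs.count")
    counts = {}
    for line in lines[1:]:
        cells = line.split()
        if len(cells) <= max(name_col, count_col):
            continue  # row with empty cells, e.g. a closed index
        try:
            counts[cells[name_col]] = int(cells[count_col])
        except ValueError:
            continue  # columns shifted by an empty cell; skip the row
    return counts


def stalled_indices(earlier, later):
    """Indices present in both snapshots whose docs.count did not grow."""
    return sorted(name for name in earlier
                  if name in later and later[name] <= earlier[name])
```

Take the two snapshots far enough apart for new data to arrive. An index that holds slow-changing data may appear in the result without indicating a fault.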


2) Check the content of a specific index:

http://es.<servername>/<index-name>/_search?pretty&sort=timestamp:desc&size=500

For example:

http://es.munqa001493.bpc.broadcom.net/*alarms_all*/_search?pretty&sort=@timestamp:desc&size=200


NOTE: You can use https://www.epochconverter.com/ to convert epoch to human-readable date and vice versa.
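
The same conversion can be done offline. This Python sketch converts between epoch values and ISO 8601 UTC strings. The rule that values above 10**11 are milliseconds is a heuristic introduced here, since the source does not state which unit the indices use.

```python
from datetime import datetime, timezone


def epoch_to_utc(value):
    """Convert epoch seconds or milliseconds to an ISO 8601 UTC string."""
    # Heuristic: anything above 10**11 is assumed to be milliseconds.
    seconds = value / 1000 if value > 10**11 else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def utc_to_epoch_ms(iso_text):
    """Convert an ISO 8601 string that includes an offset to epoch ms."""
    return int(datetime.fromisoformat(iso_text).timestamp() * 1000)
```

Pass strings with an explicit offset such as "+00:00" to utc_to_epoch_ms. Without one, Python interprets the value in the local time zone of the machine.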



For more ElasticSearch queries and examples, see:

DX AIOps - ElasticSearch Queries
https://knowledge.broadcom.com/external/article/207215

 

f) Check for Inventory using REST API 

Open Postman (you can download postman from https://www.postman.com/downloads/)

POST API End Point to check TAS data for UIM inventory: 

http://<APMServices Gateway Host>/tas/graph/query

For example:

http://apmservices-gateway.munqa001493.bpc.broadcom.net/tas/graph/query

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

{
   "filter": {
       "op": "JOIN",
       "input": {
           "op": "AND",
           "input": [
               {
                   "op": "ATTRIBUTE",
                   "expressions": [
                       {
                           "name": "Product",
                           "values": [
                               "UIM"
                           ]
                       }
                   ]
               }
           ]
       }
   },
   "universe": null,
   "version": null,
   "time": 0,
   "stitchingEnabled": true,
   "includeStatus": true
}

Expected Result: you should see all new vertices added to TAS
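
If Postman is not available, the same request can be built with the Python standard library. The sketch below uses the endpoint, headers, and body shown above. The function name build_tas_request and its parameters are illustrative, and the shape of the response is not assumed.

```python
import json
import urllib.request

# Query body copied from the example above: vertices with Product = UIM.
TAS_UIM_QUERY = {
    "filter": {
        "op": "JOIN",
        "input": {
            "op": "AND",
            "input": [
                {
                    "op": "ATTRIBUTE",
                    "expressions": [{"name": "Product", "values": ["UIM"]}],
                }
            ],
        },
    },
    "universe": None,
    "version": None,
    "time": 0,
    "stitchingEnabled": True,
    "includeStatus": True,
}


def build_tas_request(gateway_host, tenant_token, scheme="http"):
    """Build (but do not send) the POST request for /tas/graph/query."""
    return urllib.request.Request(
        url=f"{scheme}://{gateway_host}/tas/graph/query",
        data=json.dumps(TAS_UIM_QUERY).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {tenant_token}",
        },
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen inside a with block and read the response with json.load. Keeping the build step separate lets you inspect the URL and headers before anything is sent.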

 

g) Check for Metrics using REST APIs

Open Postman (you can download postman from https://www.postman.com/downloads/)

POST API End Point to check NASS Metric Metadata matching a pattern

http://<APM Service Gateway Host>/metadata/queryMetric

For example:

http://apmservices-gateway.munqa001493.bpc.broadcom.net/metadata/queryMetric

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

{
   "size": 20000,
   "specifier": {
       "op": "SPEC",
       "sourceNameSpecifier": {
           "op": "REGEX",
           "pattern": "(.*)UIM(.*)"
       },
       "attributeNameSpecifier": {
           "op": "ALL"
       }
   }
}

Expected Result: you should see the metric metadata for UIM sources stored in NASS

 

STEP # 4 : Check Data Platform

** This section is valid for DX On Premise 2x version only, if you are using DX OI SaaS, contact Broadcom Support for assistance **

AIOps - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting



B) WHAT FILES SHOULD I COLLECT FOR BROADCOM SUPPORT?

If you still need assistance, contact Broadcom Support (https://support.broadcom.com/) and provide the below information:

a) If the problem is related to differences in metrics and values, screenshots illustrating the discrepancy between UIM and DX OI

b) oi_connector and apm_bridge probe logs

If possible, enable TRACE logging on both probes, restart them, reproduce the issue, and gather the following files:

C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\logs\*
C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\oi_connector.cfg

C:\Program Files (x86)\Nimsoft\probes\service\apm_bridge\apm_bridge.log
C:\Program Files (x86)\Nimsoft\probes\service\apm_bridge\apm_bridge.cfg



If you are using DX OI On Premise (20.x), collect the following additional information:

c) cluster and pods status:

kubectl get pods -n <dxi-namespace>
kubectl describe nodes

d) From ElasticSearch

Collect the result of the below queries:

http(s)://{elasticsearch_endpoint}/_cat/indices/*uim*?v
http(s)://{elasticsearch_endpoint}/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
http(s)://{elasticsearch_endpoint}/_cluster/health?pretty&human

From every Elastic server and the NFS server, collect the output of: df -h
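
Before sending the files, a quick look at the _cluster/health output often shows whether Elastic itself is the problem. The Python sketch below reads the JSON returned by that query. The "status" and "unassigned_shards" fields are part of the standard Elasticsearch cluster health response. The function name is illustrative.

```python
import json


def health_summary(response_text):
    """Return (is_green, summary) for a _cluster/health JSON response."""
    health = json.loads(response_text)
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    return status == "green", f"status={status}, unassigned_shards={unassigned}"
```

A yellow or red status does not by itself explain missing UIM data, but it is worth mentioning when you open the support case.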

e) If the problem is related to Jarvis, Kafka or Elastic, collect the respective logs and evidence, see:

DX OI - Jarvis / Kafka Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/189119



Additional Information

https://knowledge.broadcom.com/external/article/190815/aiops-troubleshooting-common-issues-and.html#mcetoc_1fsjgr2vp0