DX OI integration with UIM - Troubleshooting, Common Issues and Best Practices

Article ID: 207711

Products

DX Operational Intelligence

Issue/Introduction

The following techniques and suggestions can help when troubleshooting DX OI to UIM integration issues:

A) Checklist

B) What files should I collect for Broadcom Support?

Environment

DX Operational Intelligence 20.x

Resolution

INTRODUCTION:

To successfully enable the UIM integration with OI, you need to properly configure the oi_connector and apmbridge probes to connect to each of the below data stores: 

- Nginx (Jarvis/Elastic) to store Alarms
- TAS to store the inventory and topology data
- NASS to store the metrics

A) CHECKLIST

STEP #1 : Verify the UIM Probes configuration (oi_connector and apm_bridge)
STEP #2 : Check the Alarms, Metrics, Inventory and Topology data using DX OI UI
STEP #3 : Check the Alarms, Metrics, Inventory and Topology data using Elastic and TAS/NASS REST APIs
STEP #4 : Verify Jarvis, Elastic, Zookeeper and Kafka


STEP#1 : Verify the UIM Probes configuration (oi_connector and apm_bridge)

 

Probe #1: OI Connector 

1) Make sure you are using a supported version of UIM and the oi_connector probe; check the compatibility matrix:

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/digital-operational-intelligence/20-2/release-notes/Compatibility-Matrix.html

2) Ensure the oi_connector configuration is correct:

Default Tenant ID

If you are using DX OI On Premise (20.x):
the "tenant_id" can be obtained from Elastic using the query below:
http://<elastic-endpoint>/ao_dxi_tenants_1_1/_search?size=200&pretty

NOTE: How to obtain the "elastic-endpoint":

If DX OI 20.x:

If Kubernetes:  kubectl get ingress -n<dxi-namespace> | grep jarvis

for example: kubectl get ingress -ndxi | grep jarvis
jarvis-es                             <none>   es.10.109.32.88.nip.io                           10.109.32.88   80      19d

If Openshift:     oc get routes -n<dxi-namespace> | grep jarvis

for example: oc get routes -ndxi | grep jarvis
jarvis-es-7krrv                             es.munqa001493.bpc.broadcom.net                           /                  jarvis-elasticsearch-lb       9200                                          None

If DX OI 1.3.2:

oc get routes -n<dxi-namespace> | grep elastic

for example: oc get routes -ndoi132 | grep elastic

es-route-9200                  es.lvntest010772.bpc.broadcom.net                                    elasticsearch         9200                    None


If you are using DX OI SaaS:
contact Broadcom Support to obtain your TenantID
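
For on-premise installations, the endpoint is the host column of the kubectl or oc output shown above. The following is a minimal Python sketch for pulling that column out of saved command output. The function name extract_host is hypothetical, and the column positions (third column for "kubectl get ingress", second for "oc get routes") are assumptions taken from the sample lines in this article, so check them against your own output.

```python
def extract_host(output, name_filter, host_column):
    """Return the host column of every line that contains name_filter.

    host_column is a zero-based index into the whitespace-separated fields.
    """
    hosts = []
    for line in output.splitlines():
        fields = line.split()
        if name_filter in line and len(fields) > host_column:
            hosts.append(fields[host_column])
    return hosts


# Sample lines copied from the examples above.
kubectl_line = "jarvis-es   <none>   es.10.109.32.88.nip.io   10.109.32.88   80   19d"
oc_line = "jarvis-es-7krrv   es.munqa001493.bpc.broadcom.net   /   jarvis-elasticsearch-lb   9200   None"

print(extract_host(kubectl_line, "jarvis", 2))  # kubectl: host is the 3rd field
print(extract_host(oc_line, "jarvis", 1))       # oc: host is the 2nd field
```

The same helper can be reused for the nginx and apmservices-gateway lookups by changing the filter string.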

Jarvis Ingestion Endpoint (nginx)

If you are using DX OI On Premise (20.x):
the nginx servername can be obtained using the commands below:

Openshift    : oc get routes -ndxi | grep nginx
Kubernetes : kubectl get ingress -ndxi | grep nginx

If you are using DX OI SaaS:
https://api.dxi-na1.saas.broadcom.com/ingestion

NASS url (apmservices-gateway)

If you are using DX OI On Premise (20.x):
the apmservices-gateway servername can be obtained using the commands below:

Openshift    : oc get routes -ndxi | grep apmservices-gateway
Kubernetes : kubectl get ingress -ndxi | grep apmservices-gateway

If you are using DX OI SaaS:
https://apmgw.dxi-na1.saas.broadcom.com

NASS Tenant Token

If you are using DX OI On Premise (20.x):
You can create a token using either of the two options below:

Option 1) Log in as MASTERADMIN, locate your tenant, then "Create a Tenant Token"
Option 2) If you have installed DX APM, log in to APM using an administrator account, then go to Settings > Security > Generate New Token > select "Tenant" or "Agent" Token

If you are using DX OI SaaS:
log in to APM using an administrator account, then go to Settings > Security > Generate New Token > select "Agent" (recommended) or "Tenant" Token

 

[Screenshots: oi_connector configuration page, for DX OI On Premise (20.x) and for DX OI SaaS]

3) Use the "Verify" options to validate the Jarvis and NASS endpoints only. 

DO NOT use it to verify the TenantID and ElasticSearch


4) Open the oi_connector raw configuration

a) Go to "Setup", make sure the configuration is correct as below:

ingest_data_to_nass = true
ingest_uim_metrics_metadata_to_jarvis = false
doc_type_version_uim_metrics_metadata = 2
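
The three Setup values above can also be checked against a saved copy of the configuration text. This is only a sketch: it assumes the "key = value" line layout shown above, the helper name check_setup is hypothetical, and the sample text is made up to show one wrong value.

```python
# Expected Setup values, taken from the list above.
EXPECTED = {
    "ingest_data_to_nass": "true",
    "ingest_uim_metrics_metadata_to_jarvis": "false",
    "doc_type_version_uim_metrics_metadata": "2",
}


def check_setup(cfg_text, expected=EXPECTED):
    """Return {key: (found, wanted)} for each expected key that is missing or different."""
    found = {}
    for line in cfg_text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            found[key.strip()] = value.strip()
    return {k: (found.get(k), v) for k, v in expected.items() if found.get(k) != v}


# Illustrative sample with one incorrect value (not real probe output).
sample = """
ingest_data_to_nass = true
ingest_uim_metrics_metadata_to_jarvis = true
doc_type_version_uim_metrics_metadata= 2
"""
print(check_setup(sample))
```

An empty result means all three keys match the expected values.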

b) If you have already created multiple Tenants in DX Platform "Cluster Management", then you need to find the right TAS tenant ID to use.

Go to "Resource | Properties", set nassTassToken = <your tas_tenant_id>

You can obtain the "tas_tenant_id" as below:

If you are using DX OI On Premise (20.x):

Option 1: From ElasticSearch: http://es.<endpoint>/ao_dxi_tenants_1_1/_search?size=200&pretty

Option 2: If you have installed APM, log in to APM; you can obtain the tas_tenant_id from the URL.

If you are using DX OI SaaS:

Log in to APM; you can obtain the tas_tenant_id from the URL.

 

5) Review the oi_connector log

Default location : C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\oi_connector.log

a) Make sure there are no errors or exceptions. Some examples are shown below:


- Incorrect nginx endpoint (jarvis ingestion endpoint):

Feb 02 22:58:49:203 [INIT_THREAD, oi_connector] Exception in validateResponseFromJarvis : java.net.UnknownHostException: 
..
Feb 02 22:58:49:713 [GROUP_PROCESSOR_THREAD-1, oi_connector] Cannot post UIM Group data to Jarvis, either payload is blank or DOI Configurations are not valid.
..
Feb 02 23:18:17:971 [QUEUE_MONITOR_THREAD, oi_connector] Jarvis is unavailable. Alarm, Inventory and Group Queues status is not monitored

 

- Incorrect NASS endpoint (apmservices-gateway) or token: 

Feb 02 22:39:03:619 [DEVICE_COUNT_METRICS_PROCESSOR-1, oi_connector] Exception while executing doPost request to Jarvis url:  java.net.UnknownHostException: 
 at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
..

Feb 02 22:39:03:619 [DEVICE_COUNT_METRICS_PROCESSOR-1, oi_connector] Response is null for posting Metrics Metadata to NASS, please verify nass url and token.
..
Feb 02 22:40:00:107 [DOI_MONITOR_THREAD, oi_connector] Nass is Down. 0

Recommendation: Generate a New Token.

b) Search for the following strings to confirm that the OI Connector is working correctly:

"Nass is Available", "Jarvis is Available", "posted on jarvis", "posted on NASS"

Here are some examples:

Feb 02 22:28:51:926 [INIT_THREAD, oi_connector] Jarvis is Available. 202
..
Feb 02 22:28:52:552 [INIT_THREAD, oi_connector] Nass is Available. 200
..
Feb 02 22:28:56:253 [GROUP_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Groups posted on jarvis via post : 6
Feb 02 23:20:56:161 [ALARM_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Alarms posted on jarvis via post : 7
Feb 02 23:22:34:432 [QOS_PROCESSOR_THREAD-4, oi_connector] Total no of Qos values posted on NASS via post : 3
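
Searching the log for these strings can be scripted. The sketch below counts the success strings listed above and a few failure strings taken from the error examples earlier in this step. The function name scan_log and the choice of failure strings are assumptions for illustration, not an exhaustive list of what the probe can log.

```python
from collections import Counter

# Strings quoted in this article as signs of a healthy connector.
SUCCESS_MARKERS = ["Jarvis is Available", "Nass is Available",
                   "posted on jarvis", "posted on NASS"]
# Strings taken from the error examples in this article.
FAILURE_MARKERS = ["UnknownHostException", "Jarvis is unavailable",
                   "Nass is Down", "Response is null"]


def scan_log(lines):
    """Count how many lines contain each success and failure marker."""
    ok, bad = Counter(), Counter()
    for line in lines:
        for marker in SUCCESS_MARKERS:
            if marker in line:
                ok[marker] += 1
        for marker in FAILURE_MARKERS:
            if marker in line:
                bad[marker] += 1
    return ok, bad


# Lines copied from the log examples in this article.
sample_lines = [
    "Feb 02 22:28:51:926 [INIT_THREAD, oi_connector] Jarvis is Available. 202",
    "Feb 02 22:28:52:552 [INIT_THREAD, oi_connector] Nass is Available. 200",
    "Feb 02 23:22:34:432 [QOS_PROCESSOR_THREAD-4, oi_connector] Total no of Qos values posted on NASS via post : 3",
    "Feb 02 22:40:00:107 [DOI_MONITOR_THREAD, oi_connector] Nass is Down. 0",
]
ok, bad = scan_log(sample_lines)
print(dict(ok))
print(dict(bad))
```

To run it against a real log, pass an open file object instead of sample_lines, for example scan_log(open(path, errors="replace")).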

c) Verify axagateway Queues:

Option 1: Using Dr NimBus ("C:\Program Files (x86)\Nimsoft\bin\drnimbus.exe")

Check the "In Queue" column: if the value is constantly > 0, it indicates a problem posting the data to OI
Check the "Count" column: if the value is > 0, it indicates that data has been posted correctly to OI

Option 2: Using Hub Status: open the hub probe, click the "Status" tab

Check the "Queued" column: if the value is constantly > 0, it indicates a problem posting the data
Check the "Sent" column: if the value is > 0, it indicates that data has been posted correctly to OI
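
One way to read "constantly > 0" is that every reading taken over a period is above zero, rather than a single momentary spike. The sketch below encodes that interpretation for readings noted down by hand from the "In Queue" or "Queued" column. The function name and the minimum of three readings are assumptions, not values from the product documentation.

```python
def queue_is_backlogged(queued_samples, min_samples=3):
    """Return True when there are enough readings and every one is above zero."""
    return len(queued_samples) >= min_samples and all(v > 0 for v in queued_samples)


print(queue_is_backlogged([12, 40, 95]))  # always above zero across readings
print(queue_is_backlogged([12, 0, 3]))    # queue drained at least once
print(queue_is_backlogged([5]))           # too few readings to judge
```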

 

d) Enable TRACE logging to better troubleshoot the issue

Open OI Connector > Raw Configure > Setup, change logsize = 999999

6) For more information regarding the UIM probe configuration and troubleshooting, refer to the UIM documentation

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/ca-unified-infrastructure-management-probes/GA/alphabetical-probe-articles/oi-connector-ca-digital-operational-intelligence-gateway/oi-connector-ac-configuration.html

 

Probe #2: apm_bridge 

1) Make sure you are using a supported version of the apm_bridge and uimapi probes, check the documentation 

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/digital-operational-intelligence/20-2/release-notes/Compatibility-Matrix.html

 

2) Ensure UIMAPI has been deployed and is available at:

http(s)://<UMP_url>:<UMP_port>/uimapi/docs/index.html

3) Ensure the apm_bridge configuration is correct:

host

If you are using DX OI On Premise (20.x):
the apmservices-gateway servername can be obtained using the commands below:

Openshift    : oc get routes -ndxi | grep apmservices-gateway
Kubernetes : kubectl get ingress -ndxi | grep apmservices-gateway

If you are using DX OI SaaS:
apmgw.dxi-na1.saas.broadcom.com

port

443 (port of apmservices-gateway)

usessl

true (protocol of apmservices-gateway)

NASS Tenant Token

If you are using DX OI On Premise (20.x):
You can create a token using either of the two options below:

Option 1) Log in as MASTERADMIN, locate your tenant, then "Create a Tenant Token"
Option 2) If you have installed DX APM, log in to APM using an administrator account, then go to Settings > Security > Generate New Token > select "Tenant" or "Agent" Token

If you are using DX OI SaaS:
log in to APM using an administrator account, then go to Settings > Security > Generate New Token > select "Agent" (recommended) or "Tenant" Token

origin

 UIM hub

NOTE: If you are using Operator Console:

a) Update the host-related entries for OperatorConsole & AdminConsole in the Profile configuration.
b) The latest apm_bridge v1.05 probe depends on uimapi 2.32 HF3; update UIMAPI to version 2.32 HF3 on the OperatorConsole/UMP node.
c) There is a known issue affecting apm_bridge; use apm_bridge_1.0.5-T1, see:

apm_bridge probe ERROR Could not determine UIM or TAS devices - com.fasterxml.jackson.databind.JsonMappingException
https://knowledge.broadcom.com/external/article/211750

 

[Screenshots: apm_bridge configuration page, for DX OI On Premise (20.x) and for DX OI SaaS]


NOTE:
- If you need to update the port, hostnames and usessl properties, you can use the Probe raw configuration.
- If you need to update the Token, you need to delete the profile and recreate it.

 

 

4) Make sure to configure the uimapi endpoint:

user

UMP admin user, in this example: administrator

pass

UMP admin user password

usessl

false, because in this example UMP server is running using http

port

in this example, UMP webapp is running on port 90

host

UMP server hostname, in this example : ibntest005169.bpc.broadcom.net

IMPORTANT: If you need to send data from the secondary hub through apm_bridge, specify the secondary hub hostname comma separated after the primary hub in the Origin field.


[Screenshot: uimapi probe configuration page]


5) Review the apm_bridge log:

Default location : C:\Program Files (x86)\Nimsoft\probes\service\apm_bridge\apm_bridge.log


a) Make sure there are no errors or exceptions. Below is an example:

Feb 01 01:14:55:772 ERROR [ATS Client - 11019, apm_bridge] Could not POST to TAS NG. profile id: 0, host: 
org.apache.http.client.HttpResponseException: 

Recommendation: reinstall apm_bridge


b) Search for the following string to confirm that the configuration is correct and data is being ingested into TAS (apmservices-gateway):

"Inventory sent to APM for profile"

Here are some examples:

Feb 01 09:21:40:521 INFO  [ForkJoinPool-57-worker-1, apm_bridge] Inventory sent to APM for profile 0 : CS vertices: 2 / CI vertices: 0 and edges: 0
Feb 01 09:21:40:521 INFO  [ForkJoinPool-57-worker-1, apm_bridge] Inventory sent to APM for profile 0 : CI vertices: 2 and edges: 0
Feb 01 09:21:40:525 INFO  [ForkJoinPool-57-worker-1, apm_bridge] Inventory sent to APM for profile 0 : APM vertices: 2 and edges: 0
Feb 01 09:21:40:525 INFO  [ForkJoinPool-57-worker-1, apm_bridge] Ending APM Inventory Update for profile 0 / took 2003 ms to process
Feb 01 09:21:40:525 INFO  [APMInventoryService RUNNING, apm_bridge] Done updating APM Inventory
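
The vertex and edge counts in these lines show whether any inventory is actually being sent, since a profile that always reports zero vertices is worth investigating. The following sketch extracts the counts with regular expressions written against the line format shown above. The function name and the expressions are illustrative and may need adjusting for other probe versions.

```python
import re

# Matches the profile id and the remainder of an "Inventory sent" line.
LINE_RE = re.compile(r"Inventory sent to APM for profile (\d+) : (.*)")
# Matches pairs such as "CS vertices: 2" or "edges: 0".
COUNT_RE = re.compile(r"(\w+ vertices|edges): (\d+)")


def parse_inventory_line(line):
    """Return (profile_id, {label: count}) or None if the line does not match."""
    match = LINE_RE.search(line)
    if not match:
        return None
    counts = {name: int(value) for name, value in COUNT_RE.findall(match.group(2))}
    return int(match.group(1)), counts


# Line copied from the example above.
line = ("Feb 01 09:21:40:521 INFO  [ForkJoinPool-57-worker-1, apm_bridge] "
        "Inventory sent to APM for profile 0 : CS vertices: 2 / CI vertices: 0 and edges: 0")
print(parse_inventory_line(line))
```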


6) Enable additional logging:

Edit the apm_bridge.cfg configuration file with a text editor, or use Raw Configure to update key/value pairs. You can update or add the following key/value pairs to enable additional logs for troubleshooting:


<topology_service>      
save_incoming_graphs = 1    
log_outgoing_request = 1      
publish_config_items = 0  
</topology_service>  

After restarting the apm_bridge probe, a new "debug" directory is created; it includes subfolders, each containing JSON files of the inventory and topology data.


7) For more information regarding the UIM probe configuration and troubleshooting, refer to the UIM documentation

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/ca-unified-infrastructure-management-probes/GA/alphabetical-probe-articles/apm-bridge-ca-apm-bridge/apm-bridge-configuration.html

 

STEP#2 : Check the Alarms, Metrics, Inventory and Topology data using DX OI UI

a) Alarms (ElasticSearch)

Go to DX OI > Alarms

 

b) Metrics (NASS)

Go to DX OI > Performance
Click Done
Click Metrics
Select some metrics to display



c) Inventory and Topology (TAS)

Go to DX OI > Services > Create a new Service

From Add Elements, select Infrastructure > UIM Group or Hostname. You should be able to see your UIM servers and groups.

 

STEP#3 : Check the Alarms, Metrics, Inventory and Topology data using Elastic and TAS/NASS REST APIs

** This section is valid for DX OI On Premise 20.x only. If you are using DX OI SaaS, contact Broadcom Support for assistance **


a) Alarms (ElasticSearch)

Verify that data (metrics, alerts, inventory) has been ingested in the respective Product indices:

1) List all the UIM product indices:

http://es.<servername>/_cat/indices/*uim*?v

For example:

http://es.munqa001493.bpc.broadcom.net/_cat/indices/*uim*?v

Check that the docs.count and store.size column values increase over time.
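
Comparing two snapshots of the _cat/indices output taken a few minutes apart makes it easy to spot an index that has stopped growing. The sketch below parses the text returned with the ?v option, which includes a header row, and lists indices whose docs.count did not increase. The function names and the index names in the sample are made up for illustration, and the parser assumes every listed index is open so that no column is blank.

```python
def docs_by_index(cat_output):
    """Parse `_cat/indices?v` text into {index name: docs.count}."""
    lines = [l for l in cat_output.splitlines() if l.strip()]
    header = lines[0].split()
    idx_col, docs_col = header.index("index"), header.index("docs.count")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        result[fields[idx_col]] = int(fields[docs_col])
    return result


def stalled_indices(before, after):
    """Return indices whose document count did not grow between two snapshots."""
    return sorted(name for name, count in after.items() if count <= before.get(name, 0))


# Made-up snapshots; real index names and sizes will differ.
HEADER = "health status index uuid pri rep docs.count docs.deleted store.size pri.store.size\n"
snapshot_1 = HEADER + (
    "green open example_uim_alarms aaaa 1 1 120 0 1mb 512kb\n"
    "green open example_uim_inventory bbbb 1 1 40 0 200kb 100kb\n"
)
snapshot_2 = HEADER + (
    "green open example_uim_alarms aaaa 1 1 150 0 1.2mb 600kb\n"
    "green open example_uim_inventory bbbb 1 1 40 0 200kb 100kb\n"
)
print(stalled_indices(docs_by_index(snapshot_1), docs_by_index(snapshot_2)))
```

An index that is listed as stalled is not necessarily broken, since it may simply have received no new data during the interval.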


2) Check the content of a specific index:

http://es.<servername>/<index-name>/_search?pretty&sort=timestamp:desc&size=500

For example:

http://es.munqa001493.bpc.broadcom.net/*alarms_all*/_search?pretty&sort=timestamp:desc&size=200


NOTE: You can use https://www.epochconverter.com/ to convert epoch timestamps to human-readable dates and vice versa.
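
The same conversion can be done locally with the Python standard library. This sketch assumes the timestamp values are epoch milliseconds; if a field holds epoch seconds, drop the division and multiplication by 1000. The function names are hypothetical.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(epoch_ms):
    """Convert epoch milliseconds to an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()


def utc_to_epoch_ms(year, month, day, hour=0, minute=0, second=0):
    """Convert a UTC date and time to epoch milliseconds."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


print(epoch_ms_to_utc(1612306729000))          # 2021-02-02T22:58:49+00:00
print(utc_to_epoch_ms(2021, 2, 2, 22, 58, 49))  # 1612306729000
```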



For more ElasticSearch queries and examples, see:

DX AIOps - ElasticSearch Queries
https://knowledge.broadcom.com/external/article/207215

 

b) Inventory and Topology (TAS)

Query for topology data using REST APIs:

Open Postman (you can download postman from https://www.postman.com/downloads/)

POST API End Point to check TAS data for UIM inventory: 

http://<APMServices Gateway Host>/tas/graph/query

For example:

http://apmservices-gateway.munqa001493.bpc.broadcom.net/tas/graph/query

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

{
   "filter": {
       "op": "JOIN",
       "input": {
           "op": "AND",
           "input": [
               {
                   "op": "ATTRIBUTE",
                   "expressions": [
                       {
                           "name": "Product",
                           "values": [
                               "UIM"
                           ]
                       }
                   ]
               }
           ]
       }
   },
   "universe": null,
   "version": null,
   "time": 0,
   "stitchingEnabled": true,
   "includeStatus": true
}

Expected Result: you should see all new vertices added to TAS
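
If Postman is not available, the same POST can be prepared with the Python standard library. The sketch below only builds the request, using the URL path, headers and body shown above; the function name, the example hostname and the placeholder token are assumptions. Sending it with urllib.request.urlopen(req) is left to the reader, since the response format is not described in this article.

```python
import json
import urllib.request


def build_tas_query(gateway_host, tenant_token, product="UIM"):
    """Build (but do not send) the TAS graph query for one Product value."""
    body = {
        "filter": {
            "op": "JOIN",
            "input": {
                "op": "AND",
                "input": [
                    {
                        "op": "ATTRIBUTE",
                        "expressions": [{"name": "Product", "values": [product]}],
                    }
                ],
            },
        },
        "universe": None,
        "version": None,
        "time": 0,
        "stitchingEnabled": True,
        "includeStatus": True,
    }
    return urllib.request.Request(
        url=f"http://{gateway_host}/tas/graph/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {tenant_token}",
        },
        method="POST",
    )


# Hypothetical host and placeholder token, for illustration only.
req = build_tas_query("apmservices-gateway.example.com", "<Tenant Token>")
print(req.get_method(), req.full_url)
```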

c) Metrics (NASS)

Query for metrics data using REST APIs:

Open Postman (you can download postman from https://www.postman.com/downloads/)

POST API End Point to check NASS Metric Metadata matching a pattern

http://<APMServices Gateway Host>/metadata/queryMetric

For example:

http://apmservices-gateway.munqa001493.bpc.broadcom.net/metadata/queryMetric

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

{
   "size": 20000,
   "specifier": {
       "op": "SPEC",
       "sourceNameSpecifier": {
           "op": "REGEX",
           "pattern": "(.*)UIM(.*)"
       },
       "attributeNameSpecifier": {
           "op": "ALL"
       }
   }
}

Expected Result: you should see the UIM metric metadata stored in NASS
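
When the full result set is too large to read, it can help to narrow the source name pattern or lower the size. The sketch below generates the request body shown above from two parameters. The function name is hypothetical, and the fragment is inserted into the pattern as-is, so any regular expression characters in it are the caller's responsibility.

```python
import json


def build_metric_query(source_fragment="UIM", size=20000):
    """Return the queryMetric body as a dict, matching sources that contain source_fragment."""
    return {
        "size": size,
        "specifier": {
            "op": "SPEC",
            "sourceNameSpecifier": {
                "op": "REGEX",
                "pattern": f"(.*){source_fragment}(.*)",
            },
            "attributeNameSpecifier": {"op": "ALL"},
        },
    }


body = build_metric_query("UIM", size=100)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the Postman body, with the same Content-Type and Authorization headers as above.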

 

STEP#4 : Verify Jarvis, Elastic, Zookeeper and Kafka:

** This section is valid for DX OI On Premise 20.x only. If you are using DX OI SaaS, contact Broadcom Support for assistance **

DX OI - Jarvis / Kafka Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/189119



B) WHAT FILES SHOULD I COLLECT FOR BROADCOM SUPPORT?

If you still need assistance, contact Broadcom Support (https://support.broadcom.com/) and provide the below information:

a) Screenshots illustrating the discrepancy between UIM and DX OI, if the problem is related to a difference in metrics and values

b) oi_connector and apm_bridge probe logs

If possible, enable TRACE logging level for the probes, restart them, reproduce the issue and gather the logs below:

C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\logs\*
C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\oi_connector.cfg

C:\Program Files (x86)\Nimsoft\probes\service\apm_bridge\apm_bridge.log
C:\Program Files (x86)\Nimsoft\probes\service\apm_bridge\apm_bridge.cfg



If you are using DX OI On Premise (20.x), collect the additional information below:

c) cluster and pods status:

kubectl get pods -n<dxi-namespace>
kubectl describe nodes -n<dxi-namespace>

d) From ElasticSearch

Collect the result of the below queries:

http(s)://{elasticsearch_endpoint}/_cat/indices/*uim*?v
http(s)://{elasticsearch_endpoint}/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
http(s)://{elasticsearch_endpoint}/_cluster/health?pretty&human

From all Elastic servers and the NFS server, collect the output of: df -h
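
Before sending the output, a quick look at the _cluster/health response can already point to a storage or shard problem. The sketch below reads the "status" and "unassigned_shards" fields of that response and flags anything other than a green cluster with no unassigned shards. The function name and the sample response are illustrative, and a yellow status can be normal on a single-node cluster.

```python
import json


def summarize_health(health_json):
    """Summarize an Elasticsearch _cluster/health response given as a JSON string."""
    health = json.loads(health_json)
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    return {
        "status": status,
        "unassigned_shards": unassigned,
        "needs_attention": status != "green" or unassigned > 0,
    }


# Made-up response fragment for illustration.
sample_health = '{"cluster_name": "example", "status": "yellow", "unassigned_shards": 4}'
print(summarize_health(sample_health))
```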

e) If the problem is related to Jarvis, Kafka or Elastic, collect the respective logs and evidence, see:

DX OI - Jarvis / Kafka Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/189119

 

Additional Information

DX OI UIM integration - SaaS

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-operational-intelligence-saas/SaaS/integration/integrate-ca-products/add-ca-uim.html

DX OI UIM integration - On premise

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/digital-operational-intelligence/20-2/integration/integrate-ca-products/add-ca-uim.html

DX AIOPs (OI and APM) - Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815/dx-oi-troubleshooting-common-issues-and.html

DX OI 1.3.2 integration with UIM - Troubleshooting guidelines
https://knowledge.broadcom.com/external/article/144181
