The following is a list of techniques and suggestions to employ when troubleshooting DX OI to UIM integration issues.
A) Checklist
B) What files should I collect for Broadcom Support?
DX Operational Intelligence 2x, SaaS
To successfully enable the UIM integration with DX OI, you need to properly configure the oi_connector and apm_bridge probes.
1) Make sure you are using a supported version of the oi_connector probe:
2) Verify oi_connector probe configuration is correct
3) There are special use-cases where you need to update the raw configuration; below are some examples:
a) Go to "Setup", make sure the below properties are set:
ingest_data_to_nass = true
ingest_uim_metrics_metadata_to_jarvis = false
b) If you have upgraded the oi_connector probe to 1.38 onwards, make sure to set "subscribe_to_uim_inventory_
c) If you have multiple Tenants:
Go to "Resource | Properties", set nassTassToken = <your tas_tenant_id>
How to obtain the "tas_tenant_id"?
Option 1: Log in to APM; you can obtain the tas_tenant_id from the URL, as shown below:
Option 2: If the setup does not have APM, you can obtain the tas_tenant_id from Cluster Manager: go to Tenant Services and hover over the tenant name; the tas_tenant_id is the "Internal id". If you are using DX OI SaaS, contact Broadcom Support for assistance.
4) Review the oi_connector log
Default location: C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\oi_connector.log
a) Make sure there are no errors or exceptions; some examples are shown below:
- Incorrect nginx endpoint (jarvis ingestion endpoint):
Feb 02 22:58:49:203 [INIT_THREAD, oi_connector] Exception in validateResponseFromJarvis : java.net.UnknownHostException:
..
Feb 02 22:58:49:713 [GROUP_PROCESSOR_THREAD-1, oi_connector] Cannot post UIM Group data to Jarvis, either payload is blank or DOI Configurations are not valid.
..
Feb 02 23:18:17:971 [QUEUE_MONITOR_THREAD, oi_connector] Jarvis is unavailable. Alarm, Inventory and Group Queues status is not monitored
- Incorrect NASS endpoint (apmservices-gateway) or token:
Feb 02 22:39:03:619 [DEVICE_COUNT_METRICS_PROCESSOR-1, oi_connector] Exception while executing doPost request to Jarvis url: java.net.UnknownHostException:
at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
..
Feb 02 22:39:03:619 [DEVICE_COUNT_METRICS_PROCESSOR-1, oi_connector] Response is null for posting Metrics Metadata to NASS, please verify nass url and token.
..
Feb 02 22:40:00:107 [DOI_MONITOR_THREAD, oi_connector] Nass is Down. 0
Recommendation: Generate a New Token.
b) Search for the following key phrases to confirm that the OI Connector is working correctly (a findstr sketch follows the log examples below):
"Nass is Available", "Jarvis is Available", "posted on jarvis", "posted on NASS"
Here are some examples:
Feb 02 22:28:51:926 [INIT_THREAD, oi_connector] Jarvis is Available. 202
..
Feb 02 22:28:52:552 [INIT_THREAD, oi_connector] Nass is Available. 200
..
Feb 02 22:28:56:253 [GROUP_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Groups posted on jarvis via post : 6
Feb 02 23:20:56:161 [ALARM_PROCESSOR_THREAD-1, oi_connector] Total no of UIM Alarms posted on jarvis via post : 7
Feb 02 23:22:34:432 [QOS_PROCESSOR_THREAD-4, oi_connector] Total no of Qos values posted on NASS via post : 3
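On a Windows hub you can scan the log for these markers directly from a command prompt; a minimal sketch using the default log location shown above:
findstr /C:"Nass is Available" /C:"Jarvis is Available" /C:"posted on jarvis" /C:"posted on NASS" "C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\oi_connector.log"
If none of these phrases appear after the probe has been running for a while, revisit the configuration checks in the previous steps.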
c) Verify axagateway queues:
Option 1: Using Dr NimBus ("C:\Program Files (x86)\Nimsoft\bin\drnimbus.exe")
Check the "In Queue" column: if the value is constantly > 0, it indicates a problem posting the data to OI
Check the "Count" column: if the value is > 0, it indicates that data has been posted correctly to OI
Option 2: Using Hub Status: open the hub probe and click the "Status" tab
Check the "Queued" column: if the value is constantly > 0, it indicates a problem posting the data
Check the "Sent" column: if the value is > 0, it indicates that data has been posted correctly to OI
d) Enable 5-TRACE log level for advanced troubleshooting: open "Raw Configure > Setup" and set the log level to 5. You can also temporarily increase the logsize to 999999 so the trace output is not lost to log rotation.
IMPORTANT: Once you complete the troubleshooting, reset the log level back to 3-Info and the logsize back to 10240 (or their original values).
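For reference, after the change the relevant keys in oi_connector.cfg would look like the sketch below (this assumes the standard Nimsoft probe keys loglevel and logsize in the setup section; verify the key names in your Raw Configure view):
<setup>
   loglevel = 5
   logsize = 999999
</setup>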
1) Make sure you are using a supported version of the apm_bridge and webservicesrest probes:
NOTE: You can check the webservicesrest version from IM (Infrastructure Manager), as shown below:
2) Ensure UIMAPI has been deployed and is available
If UMP: http(s)://<UMP_url>:<UMP_port>/uimapi/docs/index.html
If OC: http(s)://<OC>:<OC_port>/uimapi
For example:
OC:
UMP:
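You can also verify availability from the command line; a minimal sketch for an OC installation (substitute your host and port, adjust the protocol, and note that authentication may be required depending on your setup):
curl -sk -o /dev/null -w "%{http_code}\n" "https://<OC>:<OC_port>/uimapi"
An HTTP 200 (or a 302/401 if authentication is enforced) indicates the UIMAPI is deployed and reachable; a 404 usually means the uimapi package has not been deployed.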
3) Verify apm_bridge and uimapi probe configuration is correct
4) Review the apm_bridge.log file:
Default location: C:\Program Files (x86)\Nimsoft\probes\service\apm_bridge\apm_bridge.log
a) Make sure there are no errors or exceptions; some examples are shown below:
Feb 01 01:14:55:772 ERROR [ATS Client - 11019, apm_bridge] Could not POST to TAS NG. profile id: 0, host:
org.apache.http.client.HttpResponseException:
Recommendation: reinstall apm_bridge
b) Search for the following string to confirm that the configuration is correct and data is being ingested into TAS (apmservices-gateway):
"Inventory sent to APM for profile"
Here are some examples:
Feb 01 09:21:40:521 INFO [ForkJoinPool-57-worker-1, apm_bridge] Inventory sent to APM for profile 0 : CS vertices: 2 / CI vertices: 0 and edges: 0
Feb 01 09:21:40:521 INFO [ForkJoinPool-57-worker-1, apm_bridge] Inventory sent to APM for profile 0 : CI vertices: 2 and edges: 0
Feb 01 09:21:40:525 INFO [ForkJoinPool-57-worker-1, apm_bridge] Inventory sent to APM for profile 0 : APM vertices: 2 and edges: 0
Feb 01 09:21:40:525 INFO [ForkJoinPool-57-worker-1, apm_bridge] Ending APM Inventory Update for profile 0 / took 2003 ms to process
Feb 01 09:21:40:525 INFO [APMInventoryService RUNNING, apm_bridge] Done updating APM Inventory
5) Enable additional logging:
Edit the apm_bridge.cfg configuration file with a text editor, or use Raw Configure to update key/value pairs. You can update or add the following key/value pairs to enable additional logging for troubleshooting:
<topology_service>
save_incoming_graphs = 1
log_outgoing_request = 1
publish_config_items = 0
</topology_service>
After restarting the apm_bridge probe, a new "debug" directory will be created that includes subfolders, each containing JSON files of the inventory and topology data.
Go to DX OI > Alarms
Go to DX OI > Performance
Click Done
Click Metrics
Select some metrics to display
Go to DX OI > Services > Create a new Service
From Add Elements, select Infrastructure > UIM Group or Hostname; you should be able to see your UIM servers and groups, as in the example below:
On-premise:
SaaS:
** The options below are valid for On-Premise ONLY. If you are using DX OI SaaS, contact Broadcom Support for assistance **
Verify that data (metrics, alerts, inventory) has been ingested in the respective Product indices:
1) List all the UIM product indices:
http://es.<servername>/_cat/indices/*uim*?v
For example:
http://es.munqa001493.bpc.broadcom.net/_cat/indices/*uim*?v
Check that the docs.count and store.size column values increase over time.
2) Check the content of a specific index:
http://es.<servername>/<index-name>/_search?pretty&sort=timestamp:desc&size=500
For example:
http://es.munqa001493.bpc.broadcom.net/*alarms_all*/_search?pretty&sort=@timestamp:desc&size=200
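The same checks can be run from the command line with curl; a minimal sketch against the endpoints shown above (quote the URLs so the shell does not interpret the special characters):
curl -s "http://es.<servername>/_cat/indices/*uim*?v"
curl -s "http://es.<servername>/*alarms_all*/_search?pretty&sort=@timestamp:desc&size=200"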
NOTE: You can use https://www.epochconverter.com/ to convert epoch to human-readable dates and vice versa.
For more ElasticSearch queries and examples, see:
DX AIOps - ElasticSearch Queries
https://knowledge.broadcom.com/external/article/207215
Open Postman (you can download postman from https://www.postman.com/downloads/)
POST API endpoint to check TAS data for UIM inventory:
http://<APMServices Gateway Host>/tas/graph/query
For example:
http://apmservices-gateway.munqa001493.bpc.broadcom.net/tas/graph/query
Headers:
Content-Type: application/json
Authorization: Bearer <Tenant Token>
Body:
{
  "filter": {
    "op": "JOIN",
    "input": {
      "op": "AND",
      "input": [
        {
          "op": "ATTRIBUTE",
          "expressions": [
            {
              "name": "Product",
              "values": [
                "UIM"
              ]
            }
          ]
        }
      ]
    }
  },
  "universe": null,
  "version": null,
  "time": 0,
  "stitchingEnabled": true,
  "includeStatus": true
}
Expected Result: you should see all new vertices added to TAS
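The same request can be sent with curl instead of Postman; a minimal sketch, assuming the JSON body above has been saved to a file named tas_query.json (a hypothetical file name) and <Tenant Token> is replaced with your tenant token:
curl -s -X POST "http://<APMServices Gateway Host>/tas/graph/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Tenant Token>" \
  -d @tas_query.json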
Open Postman (you can download postman from https://www.postman.com/downloads/)
POST API endpoint to check NASS Metric Metadata matching a pattern:
http://<APM Service Gateway Host>/metadata/queryMetric
For example:
http://apmservices-gateway.munqa001493.bpc.broadcom.net/metadata/queryMetric
Headers:
Content-Type: application/json
Authorization: Bearer <Tenant Token>
Body:
{
  "size": 20000,
  "specifier": {
    "op": "SPEC",
    "sourceNameSpecifier": {
      "op": "REGEX",
      "pattern": "(.*)UIM(.*)"
    },
    "attributeNameSpecifier": {
      "op": "ALL"
    }
  }
}
Expected Result: you should see the UIM metric metadata entries that have been posted to NASS
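As with the TAS query, this request can be sent with curl; a minimal sketch, assuming the JSON body above has been saved to a file named nass_query.json (a hypothetical file name) and <Tenant Token> is replaced with your tenant token:
curl -s -X POST "http://<APM Service Gateway Host>/metadata/queryMetric" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Tenant Token>" \
  -d @nass_query.json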
** This section is valid for DX On-Premise 2x versions only. If you are using DX OI SaaS, contact Broadcom Support for assistance **
AIOps - Jarvis (Kafka, ZooKeeper, Elasticsearch) Troubleshooting
If you still need assistance, contact Broadcom Support (https://support.broadcom.com/) and provide the below information:
a) Screenshots illustrating the discrepancy between UIM and DX OI, if the problem is related to differences in metrics and values
b) oi_connector and apm_bridge probe logs
If possible, enable the TRACE log level on the probes, restart them, reproduce the issue, and gather the logs below:
C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\logs\*
C:\Program Files (x86)\Nimsoft\probes\gateway\oi_connector\oi_connector.cfg
C:\Program Files (x86)\Nimsoft\probes\service\apm_bridge\apm_bridge.log
C:\Program Files (x86)\Nimsoft\probes\service\apm_bridge\apm_bridge.cfg
If you are using DX OI On-Premise (20.x), collect the additional information below:
c) Cluster and pod status:
kubectl get pods -n<dxi-namespace>
kubectl describe nodes -n<dxi-namespace>
d) From Elasticsearch
Collect the result of the below queries:
http(s)://{elasticsearch_endpoint}/_cat/indices/*uim*?v
http(s)://{elasticsearch_endpoint}/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
http(s)://{elasticsearch_endpoint}/_cluster/health?pretty&human
From all Elasticsearch servers and the NFS server, collect the output of: df -h
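These queries can also be run with curl and the output attached to the case; a minimal sketch, assuming the Elasticsearch endpoint is reachable from where the commands are run and TLS is not enforced (the output file names are only examples):
curl -s "http://{elasticsearch_endpoint}/_cat/indices/*uim*?v" > uim_indices.txt
curl -s "http://{elasticsearch_endpoint}/_cluster/health?pretty&human" > cluster_health.txt
Run the remaining _cat/indices query the same way, and run df -h locally on each Elasticsearch server and on the NFS server.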
e) If the problem is related to Jarvis, Kafka, or Elasticsearch, collect the respective logs and evidence; see:
DX OI - Jarvis / Kafka Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/189119