DX OI integration with NetOps - OI Connector Troubleshooting, Common Issues and Best Practices

Article ID: 210469

Products

DX Operational Intelligence

Issue/Introduction

The following techniques and suggestions help when troubleshooting OI Connector issues:

A) Checklist

B) What files should I collect for Broadcom Support?

Environment

DX Operational Intelligence 20.x

OIConnector 1.5, 2.1.x

DX NetOps

Resolution

Architecture:

- Jarvis (nginx route) stores the events and groups
- TAS (apmservices-gateway route) stores the inventory and topology data
- NASS (apmservices-gateway route) stores the metrics

 

A) CHECKLIST

STEP #1 : Verify the OI connector configuration
STEP #2 : Check the Alarms, Metrics, Inventory and Topology data using DX OI UI
STEP #3 : Check the Alarms, Metrics, Inventory and Topology data using Elastic and TAS/NASS REST APIs
STEP #4 : Verify Jarvis, Zookeeper and Kafka


STEP#1 Verify the OI connector configuration 


1) Verify compatibility

https://support.broadcom.com/external/content/release-announcements/DX-Operational-Intelligence-Interoperability/18129

2) Make sure the values entered during installation are correct. If the installation is already completed, review the <OIConnector-HOME>/conf/config.xml

a) Here is a summary of the values entered during the OI Connector installation:

CA Performance Center

Protocol (Default: http): http or https
Hostname (Default: localhost): <performance center hostname>
Port (Default: 8181): <capm port>
Tenant (Default: _default_): <capm tenant>
User Name (Default: admin): <capm admin user>

APM Gateway

Protocol (Default: http): http or https
Hostname : <apmservices-gateway endpoint>
Port (Default: 8004): 80 or 443

Tenant ID : <Tenant ID (onprem) or Cohort ID (SaaS) >

Security Token : <Agent Token or Tenant Token>

OI Jarvis Server (only if you are using OI connector 1.x)

Protocol (Default: http): http or https
Hostname : <doi-nginx endpoint>
Port : 80 or 443

b) Here is a summary list of the steps or commands to use to obtain the above values:

APM Gateway  Hostname

If you are using DX OI On Premise (20.x):
The apmservices-gateway hostname can be obtained using the commands below:

Openshift    : oc get routes -ndxi | grep apmservices-gateway
Kubernetes : kubectl get ingress -ndxi | grep apmservices-gateway

If you are using DX OI SaaS:
Log in to SaaS > Settings > Connector Parameters > TAS Endpoint

APM Gateway  Security Token

If you are using DX OI On Premise (20.x):
You can create a token using either of the two options below:

Option 1) Log in as MASTERADMIN, locate your tenant, then "Create a Tenant Token"
Option 2) If you have installed DX APM, log in to APM using an administrator account, then go to Settings > Security > Generate New Token and select "Tenant" or "Agent" Token

If you are using DX OI SaaS:
Log in to SaaS > Settings > Connector Parameters > Click Generate New Token

(only if you are using OI connector 1.x) Jarvis Server  Hostname

If you are using DX OI On Premise (20.x):
The nginx hostname can be obtained using the commands below:

Openshift    : oc get routes -ndxi | grep nginx
Kubernetes : kubectl get ingress -ndxi | grep nginx

If you are using DX OI SaaS:
Log in to SaaS > Settings > Connector Parameters > Jarvis Endpoint

Tenant ID

If you are using DX OI On Premise (20.x):
The "tenant_id" can be obtained from Elastic using the query below:
http://<elastic-endpoint>/ao_dxi_tenants_1_1/_search?size=200&pretty

NOTE: How to obtain the "elastic-endpoint":

If Kubernetes:  kubectl get ingress -n<dxi-namespace> | grep jarvis

for example: kubectl get ingress -ndxi | grep jarvis
jarvis-es                             <none>   es.10.109.32.88.nip.io                           10.109.32.88   80      19d

If Openshift:     oc get routes -n<dxi-namespace> | grep jarvis

for example: oc get routes -ndxi | grep jarvis
jarvis-es-7krrv                             es.munqa001493.bpc.broadcom.net                          /                  jarvis-elasticsearch-lb       9200                                          None

If you are using DX OI SaaS:
Log in to SaaS > Settings > Connector Parameters > Cohort ID
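
For the On Premise case, the tenant IDs can also be read out of the Elastic response with a short script. The following is only a minimal sketch: it assumes the standard Elasticsearch search response layout (hits.hits[]._source) and that each tenant document carries the "tenant_id" field mentioned above. The function names are illustrative and not part of any Broadcom tooling.

```python
import json
import urllib.request


def extract_tenant_ids(search_response: dict) -> list:
    """Return the distinct tenant_id values found in an Elasticsearch search response."""
    ids = []
    for hit in search_response.get("hits", {}).get("hits", []):
        # Assumption: the tenant document exposes the ID as "_source.tenant_id".
        tenant_id = hit.get("_source", {}).get("tenant_id")
        if tenant_id is not None and tenant_id not in ids:
            ids.append(tenant_id)
    return ids


def fetch_tenant_ids(elastic_endpoint: str) -> list:
    """Run the tenants query shown above and return the tenant IDs it contains."""
    url = f"http://{elastic_endpoint}/ao_dxi_tenants_1_1/_search?size=200"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_tenant_ids(json.load(resp))
```

Calling fetch_tenant_ids with the es.* hostname returned by the kubectl or oc command should list the same IDs that are visible in the browser output.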

 

3) Check that the OI Connector services are up and running

service caperfcenter_oiconnector status
service caperfcenter_oiagent status

If you are using OI Connector version 2.x, check that the kafka service is up and running:

service kafka status
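
The three status checks can be scripted if they need to be repeated on several hosts. This is a sketch under one stated assumption: that "service <name> status" returns exit code 0 when the service is running, which is the usual init script convention but is not confirmed in this article for these services.

```python
import subprocess

# Service names taken from the commands above; "kafka" applies to OI Connector 2.x only.
SERVICES = ["caperfcenter_oiconnector", "caperfcenter_oiagent", "kafka"]


def check_services(services=SERVICES, run=subprocess.run):
    """Run `service <name> status` for each service; map name -> True if exit code is 0."""
    results = {}
    for name in services:
        completed = run(["service", name, "status"], capture_output=True, text=True)
        # Assumption: a zero exit code means the service is up.
        results[name] = completed.returncode == 0
    return results
```

For example, check_services() returns a dictionary such as {"caperfcenter_oiconnector": True, ...}; any False entry is a service to investigate first.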

4) Check the OI Connector status in the NetOps console

Go to Performance Center >  Administration > System Status page
Locate the "OI Connector" section
Verify Status = Normal

 

Known issue:  If you are using DX OI 20.2.x On premise with OI connector 2.1.x, the OI Connector will report the status as failed. You can ignore the status.

 

5) Review the OIConnector logs : <OIConnector-HOME>/logs

a) There are 2 logs:

- OIConnector.log : main log file
- OIAgent*.log : NFA, ADA data collection activity


b) How to enable DEBUG logging:

OIConnector logging: <OIConnector-HOME>/conf/log4j.xml

Open ./conf/log4j.xml, change logging level from INFO to DEBUG as below:

...
<!-- ***** Root Logger definition ***** -->
    <root>
        <level value="DEBUG"/>
        <appender-ref ref="console"/>
        <appender-ref ref="complete" />
    </root>


OI Agent service logging: <OIConnector-HOME>/conf/agent-wrapper.conf

Uncomment the below line:

#wrapper.app.parameter.2=-Ssupport

 

You need to restart the OI Agent service:

service caperfcenter_oiagent restart

 

c) Examples of common errors or exceptions:


USE-CASE #1 : Problem with Data Aggregator impacting metrics gathering

ERROR [2020-09-12 22:58:41,943] [pool-2-thread-6] [OpenAPIInventoryQueryInfHelper] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.sources.da.inventory.OpenAPIInventoryQueryInfHelper] OpenAPI Inventory Query Failed with Status: 500
..

INFO  [2020-09-11 20:34:33,773] [pool-7-thread-5] [StatusHolder] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.status.StatusHolder] Updating OI_CONNECTOR_NAME status information, STATUS [UP -> UP] HEALTH [NORMAL -> FAILED] Unable to contact OpenAPI for data

Alternatively, the Performance Center > Administration > System Status page reports: "Unable to connect OpenAPI for data"

Recommendation:

- Restart Data Aggregator (DA), see:

Unable to pull custom metric from ODATA
https://knowledge.broadcom.com/external/article?articleId=190659


USE-CASE #2 : Problem with Jarvis(nginx) and/or apmservices-gateway endpoints

INFO  [2020-09-11 20:00:32,988] [pool-2-thread-10] [StatusHolder] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.status.StatusHolder] Updating OI_CONNECTOR_NAME status information, STATUS [UP -> UP] HEALTH [NORMAL -> FAILED] Unable to contact OI Platform to send data to
ERROR [2020-09-11 20:00:32,988] [pool-2-thread-10] [RemoteDataConnectionImpl] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.sources.RemoteDataConnectionImpl]Error posting documents to Jarvis index(itoa_groups_capm): 503_
ERROR [2020-09-11 20:00:32,989] [pool-2-thread-10] [GroupTaskImpl] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.task.group.GroupTaskImpl] Unable to push 27 groups to data sink.
ERROR [2020-09-11 20:00:33,607] [pool-2-thread-3] [TASGroupTask] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.task.group.TASGroupTask] Failed ingesting groups to TAS for CAPC tenant id : Coke_test Error: 503

Recommendation:

Verify that the nginx and apmservices-gateway endpoints are reachable and available, see STEP #3

USE-CASE #3 : CAPM user password expired, changed or not valid.

ERROR [2020-11-27 12:42:12,213] [WrapperSimpleAppMain] [OIIntegration] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.OIIntegration] No response from webservice - unable to configure data sources
WARN  [2020-11-27 12:42:12,230] [WrapperSimpleAppMain] [OIIntegration] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.OIIntegration] Unable to determine CA Performance Center version

Recommendation:

Update <OIConnector-HOME>/conf/config.xml with the new password, see:

DX OI - OIConnector not connecting when CAPC user and password changes
https://knowledge.broadcom.com/external/article/204144/dx-oi-oiconnector-not-connecting-when-c.html
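
To triage a large OIConnector.log quickly, the three use cases above can be matched by their message text. The sketch below is illustrative only: the message fragments are copied from the log examples above, while the function names and the mapping itself are assumptions, not an official error catalogue.

```python
UC1 = "USE-CASE #1: Data Aggregator / OpenAPI problem"
UC2 = "USE-CASE #2: Jarvis (nginx) or apmservices-gateway endpoint problem"
UC3 = "USE-CASE #3: CAPM user password expired, changed or not valid"

# Message fragments taken from the log excerpts above.
KNOWN_ERRORS = {
    "OpenAPI Inventory Query Failed": UC1,
    "Unable to contact OpenAPI for data": UC1,
    "Unable to contact OI Platform": UC2,
    "Error posting documents to Jarvis": UC2,
    "Failed ingesting groups to TAS": UC2,
    "No response from webservice": UC3,
    "Unable to determine CA Performance Center version": UC3,
}


def classify_line(line):
    """Return the matching use case for a log line, or None if it is not a known error."""
    for fragment, use_case in KNOWN_ERRORS.items():
        if fragment in line:
            return use_case
    return None


def summarize_log(lines):
    """Count how many log lines fall under each known use case."""
    counts = {}
    for line in lines:
        use_case = classify_line(line)
        if use_case:
            counts[use_case] = counts.get(use_case, 0) + 1
    return counts
```

Passing an open file handle for <OIConnector-HOME>/logs/OIConnector.log to summarize_log gives a per-use-case count that points to the recommendation to follow.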


d) Search for the following keywords to help you confirm that the OI Connector is working correctly:

"CLIENT_SUMMARY_NASS", "Successfully", "Started", "JARVIS_INGEST_RECORD_COUNT"

Here are some examples when using OI connector 2.1.x

Here are some examples when using OI connector 1.x

INFO  [2020-09-13 09:24:50,587] [pool-2-thread-4] [InventoryTaskImpl] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.task.inventory.InventoryTaskImpl] Successfully ingested inventory to TAS for CAPC tenant id : Coke_test
..

INFO  [2020-09-13 09:18:14,105] [pool-2-thread-2] [RemoteDataConnectionImpl] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.sources.RemoteDataConnectionImpl] JARVIS_INGEST_RECORD_COUNT : 27
INFO  [2020-09-13 09:18:15,190] [pool-2-thread-7] [TASGroupTask] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.task.group.TASGroupTask] Successfully ingested groups to TAS for CAPC tenant id : Coke_test

..

INFO  [2020-09-13 09:11:31,481] [pool-3-thread-1] [PersistentRegistrationCache] - Successfully loaded760 metric registrations from /opt/CA/OIConnector/conf/MetricRegistrationCache-F1B889C8-4BB8-4860-BB22-447D0EEA56B0.ser
INFO  [2020-09-13 09:11:31,492] [pool-3-thread-1] [NASSClient] - Started NASS Client.

..

INFO  [2020-09-13 09:20:35,748] [WrapperSimpleAppMain] [ServerConnector] - Started ServerConnector@76f3fc3b{HTTP/1.1,[http/1.1]}{0.0.0.0:8782}
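
The keyword search can be automated to get a quick health summary. This minimal sketch counts the four keywords listed above and adds up the values reported after JARVIS_INGEST_RECORD_COUNT. It assumes the "JARVIS_INGEST_RECORD_COUNT : <number>" layout seen in the 1.x example; other connector versions may log it differently.

```python
import re

# Keywords listed above as signs of a healthy connector.
KEYWORDS = ["CLIENT_SUMMARY_NASS", "Successfully", "Started", "JARVIS_INGEST_RECORD_COUNT"]
RECORD_COUNT = re.compile(r"JARVIS_INGEST_RECORD_COUNT\s*:\s*(\d+)")


def keyword_hits(lines):
    """Return (hits per keyword, total records reported as ingested into Jarvis)."""
    hits = {keyword: 0 for keyword in KEYWORDS}
    total_records = 0
    for line in lines:
        for keyword in KEYWORDS:
            if keyword in line:
                hits[keyword] += 1
        match = RECORD_COUNT.search(line)
        if match:
            total_records += int(match.group(1))
    return hits, total_records
```

A log with no hits for any keyword over a full polling cycle suggests the connector is not ingesting data and that the error examples in the previous point should be checked.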

 

STEP#2 : Check Anomaly Alarms, Metrics, Inventory and Topology data using DX OI UI

a) Metrics (NASS)

Go to Performance:

In SaaS

In 20.2.x on premise:




b) Inventory and Topology (TAS)

Go to DX OI > Services > Create a new Service

From Add Elements, select Network > Device Names. You should be able to see your NetOps devices.

 

STEP#3 : Check the Alarms, Metrics, Inventory and Topology data using Elastic and TAS/NASS REST APIs

** This section is valid for DX OI On Premise 20.x only. If you are using DX OI SaaS, contact Broadcom Support for assistance **


a) Alarms (ElasticSearch)

Verify that data (metrics, alerts, inventory) has been ingested in the respective product indices:

1) List all the CAPM (NetOps) product indices:

http://es.<servername>/_cat/indices/*capm*?v

For example:

http://es.munqa001493.bpc.broadcom.net/_cat/indices/*capm*?v

Check that the docs.count and store.size column values increase over time.
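
One way to confirm growth without comparing browser tabs by eye is to save the _cat/indices output twice, a few minutes apart, and compare the docs.count column. The sketch below assumes the default "?v" text layout with a header row containing "index" and "docs.count"; the function names are illustrative.

```python
def parse_cat_indices(text):
    """Parse the text output of _cat/indices?v into {index_name: docs_count}."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    index_col = header.index("index")
    count_col = header.index("docs.count")
    counts = {}
    for line in lines[1:]:
        cols = line.split()
        try:
            counts[cols[index_col]] = int(cols[count_col])
        except (IndexError, ValueError):
            # Rows with missing columns (for example closed indices) are skipped.
            continue
    return counts


def growing_indices(before, after):
    """Return the names of indices whose document count increased between snapshots."""
    return sorted(name for name, count in after.items() if count > before.get(name, 0))
```

Any *capm* index missing from the result of growing_indices after a full polling cycle is a candidate for the ingestion checks in STEP #1.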


2) Check the content of a specific index:

http://es.<servername>/<index-name>/_search?pretty&sort=@timestamp:desc&size=500

For example:

http://es.munqa001493.bpc.broadcom.net/ao_itoa_groups_capm_1_1/_search?pretty&sort=@timestamp:desc&size=500


NOTE: You can use https://www.epochconverter.com/ to convert @timestamp values (epoch time) to a human-readable date and vice versa.
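
The same conversion can be done offline. This is a small sketch that assumes the @timestamp values are epoch milliseconds; if your documents store seconds, drop the division and multiplication by 1000.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(epoch_ms):
    """Convert an epoch value in milliseconds to an ISO-8601 UTC string."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()


def utc_to_epoch_ms(iso_text):
    """Convert an ISO-8601 timestamp with a UTC offset back to epoch milliseconds."""
    return int(datetime.fromisoformat(iso_text).timestamp() * 1000)


print(epoch_ms_to_utc(1600000000000))  # 2020-09-13T12:26:40+00:00
```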


For more ElasticSearch queries and examples, see:

DX AIOps - ElasticSearch Queries
https://knowledge.broadcom.com/external/article/207215

 

b) Inventory and Topology (TAS)

Query for topology data using REST APIs:

Open Postman (you can download postman from https://www.postman.com/downloads/)

POST API endpoint to check TAS data for the NetOps (CAPC) inventory:

http://<APMServices Gateway Host>/tas/graph/query

For example:

http://apmservices-gateway.munqa001493.bpc.broadcom.net/tas/graph/query

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

  {
   "filter": {
       "op": "JOIN",
       "input": {
           "op": "AND",
           "input": [
               {
                   "op": "ATTRIBUTE",
                   "expressions": [
                       {
                           "name": "Product",
                           "values": [
                               "CAPC"
                           ]
                       }
                   ]
               }
           ]
       }
   },
   "universe": null,
   "version": null,
   "time": 0,
   "stitchingEnabled": true,
   "includeStatus": true
}

Expected Result: you should see the NetOps (CAPC) vertices that were added to TAS
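
If Postman is not available, the same TAS request can be sent from a script. The following is a sketch only: it reuses the URL path, headers and body shown above, and the helper names build_tas_request and query_tas are illustrative. The shape of the response is not described in this article, so the sketch simply returns the parsed JSON.

```python
import json
import urllib.request

# Same body as the Postman example above (JSON null/true become None/True).
TAS_QUERY = {
    "filter": {
        "op": "JOIN",
        "input": {
            "op": "AND",
            "input": [
                {"op": "ATTRIBUTE",
                 "expressions": [{"name": "Product", "values": ["CAPC"]}]}
            ],
        },
    },
    "universe": None,
    "version": None,
    "time": 0,
    "stitchingEnabled": True,
    "includeStatus": True,
}


def build_tas_request(gateway_host, tenant_token, scheme="http"):
    """Build the POST request for /tas/graph/query without sending it."""
    return urllib.request.Request(
        f"{scheme}://{gateway_host}/tas/graph/query",
        data=json.dumps(TAS_QUERY).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {tenant_token}",
        },
        method="POST",
    )


def query_tas(gateway_host, tenant_token):
    """Send the query and return the parsed JSON response."""
    request = build_tas_request(gateway_host, tenant_token)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```

An HTTP 401 or 403 error from query_tas usually points to the token, while a connection error or 503 points to the apmservices-gateway endpoint itself.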

c) Metrics (NASS)

Query for metrics data using REST APIs:

Open Postman (you can download postman from https://www.postman.com/downloads/)

POST API endpoint to check NASS metric metadata matching a pattern:

http://<APMServices Gateway Host>/metadata/queryMetric

For example:

http://apmservices-gateway.munqa001493.bpc.broadcom.net/metadata/queryMetric

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

{
   "size": 10000,
 "specifier": {
   "op": "SPEC",
   "sourceNameSpecifier": {
     "op": "REGEX",
     "pattern": "(.*)NetOps\\|CAPM(.*)|(.*)NetOps\\|ADA(.*)|(.*)NetOps\\|NFA(.*)"
   },
   "attributeNameSpecifier": {
     "op": "ALL"
   }
 }
}

Expected Result: you should see the NetOps metric metadata (CAPM, ADA, NFA) stored in NASS
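
To see which source names the pattern in the request body selects, it can be tried locally. This sketch assumes NASS applies the regular expression to the whole source name (full match); the sample source names are hypothetical and only illustrate the "NetOps|CAPM", "NetOps|ADA" and "NetOps|NFA" segments the pattern looks for.

```python
import re

# Same pattern as in the request body above. In JSON the backslash is doubled ("\\|");
# in a Python raw string a single backslash escapes the pipe character.
NETOPS_PATTERN = re.compile(
    r"(.*)NetOps\|CAPM(.*)|(.*)NetOps\|ADA(.*)|(.*)NetOps\|NFA(.*)"
)


def matches_netops(source_name):
    """Return True if the whole source name matches the NetOps pattern."""
    return NETOPS_PATTERN.fullmatch(source_name) is not None


# Hypothetical source names, for illustration only.
print(matches_netops("SuperDomain|NetOps|CAPM|router1"))  # True
print(matches_netops("SuperDomain|Other|agent1"))         # False
```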

STEP#4 : Verify Jarvis, Elastic, Zookeeper and Kafka:

** This section is valid for DX OI On Premise 20.x only. If you are using DX OI SaaS, contact Broadcom Support for assistance **

DX OI - Jarvis / Kafka Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/189119



B) WHAT FILES SHOULD I COLLECT FOR BROADCOM SUPPORT?

If you still need assistance, contact Broadcom Support (https://support.broadcom.com/) and provide the below information:

a) Screenshots illustrating the discrepancy between NetOps and DX OI, if the problem is related to differences in metrics and values

b) oi_connector logs

If possible, enable TRACE logging level for the OI Connector, restart the services, reproduce the issue and gather the files below:

<OIConnector-HOME>/logs/*
<OIConnector-HOME>/conf/config.xml

c) From the OI Connector server, the output of:

service caperfcenter_oiconnector status
service caperfcenter_oiagent status


If you are using DX OI On Premise (20.x), collect the additional information below:

d) cluster and pods status:

kubectl get pods -n<dxi-namespace>
kubectl describe nodes -n<dxi-namespace>

e) from ElasticSearch

Collect the result of the below queries:

http(s)://{es_endpoint}/_cat/indices/*capm*?v
http(s)://{es_endpoint}/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
http(s)://{es_endpoint}/_cluster/health?pretty&human

From all Elastic servers and the NFS server, collect the output of: df -h

f) If the problem is related to Jarvis, Kafka or Elastic, collect the respective logs and evidence, see:

DX OI - Jarvis / Kafka Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/189119



Additional Information

DX OI NetOps integration - SaaS

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/dx-operational-intelligence-saas/SaaS/integration/integrate-ca-products/add-ca-pm-ca-ada-and-ca-nfa.html

DX OI NetOps integration - On premise

https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/digital-operational-intelligence/20-2/integration/integrate-ca-products/add-ca-pm-ca-ada-and-ca-nfa.html

DX AIOPs (OI and APM) - Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815/dx-oi-troubleshooting-common-issues-and.html
