DX OI integration with NetOps PM and NFA - OI Connector 2.x Troubleshooting

Article ID: 210469

Products

DX Operational Intelligence

Issue/Introduction

The following is a list of techniques and suggestions to use when troubleshooting OI Connector issues.

Environment

DX NetOps OI Connector 2.1.x ONLY

NOTE: OIConnector 2.1.x is supported with DX NetOps Performance Management 20.2.4 or higher

Resolution

APM Gateway  Hostname

If OI 20.2.x:

- If OpenShift: oc get routes -n dxi | grep apmservices-gateway
- If Kubernetes: kubectl get ingress -n dxi | grep apmservices-gateway

If OI SaaS or 21.x:
Go to Settings > Connector Parameters > TAS Endpoint

APM Gateway  Security Token

If OI 20.2.x:

Option 1) Log in to the tenant using an Admin account, then go to DX Application Performance Management (APM) > Settings > Security > Generate New Token and select the "Tenant" or "Agent" token
Option 2) If APM is not installed, log in as MASTERADMIN, go to Tenants, locate your tenant, and select "Create a Tenant Token"

If OI SaaS or 21.x:
Go to Settings > Connector Parameters > Generate Ingestion Token

Tenant ID

If OI 20.2.x:

a) Obtain the "ElasticSearch endpoint"

- If Openshift:     oc get routes -n<dxi-namespace> | grep jarvis
- If Kubernetes:  kubectl get ingress -n<dxi-namespace> | grep jarvis

b) Open your browser, list all available tenants:
http://<elastic-endpoint>/ao_dxi_tenants_1_1/_search?size=200&pretty

If OI SaaS or 21.x:
Go to Settings> Connector Parameters > Cohort ID

 

CHECK#2 : Check services

a) Check that the OI Connector services are up and running

service caperfcenter_oiconnector status
service caperfcenter_oiagent status
service kafka status

b) Check the OI Connector status in the NetOps console

If OI SaaS or 21.x:

Go to Performance Center >  Administration > System Status page
Locate the "OI Connector" section
Verify Status = Normal

If OI 20.2.x:

The OI Connector will report its status as failed. You can ignore this status.

 

CHECK#3 : Check OI Connector logs 

a) Review OIConnector logs : <OIConnector-HOME>/logs

- OIConnector.log : main log file
- OIAgent*.log : NFA, ADA data collection activity

b) Enable DEBUG logging:

b.1) OIConnector logging: <OIConnector-HOME>/conf/log4j.xml

Open ./conf/log4j.xml and change the logging level from INFO to DEBUG as shown below:

...
<!-- ***** Root Logger definition ***** -->
    <root>
        <level value="DEBUG"/>
        <appender-ref ref="console"/>
        <appender-ref ref="complete" />
    </root>

b.2) OI Agent service logging: <OIConnector-HOME>/conf/agent-wrapper.conf

Uncomment the line below:

#wrapper.app.parameter.2=-Ssupport

 

You need to restart the OI Agent service:

service caperfcenter_oiagent restart

 

c) Example of common errors or exceptions :

USE-CASE #1 : Problem with apmservices-gateway endpoint

ERROR [2020-09-11 20:00:33,607] [pool-2-thread-3] [TASGroupTask] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.task.group.TASGroupTask] Failed ingesting groups to TAS for CAPC tenant id : Coke_test Error: 503

Recommendation:

Verify that the apmservices-gateway endpoint is correct and reachable, and that the security token is valid.

USE-CASE #2 : CAPM user password has expired, changed, or is no longer valid.

ERROR [2020-11-27 12:42:12,213] [WrapperSimpleAppMain] [OIIntegration] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.OIIntegration] No response from webservice - unable to configure data sources
WARN  [2020-11-27 12:42:12,230] [WrapperSimpleAppMain] [OIIntegration] - [EVENT UNSPECIFIED Anonymous:[email protected] -> /com.ca.im.oinet.connector.OIIntegration] Unable to determine CA Performance Center version

Recommendation:

Update the <OIConnector-HOME>/conf/config.xml with the new encoded password, see: https://knowledge.broadcom.com/external/article/204144/dx-oi-oiconnector-not-connecting-when-c.html


d) Search for common keywords: "Successfully", "Started", "CLIENT_SUMMARY_NASS"

NOTE: CLIENT_SUMMARY_NASS entries are emitted every five minutes.

Below are some examples:

...
[INFO ] 2021-10-22 08:35:40.296 [Thread-9] NASSClient - CEFB056A-3002-4B29-A9B7-A5157EB43244: Started NASS Client.
...

INFO  [2021-10-20 09:04:42,541] [Thread-9] [PersistentRegistrationCache] - Successfully loaded 47762 metric registrations from /opt/CA/OIConnector/conf/MetricRegistrationCache-CEFB056A-3002-4B29-A9B7-A5157EB43244.ser

[INFO ] 2021-10-22 09:07:46.321 [pool-3-thread-38] RemoteDataConnectionImpl - [EVENT UNSPECIFIED Anonymous:@unknown -> /NetOps OI Connector/com.ca.im.oinet.connector.sources.RemoteDataConnectionImpl] JARVIS_INGEST_RECORD_COUNT : 471

..
[INFO ] 2021-10-22 09:07:49.855 [pool-3-thread-39] TASGroupTask - [EVENT SUCCESS Anonymous:@unknown -> /NetOps OI Connector/com.ca.im.oinet.connector.task.group.TASGroupTask] Successfully ingested groups to TAS for CAPC tenant id: _default_

..
[INFO ] 2021-10-22 09:10:40.304 [NASSClientStats] NASSClient - CEFB056A-3002-4B29-A9B7-A5157EB43244:  CLIENT_SUMMARY_NASS_INGEST_SUCCESS_COUNT: 45449
[INFO ] 2021-10-22 09:10:40.304 [NASSClientStats] NASSClient - CEFB056A-3002-4B29-A9B7-A5157EB43244:  CLIENT_SUMMARY_NASS_INGEST_FAILED_COUNT: 0
[INFO ] 2021-10-22 09:10:40.304 [NASSClientStats] NASSClient - CEFB056A-3002-4B29-A9B7-A5157EB43244:  CLIENT_SUMMARY_NASS_INGEST_RETRIED_COUNT: 0
[INFO ] 2021-10-22 09:10:40.305 [NASSClientStats] NASSClient - CEFB056A-3002-4B29-A9B7-A5157EB43244:  CLIENT_SUMMARY_NASS_REGISTRATION_SUCCESS_COUNT: 662
[INFO ] 2021-10-22 09:10:40.305 [NASSClientStats] NASSClient - CEFB056A-3002-4B29-A9B7-A5157EB43244:  CLIENT_SUMMARY_NASS_REGISTRATION_FAILED_COUNT: 0
...


[INFO ] 2021-10-22 09:11:20.696 [pool-3-thread-4] InventoryTaskImpl - [EVENT SUCCESS Anonymous:@unknown -> /NetOps OI Connector/com.ca.im.oinet.connector.task.inventory.InventoryTaskImpl] Successfully ingested inventory (268 vertices) in 0 batches TAS for CAPC tenant id : _default_
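
The keyword search above can also be scripted. The following is a minimal Python sketch, not part of the product, that extracts the CLIENT_SUMMARY_NASS counters from OIConnector log lines and reports any non-zero FAILED counter. The function names and the regular expression are illustrative assumptions based only on the sample lines shown above.

```python
import re

# Matches the counter name and value at the end of a NASSClient summary line,
# e.g. "...:  CLIENT_SUMMARY_NASS_INGEST_FAILED_COUNT: 0"
# (pattern inferred from the sample log lines in this article)
SUMMARY_RE = re.compile(r"(CLIENT_SUMMARY_NASS_\w+):\s*(\d+)\s*$")


def parse_nass_summary(lines):
    """Return {counter_name: value}, keeping the last value seen per counter."""
    counters = {}
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def failed_counters(counters):
    """Return only the FAILED counters whose value is greater than zero."""
    return {name: value for name, value in counters.items()
            if "FAILED" in name and value > 0}


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "[INFO ] 2021-10-22 09:10:40.304 [NASSClientStats] NASSClient - CEFB056A-3002-4B29-A9B7-A5157EB43244:  CLIENT_SUMMARY_NASS_INGEST_SUCCESS_COUNT: 45449",
    "[INFO ] 2021-10-22 09:10:40.304 [NASSClientStats] NASSClient - CEFB056A-3002-4B29-A9B7-A5157EB43244:  CLIENT_SUMMARY_NASS_INGEST_FAILED_COUNT: 0",
    "[INFO ] 2021-10-22 09:10:40.305 [NASSClientStats] NASSClient - CEFB056A-3002-4B29-A9B7-A5157EB43244:  CLIENT_SUMMARY_NASS_REGISTRATION_FAILED_COUNT: 0",
]

summary = parse_nass_summary(SAMPLE)
print(failed_counters(summary) or "no failed counters")
```

To run it against a real log, pass the lines of <OIConnector-HOME>/logs/OIConnector.log to parse_nass_summary instead of SAMPLE.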

 

CHECK#4 : Check Kafka (missing NetOps PM metrics)

1) A quick way to confirm that metrics are being ingested into DX OI is to check that the MetricRegistrationCache-<Tenant-ID>.ser file exists in the conf folder

Check for the file creation in the OIConnector log:
..
INFO  [2021-10-20 09:04:42,541] [Thread-9] [PersistentRegistrationCache] - Successfully loaded 47762 metric registrations from /opt/CA/OIConnector/conf/MetricRegistrationCache-CEFB056A-3002-4B29-A9B7.ser


2) Use the below steps to debug a metric ingestion problem from Data Aggregator to DX OI :

2.1) On the Data Aggregator (DA), check that the settings in the $KARAF_HOME/etc/kafkaexport.producer.cfg file are correct:

feature.enabled=on
producer.bootstrap.servers=<kafkabroker:port>

For more information refer to the documentation : https://techdocs.broadcom.com/us/en/ca-enterprise-software/it-operations-management/performance-management/21-2/integrating/integrate-with-dx-operational-intelligence/install-the-oi-connector.html#concept.dita_577458ae-0372-4fe5-addb-51330456748d_Install
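
As a rough illustration of the check in step 2.1, the Python sketch below parses the content of kafkaexport.producer.cfg and flags the two settings named above when they look wrong. It assumes the file uses plain key=value lines; the parser is deliberately simplified, and the function names and the example host name are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, ignoring blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_kafka_export(props):
    """Return a list of problems; an empty list means both settings look sane."""
    problems = []
    if props.get("feature.enabled") != "on":
        problems.append("feature.enabled is not 'on'")
    servers = props.get("producer.bootstrap.servers", "")
    for entry in servers.split(","):
        entry = entry.strip()
        host, _, port = entry.rpartition(":")
        # Each entry is expected to look like host:port with a numeric port.
        if not host or not port.isdigit():
            problems.append(f"bad bootstrap server entry: {entry!r}")
    return problems


# Hypothetical file content; kafka01.example.com is a placeholder host name.
EXAMPLE_CFG = """\
feature.enabled=on
producer.bootstrap.servers=kafka01.example.com:9092
"""

print(check_kafka_export(parse_properties(EXAMPLE_CFG)))
```

A placeholder that was never replaced, such as <kafkabroker:port>, is reported as a bad entry because its port part is not numeric.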


2.2) Check for ‘ProducerStatisticsMonitor’ in DA’s $KARAF_HOME/data/log/karaf.log file. These are emitted every five minutes by default.

If ProducerStatisticsMonitor shows that messages are being dropped, look in DA’s $KARAF_HOME/data/log/KafkaClient.log file for errors/hints to the problem.
If ProducerStatisticsMonitor logs are not seen, check whether export configuration has been set up and applied to devices:

a) In a browser, open http(s)://<DAHOST>:<DAPORT>/debug
b) Click on Available Spring Containers (by bundle)
c) Provide PC admin credentials if prompted
d) Click on com.ca.im.data-manager.core.aggregator.loader.integrator bundle link
e) Click on exportProfileCache link



f) Verify that there is an ExportProfileConfig defined and it has the expected exportedMetricFamilyQNames.

g) Verify that the exportedDeviceCout (sic) is non-zero.

Here is an example illustrating a problem during installation, where the export configuration was not set up correctly:

h) If there is no ExportProfileConfig or there are no exported devices associated

- Check DataAggregator(DA)’s karaf.log for possible data corruption, if possible restart DA
- Check OIConnector logs for failures in creating the config or associating it with collections

 

2.3) Verify that the messages are truly getting to Kafka topic. On the Kafka broker (based on default standalone kafka): 

cd <oi-connector-kafka>/kafkadisk/bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metric-export
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metric-export --from-beginning

If no data is flowing, then check kafka/zookeeper logs for potential problems.

If data is flowing, check the OIConnector.log file for entries containing "CLIENT_SUMMARY", which are emitted every five minutes.
If any show failures, enable DEBUG logging for more details.

 

CHECK#5 : Check Alarms, Metrics and Topology data from DX OI UI


a) Metrics (NASS)

Go to Performance:

In SaaS, 21.x 

In 20.2.x on premise:




b) Inventory and Topology (TAS)

Go to DX OI > Services > Create a new Service

From Add Elements, select Network > Device Names. You should be able to see your NetOps devices.

 

CHECK#6 : Check the Alarms, Metrics, Topology data using Elastic and TAS/NASS REST APIs

** This section is valid for DX On Premise 20.x, 21.x versions only, if you are using DX OI SaaS, contact Broadcom Support for assistance **


a) Alarms(ElasticSearch)


For details how to query elasticsearch refer to : https://knowledge.broadcom.com/external/article/207215


1) List all the NetOps (CAPM) product indices:

http://es.<servername>/_cat/indices/*capm*?v

For example:

http://es.munqa001493.bpc.broadcom.net/_cat/indices/*capm*?v

Check that the docs.count and store.size column values increase over time.
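
One way to make this comparison less manual is to save the _cat/indices output twice, a few minutes apart, and compare the docs.count values. The Python sketch below does this for two text snapshots. It assumes the default column headers returned with ?v; the snapshot content (uuid, counts, sizes) is made up for illustration, and the function names are hypothetical.

```python
def parse_cat_indices(text):
    """Map index name -> docs.count from the text output of _cat/indices?v."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    name_col = header.index("index")
    count_col = header.index("docs.count")
    # Rows with missing columns (for example closed indices) are skipped.
    return {row[name_col]: int(row[count_col])
            for row in data if len(row) == len(header)}


def stalled_indices(before, after):
    """Return indices present in both snapshots whose docs.count did not grow."""
    return sorted(name for name in before
                  if name in after and after[name] <= before[name])


# Illustrative snapshots only: the uuid, counts, and sizes are made up.
BEFORE = """\
health status index                   uuid     pri rep docs.count docs.deleted store.size pri.store.size
green  open   ao_itoa_groups_capm_1_1 examplid   1   1        120            0      1.2mb          620kb
"""
AFTER = BEFORE.replace(" 120 ", " 150 ")

print(stalled_indices(parse_cat_indices(BEFORE), parse_cat_indices(AFTER)))
```

An index that appears in the returned list received no new documents between the two snapshots and is worth a closer look.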


2) Check the content of a specific index:

http://es.<servername>/<index-name>/_search?pretty&sort=@timestamp:desc&size=500

For example:

http://es.munqa001493.bpc.broadcom.net/ao_itoa_groups_capm_1_1/_search?pretty&sort=@timestamp:desc&size=500


You can use https://www.epochconverter.com/ to convert values from the @timestamp field to a human-readable format.
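
The same conversion can be done locally. The Python sketch below assumes the @timestamp value is an epoch time in milliseconds (if your values are in seconds, multiply by 1000 first); the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_utc(value):
    """Convert an epoch timestamp in milliseconds to an ISO 8601 UTC string."""
    # timedelta arithmetic avoids floating-point rounding of the milliseconds.
    return (EPOCH + timedelta(milliseconds=int(value))).isoformat()


print(epoch_millis_to_utc(1600000000000))  # 2020-09-13T12:26:40+00:00
```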

 

 

b) Inventory and Topology (TAS)

Option 1: Use DX Dashboard > AIOps Inventory source, see:  https://knowledge.broadcom.com/external/article/226599

 

Option 2: Use REST APIs:

Open Postman (you can download postman from https://www.postman.com/downloads/)

POST API endpoint to check TAS data for the NetOps (CAPC) inventory:

http://<APMServices Gateway Host>/tas/graph/query

For example:

http://apmservices-gateway.munqa001493.bpc.broadcom.net/tas/graph/query

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

{
   "filter": {
       "op": "JOIN",
       "input": {
           "op": "AND",
           "input": [
               {
                   "op": "ATTRIBUTE",
                   "expressions": [
                       {
                           "name": "Product",
                           "values": [
                               "CAPC"
                           ]
                       }
                   ]
               }
           ]
       }
   },
   "universe": null,
   "version": null,
   "time": 0,
   "stitchingEnabled": true,
   "includeStatus": true
}

Expected Result: you should see all new vertices added to TAS
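
If Postman is not available, the same request can be built with the Python standard library. The sketch below only constructs the request (URL, headers, and body as listed above) and does not assume anything about the response format. The function name and the example host and token values are hypothetical.

```python
import json
import urllib.request


def build_tas_query_request(gateway_host, tenant_token, product="CAPC"):
    """Build (but do not send) the POST request for /tas/graph/query."""
    body = {
        "filter": {
            "op": "JOIN",
            "input": {
                "op": "AND",
                "input": [
                    {
                        "op": "ATTRIBUTE",
                        "expressions": [{"name": "Product", "values": [product]}],
                    }
                ],
            },
        },
        "universe": None,
        "version": None,
        "time": 0,
        "stitchingEnabled": True,
        "includeStatus": True,
    }
    return urllib.request.Request(
        url=f"http://{gateway_host}/tas/graph/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {tenant_token}",
        },
        method="POST",
    )


# Placeholder values; replace with your APM Services Gateway host and tenant token.
request = build_tas_query_request("apmservices-gateway.example.com", "TENANT_TOKEN")
print(request.get_method(), request.full_url)

# To actually send it (requires network access to the gateway):
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.read().decode("utf-8"))
```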

c) Metrics(NASS)

Option 1: Use DX Dashboard > AIOps Metadata source

 

Option 2: Use REST APIs:

Open Postman (you can download postman from https://www.postman.com/downloads/)

POST API End Point to check NASS Metric Metadata matching a pattern

http://<APM Service Gateway Host>/metadata/queryMetric

For example:

http://apmservices-gateway.munqa001493.bpc.broadcom.net/metadata/queryMetric

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

{
  "size": 10000,
  "specifier": {
    "op": "SPEC",
    "sourceNameSpecifier": {
      "op": "REGEX",
      "pattern": "(.*)NetOps\\|CAPM(.*)|(.*)NetOps\\|ADA(.*)|(.*)NetOps\\|NFA(.*)"
    },
    "attributeNameSpecifier": {
      "op": "ALL"
    }
  }
}

Expected Result: you should see the metric metadata for the NetOps sources matched by the pattern
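
The doubled backslashes in the pattern are JSON escaping: once the body is decoded, the regular expression contains a single backslash before each pipe, so "NetOps|CAPM" is matched literally. The Python sketch below shows the decoded pattern and tries it against a few source names. The source names are invented for illustration, and it is an assumption that the service evaluates the pattern the same way Python's re module does.

```python
import json
import re

# The "pattern" member exactly as it appears in the JSON body above.
RAW_JSON = r'{"pattern": "(.*)NetOps\\|CAPM(.*)|(.*)NetOps\\|ADA(.*)|(.*)NetOps\\|NFA(.*)"}'

# After JSON decoding, each "\\|" becomes "\|", a literal pipe in the regex.
PATTERN = json.loads(RAW_JSON)["pattern"]


def matches_netops_source(source_name):
    """Return True if the whole source name matches the decoded pattern."""
    return re.fullmatch(PATTERN, source_name) is not None


# Hypothetical source names, used only to exercise the pattern.
for name in ("example|NetOps|CAPM|device-1",
             "example|NetOps|NFA|interface-2",
             "example|NetOps|UIM|robot-3"):
    print(name, matches_netops_source(name))
```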

CHECK#7 : Verify Jarvis, Elastic, Zookeeper and Kafka

** This section is valid for DX On Premise 20.x, 21.x versions only, if you are using DX OI SaaS, contact Broadcom Support for assistance **

AIOps - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting

 


C) WHAT FILES SHOULD I COLLECT FOR BROADCOM SUPPORT?

If you still need assistance, contact Broadcom Support (https://support.broadcom.com/) and provide the below information:

a) DEBUG oi_connector logs

<OIConnector>/logs/*
<OIConnector>/conf/config.xml

b) services status:

service caperfcenter_oiconnector status
service caperfcenter_oiagent status

c) from data aggregator

$KARAF_HOME/etc/kafkaexport.producer.cfg
$KARAF_HOME/data/log/karaf.log file
$KARAF_HOME/data/log/KafkaClient.log

screenshot of exportProfileCache content

d) from kafka 

Result of:

cd <oi-connector-kafka>/kafkadisk/bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metric-export

If you are using DX OI On Premise (20.x, 21.x):

a) cluster and pods status:

kubectl get pods -n<namespace>
kubectl describe nodes -n<namespace>
kubectl get events -n<namespace>

b) from ElasticSearch

- collect result of below queries:

http(s)://{es_endpoint}/_cat/indices/*capm*?v
http(s)://{es_endpoint}/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
http(s)://{es_endpoint}/_cluster/health?pretty&human

For more details refer to : https://knowledge.broadcom.com/external/article/207215

- result of : df -h

c) from NFS server

- result of : df -h

 

Additional Information

https://knowledge.broadcom.com/external/article/190815/aiops-troubleshooting-common-issues-and.html

Youtube DX Operational Intelligence Integrations with NetOps

 
