DX OI - Integration with NetOps Troubleshooting


Article ID: 210469

Products

DX Operational Intelligence DX OI SaaS

Issue/Introduction

The following is a list of techniques and suggestions to use when troubleshooting OI Connector issues.

Environment

DX NetOps OI Connector 

DX NetOps Performance Management 20.2.4 or higher

Resolution

APM Gateway Hostname

Go to Settings > Connector Parameters > TAS Endpoint

APM Gateway Security Token

Go to Settings > Connector Parameters > Generate Ingestion Token

Tenant ID

Go to Settings > Connector Parameters > Cohort ID

 

CHECK#2 : Check services

a) Check that the OI Connector services are up and running

service caperfcenter_oiconnector status
service caperfcenter_oiagent status
service kafka status
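
The three status checks above can be run in one pass. The following is a minimal Python sketch, assuming the `service` command is available on the connector host and that a non-zero exit code means the service is not running (the usual convention, not something this article confirms). The function names are illustrative.

```python
import subprocess

# Service names taken from the commands above.
SERVICES = ["caperfcenter_oiconnector", "caperfcenter_oiagent", "kafka"]


def service_exit_code(name):
    """Run `service <name> status` and return its exit code."""
    try:
        result = subprocess.run(
            ["service", name, "status"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode
    except FileNotFoundError:
        # The `service` command itself is missing on this host.
        return 127


def not_running(exit_codes):
    """Return the services whose status command did not exit with 0."""
    return [name for name, code in exit_codes.items() if code != 0]


def check_all():
    """Check every service and return the ones that need attention."""
    return not_running({name: service_exit_code(name) for name in SERVICES})
```

Calling `check_all()` on the connector host returns an empty list when all three services report success.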

b) Check the OI Connector status in the NetOps console

Go to Performance Center > Administration > System Status page
Locate the "OI Connector" section
Verify Status = Normal

 

CHECK#3 : Check OI Connector logs 

a) Review OIConnector logs : <OIConnector-HOME>/logs

- OIConnector.log : main log file
- OIAgent*.log : NFA, ADA data collection activity

b) Enable DEBUG logging:

a) OIConnector logging: <OIConnector-HOME>/conf/log4j.xml

Open ./conf/log4j.xml, change logging level from INFO to DEBUG as below:

...
<!-- ***** Root Logger definition ***** -->
    <root>
        <level value="DEBUG"/>
        <appender-ref ref="console"/>
        <appender-ref ref="complete" />
    </root>

b) OI Agent service logging: <OIConnector-HOME>/conf/agent-wrapper.conf

Uncomment the below line:

#wrapper.app.parameter.2=-Ssupport

 

You need to restart the OI Agent service:

service caperfcenter_oiagent restart
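
The two edits above can also be scripted. The following is a minimal Python sketch that works on the file contents as text; it assumes log4j.xml contains a single <root> element with a <level value="..."/> child, as in the excerpt above. The function names are illustrative, and the files should be backed up before the result is written back.

```python
import re


def set_root_level(log4j_xml, level):
    """Return the log4j.xml text with the root logger level set to `level`."""
    pattern = re.compile(r'(<root>.*?<level\s+value=")[A-Za-z]+(")', re.DOTALL)
    # A function replacement avoids ambiguity between group numbers and the level text.
    return pattern.sub(lambda m: m.group(1) + level + m.group(2), log4j_xml, count=1)


def uncomment_support_flag(conf_text):
    """Uncomment the `wrapper.app.parameter.2=-Ssupport` line if it is commented out."""
    return re.sub(
        r"(?m)^#[ \t]*(wrapper\.app\.parameter\.2=-Ssupport)[ \t]*$",
        r"\1",
        conf_text,
    )
```

Both functions return the modified text and leave the rest of the file untouched. Setting the level back to "INFO" after troubleshooting uses the same call.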

 

c) Examples of common errors or exceptions:

USE-CASE #1 : Problem with apmservices-gateway endpoint

ERROR [pool-2-thread-3] [TASGroupTask] - [EVENT UNSPECIFIED Anonymous:null@unknown -> /com.ca.im.oinet.connector.task.group.TASGroupTask] Failed ingesting groups to TAS for CAPC tenant id : <example> Error: 503

Recommendation:

Verify that the apm-gateway endpoint is correct and reachable, and that the token is correct.

USE-CASE #2 : CAPM user password expired, changed or is no longer valid.

ERROR [] [WrapperSimpleAppMain] [OIIntegration] - [EVENT UNSPECIFIED Anonymous:<user>@example -> /com.ca.im.oinet.connector.OIIntegration] No response from webservice - unable to configure data sources
WARN  [WrapperSimpleAppMain] [OIIntegration] - [EVENT UNSPECIFIED Anonymous:null@unknown -> /com.ca.im.oinet.connector.OIIntegration] Unable to determine CA Performance Center version

Recommendation:

Update the <OIConnector-HOME>/conf/config.xml with the new encoded password, see: https://knowledge.broadcom.com/external/article/204144/dx-oi-oiconnector-not-connecting-when-c.html


d) Search for common keywords: "Successfully", "Started", "CLIENT_SUMMARY_NASS"

NOTE: CLIENT_SUMMARY_NASS entries are emitted every five minutes.

Below are some examples:

...
[INFO]  [Thread-9] NASSClient - <number>: Started NASS Client.
...

[INFO]  [Thread-9] [PersistentRegistrationCache] - Successfully loaded 47762 metric registrations from /opt/CA/OIConnector/conf/MetricRegistrationCache-<number>.ser

[INFO ]  [pool-3-thread-38] RemoteDataConnectionImpl - [EVENT UNSPECIFIED <user>:@<host> -> /NetOps OI Connector/com.ca.im.oinet.connector.sources.RemoteDataConnectionImpl] JARVIS_INGEST_RECORD_COUNT : 471

..
[INFO ]  [pool-3-thread-39] TASGroupTask - [EVENT SUCCESS <user>:@<host>n -> /NetOps OI Connector/com.ca.im.oinet.connector.task.group.TASGroupTask] Successfully ingested groups to TAS for CAPC tenant id: _default_

..
[INFO ]  [NASSClientStats] NASSClient - <number>:  CLIENT_SUMMARY_NASS_INGEST_SUCCESS_COUNT: 45449
[INFO ] [NASSClientStats] NASSClient -  <number>:  CLIENT_SUMMARY_NASS_INGEST_FAILED_COUNT: 0
[INFO ]  [NASSClientStats] NASSClient - <number>:  CLIENT_SUMMARY_NASS_INGEST_RETRIED_COUNT: 0
[INFO ]  [NASSClientStats] NASSClient - <number>:  CLIENT_SUMMARY_NASS_REGISTRATION_SUCCESS_COUNT: 662
[INFO ]  [NASSClientStats] NASSClient - <number>:  CLIENT_SUMMARY_NASS_REGISTRATION_FAILED_COUNT: 0
...


[INFO ]  [pool-3-thread-4] InventoryTaskImpl - [EVENT SUCCESS <user>:@<host> -> /NetOps OI Connector/com.ca.im.oinet.connector.task.inventory.InventoryTaskImpl] Successfully ingested inventory (268 vertices) in 0 batches TAS for CAPC tenant id : _default_
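
To spot failures quickly, the CLIENT_SUMMARY_NASS counters can be pulled out of OIConnector.log. The following is a minimal Python sketch, assuming the log lines keep the `NAME: value` shape shown in the examples above. The function names are illustrative.

```python
import re

# Matches counters such as "CLIENT_SUMMARY_NASS_INGEST_FAILED_COUNT: 0".
COUNTER = re.compile(r"(CLIENT_SUMMARY_NASS_[A-Z_]+)\s*:\s*(\d+)")


def latest_nass_counters(lines):
    """Return the most recent value seen for each CLIENT_SUMMARY_NASS counter."""
    counters = {}
    for line in lines:
        match = COUNTER.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def failed_counters(counters):
    """Return only the *_FAILED_* counters that are greater than zero."""
    return {name: value for name, value in counters.items()
            if "_FAILED_" in name and value > 0}
```

Passing an open log file to `latest_nass_counters` works because a file object iterates line by line. An empty result from `failed_counters` means the latest summary reported no failures.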

 

CHECK#4 : Check Kafka (missing NetOps PM metrics)

1) A quick way to confirm that metrics are being ingested into DX OI is to check that the MetricRegistrationCache-<Tenant-ID>.ser file exists in the conf folder

Check for the file creation in the OIConnector log:
..
INFO   [Thread-9] [PersistentRegistrationCache] - Successfully loaded 47762 metric registrations from /opt/CA/OIConnector/conf/MetricRegistrationCache-<number>.ser
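
The file check can be scripted as follows. This is a minimal Python sketch; the conf folder location is passed in by the caller (the log excerpt above shows /opt/CA/OIConnector/conf, which may differ in your installation). The function name is illustrative.

```python
import glob
import os


def find_registration_caches(conf_dir):
    """Return the MetricRegistrationCache-*.ser files found in the conf folder."""
    return sorted(glob.glob(os.path.join(conf_dir, "MetricRegistrationCache-*.ser")))
```

An empty list suggests that no metric registrations have been cached yet for any tenant.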


2) Use the steps below to debug a metric ingestion problem from Data Aggregator to DX OI:

2.1) On the Data Aggregator (DA), check that the settings in the $KARAF_HOME/etc/kafkaexport.producer.cfg file are correct:

feature.enabled=on
producer.bootstrap.servers=<kafkabroker:port>
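
The two settings above can be validated with a small script. The following is a minimal Python sketch, assuming the file uses plain `key=value` lines. It only checks the two keys named in this article; the function names are illustrative.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_kafka_export(props):
    """Return a list of problems found in the Kafka export settings."""
    problems = []
    if props.get("feature.enabled") != "on":
        problems.append("feature.enabled is not 'on'")
    servers = props.get("producer.bootstrap.servers", "")
    # Each broker entry is expected to look like host:port.
    if not servers or any(":" not in s for s in servers.split(",")):
        problems.append("producer.bootstrap.servers should be host:port[,host:port...]")
    return problems
```

An empty list means both settings look plausible. It does not prove that the broker is reachable from the DA.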


2.2) Check for ‘ProducerStatisticsMonitor’ in DA’s $KARAF_HOME/data/log/karaf.log file. These are emitted every five minutes by default.

If ProducerStatisticsMonitor shows that messages are being dropped, look in DA’s $KARAF_HOME/data/log/KafkaClient.log file for errors/hints to the problem.
If ProducerStatisticsMonitor logs are not seen, check whether export configuration has been set up and applied to devices:

a) In a browser, open http(s)://<DAHOST>:<DAPORT>/debug
b) Click on Available Spring Containers (by bundle)
c) Provide NetOps Portal admin credentials if prompted
d) Click on com.ca.im.data-manager.core.aggregator.loader.integrator bundle link
e) Click on exportProfileCache link



f) Verify that there is an ExportProfileConfig defined and it has the expected exportedMetricFamilyQNames.

g) Verify that the exportedDeviceCout (sic) is non-zero.

Here is an example illustrating a problem where the export configuration was not set up correctly during installation:

h) If there is no ExportProfileConfig or there are no exported devices associated:

- Check the Data Aggregator (DA) karaf.log for possible data corruption and, if possible, restart the DA
- Check OIConnector logs for failures in creating the config or associating it with collections

 

2.3) Verify that the messages are actually reaching the Kafka topic. On the Kafka broker (paths based on the default standalone Kafka):

cd <oi-connector-kafka>/kafkadisk/bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metric-export
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metric-export --from-beginning

If no data is flowing, then check kafka/zookeeper logs for potential problems.

If data is flowing, then check the OIConnector.log file for entries containing "CLIENT_SUMMARY", which are emitted every five minutes.
If any show failures, enable DEBUG logging for more details.

 

CHECK#5 : Check Alarms, Metrics and Topology data from DX OI UI


a) Metrics (NASS)

Go to Performance:

 

b) Inventory and Topology (TAS)

Go to DX OI > Services > Create a new Service

From Add Elements, select Network > Device Names. You should be able to see your NetOps devices, as in the example below:

 

CHECK#6 : Check the Alarms, Metrics, Topology data using Elastic and TAS/NASS REST APIs

** This section is valid for DX On Premise only. If you are using DX OI SaaS, contact Broadcom Support for assistance **


a) Alarms(ElasticSearch)


For details on how to query Elasticsearch, refer to: https://knowledge.broadcom.com/external/article/207215


1) List all the CAPM (NetOps) indices:

http://<servername>/_cat/indices/*capm*?v

For example:

http://<host>/_cat/indices/*capm*?v

Check that the docs.count and store.size column values increase over time.
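
One way to confirm growth is to save the output of the query twice, a few minutes apart, and compare the document counts. The following is a minimal Python sketch, assuming the output was requested with `?v` so that the first line is a header containing `index` and `docs.count` columns. The function names are illustrative.

```python
def parse_cat_indices(text):
    """Parse /_cat/indices/...?v output into {index name: docs.count}."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    index_col = header.index("index")
    docs_col = header.index("docs.count")
    counts = {}
    for line in lines[1:]:
        cols = line.split()
        # Skip rows with missing columns (for example, closed indices).
        if len(cols) != len(header):
            continue
        counts[cols[index_col]] = int(cols[docs_col])
    return counts


def not_growing(before, after):
    """Return indices whose document count did not increase between snapshots."""
    return sorted(name for name, count in after.items()
                  if count <= before.get(name, 0))
```

Indices returned by `not_growing` are candidates for a closer look. A static count is not necessarily an error if no new data of that type was produced in the interval.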


2) Check the content of a specific index:

http://<servername>/<index-name>/_search?pretty&sort=@timestamp:desc&size=500

For example:

http://<host>/ao_itoa_groups_capm_1_1/_search?pretty&sort=@timestamp:desc&size=500


You can use https://www.epochconverter.com/ to convert values from the @timestamp field to a human-readable format.
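
The same conversion can be done offline. The following is a minimal Python sketch, assuming the @timestamp value is an epoch time in milliseconds (an assumption; if your values are in seconds, drop the division by 1000). The function name is illustrative.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(value):
    """Convert an epoch timestamp in milliseconds to an ISO 8601 UTC string."""
    return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc).isoformat()
```

For example, `epoch_ms_to_utc(1609459200000)` returns `2021-01-01T00:00:00+00:00`.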

 

b) Inventory and Topology (TAS)

Option 1: Use DX Dashboard > AIOps Inventory source, see:  https://knowledge.broadcom.com/external/article/226599

 

Option 2: Use REST APIs:

Open Postman (you can download Postman from https://www.postman.com/downloads/)

POST API endpoint to check TAS data for the NetOps (CAPC) inventory:

http://<APMServices Gateway Host>/tas/graph/query

For example:

http://apmservices-gateway.<host>/tas/graph/query

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

{
   "filter": {
       "op": "JOIN",
       "input": {
           "op": "AND",
           "input": [
               {
                   "op": "ATTRIBUTE",
                   "expressions": [
                       {
                           "name": "Product",
                           "values": [
                               "CAPC"
                           ]
                       }
                   ]
               }
           ]
       }
   },
   "universe": null,
   "version": null,
   "time": 0,
   "stitchingEnabled": true,
   "includeStatus": true
}

Expected Result: you should see all new vertices added to TAS
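
If Postman is not available, the same request can be built with the Python standard library. The following is a minimal sketch that reuses the endpoint path, headers and body shown above; the gateway URL and token are supplied by the caller, and the function name is illustrative.

```python
import json
import urllib.request

# Same body as the Postman example above (null -> None, true -> True).
TAS_QUERY = {
    "filter": {
        "op": "JOIN",
        "input": {
            "op": "AND",
            "input": [
                {
                    "op": "ATTRIBUTE",
                    "expressions": [{"name": "Product", "values": ["CAPC"]}],
                }
            ],
        },
    },
    "universe": None,
    "version": None,
    "time": 0,
    "stitchingEnabled": True,
    "includeStatus": True,
}


def build_tas_request(gateway_url, token):
    """Build the POST request for <gateway>/tas/graph/query without sending it."""
    return urllib.request.Request(
        gateway_url.rstrip("/") + "/tas/graph/query",
        data=json.dumps(TAS_QUERY).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(request, timeout=30)` and read the response body. The response format is not described in this article, so the sketch does not attempt to parse it.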

c) Metrics(NASS)

Option 1: Use DX Dashboard > AIOps Metadata source

 

Option 2: Use REST APIs:

Open Postman (you can download postman from https://www.postman.com/downloads/)

POST API End Point to check NASS Metric Metadata matching a pattern

http://<APM Service Gateway Host>/metadata/queryMetric

For example:

http://apmservices-gateway.<host>/metadata/queryMetric

Headers:

Content-Type: application/json

Authorization: Bearer <Tenant Token>

Body:

{
   "size": 10000,
   "specifier": {
      "op": "SPEC",
      "sourceNameSpecifier": {
         "op": "REGEX",
         "pattern": "(.*)NetOps\\|CAPM(.*)|(.*)NetOps\\|ADA(.*)|(.*)NetOps\\|NFA(.*)"
      },
      "attributeNameSpecifier": {
         "op": "ALL"
      }
   }
}

Expected Result: you should see the metadata of the NetOps metrics ingested into NASS
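
To see which source names the pattern in the request body is meant to select, it can be tried locally. The following is a minimal Python sketch. It assumes the server applies the pattern to the whole source name (suggested by the leading and trailing `(.*)`, but not confirmed here), and the sample names in the usage note are hypothetical.

```python
import re

# Pattern from the request body above after JSON unescaping ("\\|" becomes "\|").
PATTERN = re.compile(r"(.*)NetOps\|CAPM(.*)|(.*)NetOps\|ADA(.*)|(.*)NetOps\|NFA(.*)")


def is_netops_source(source_name):
    """Return True if the whole source name matches the NetOps pattern."""
    return PATTERN.fullmatch(source_name) is not None
```

With a hypothetical name such as `SuperDomain|NetOps|CAPM|router1` the function returns True, while `SuperDomain|UIM|host1` returns False. The escaped `\|` matches a literal pipe, and the unescaped `|` separates the CAPM, ADA and NFA alternatives.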

CHECK#7 : Verify Jarvis, Elastic, Zookeeper and Kafka

** This section is valid for DX On Premise only. If you are using DX OI SaaS, contact Broadcom Support for assistance **

AIOps - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting

 


C) WHAT FILES SHOULD I COLLECT FOR BROADCOM SUPPORT?

If you still need assistance, contact Broadcom Support (https://support.broadcom.com/) and provide the below information:

a) DEBUG oi_connector logs

<OIConnector>/logs/*
<OIConnector>/conf/config.xml

b) services status:

service caperfcenter_oiconnector status
service caperfcenter_oiagent status

c) from data aggregator

$KARAF_HOME/etc/kafkaexport.producer.cfg
$KARAF_HOME/data/log/karaf.log file
$KARAF_HOME/data/log/KafkaClient.log

screenshot of exportProfileCache content

d) from kafka 

Result of:

cd <oi-connector-kafka>/kafkadisk/bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metric-export
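
The files listed above can be gathered into a single archive before opening a case. The following is a minimal Python sketch; the caller supplies the real paths for the current host (for example the connector logs folder, or `$KARAF_HOME/data/log/karaf.log` on the DA, where environment variables are expanded). The function name is illustrative, and only files present on the current host are collected.

```python
import glob
import os
import tarfile


def collect_support_files(patterns, archive_path):
    """Add every existing file matching the patterns to a .tar.gz archive."""
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for pattern in patterns:
            # Expand $VARS first, then shell-style wildcards such as logs/*.
            for path in sorted(glob.glob(os.path.expandvars(pattern))):
                if os.path.isfile(path):
                    tar.add(path)
                    added.append(path)
    return added
```

The returned list shows what was actually archived, which makes it easy to spot a pattern that matched nothing. Review config.xml before sharing it, since it holds the encoded password mentioned earlier.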

If you are using DX OI On Premise :

a) collect cluster and pods status:

kubectl get pods -n<namespace>
kubectl describe nodes -n<namespace>
kubectl get events -n<namespace>

b) collect result of ElasticSearch queries:

- collect result of below queries:

http(s)://{es_endpoint}/_cat/indices/*capm*?v
http(s)://{es_endpoint}/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
http(s)://{es_endpoint}/_cluster/health?pretty&human

- result of : df -h

c) from NFS server

- result of : df -h

 
