The following is a list of techniques and suggestions to employ when troubleshooting OI Connector issues.
DX NetOps OI Connector
DX NetOps Performance Management 20.2.4 or higher
a) Check compatibility
b) Check that the OI values entered during installation are correct:
APM Gateway Hostname | Go to Settings > Connector Parameters > TAS Endpoint
APM Gateway Security Token | Go to Settings > Connector Parameters > Generate Ingestion Token
Tenant ID | Go to Settings > Connector Parameters > Cohort ID
a) Check that the OI Connector services are up and running
service caperfcenter_oiconnector status
service caperfcenter_oiagent status
service kafka status
b) Check the OI Connector services in the NetOps console
Go to Performance Center > Administration > System Status page
Locate the "OI Connector" section
Verify Status = Normal
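The three service checks in step a) can be run in one pass. The following is a minimal sketch, assuming a bash shell and that `service <name> status` exits with a non-zero code when the service is not running (the usual init-script convention, not confirmed here for these services). The function name `check_oi_services` is illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: report the state of each service named in step a).
# Assumption: "service <name> status" exits non-zero when the service is down.
check_oi_services() {
  local failed=0 svc
  for svc in caperfcenter_oiconnector caperfcenter_oiagent kafka; do
    if service "$svc" status >/dev/null 2>&1; then
      echo "OK      $svc"
    else
      echo "NOT OK  $svc"
      failed=1
    fi
  done
  # Return 1 if at least one service did not report a healthy status.
  return "$failed"
}
```

Run `check_oi_services` on the OI Connector host; any line marked NOT OK points to the service to inspect first.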
a) Review OIConnector logs : <OIConnector-HOME>/logs
- OIConnector.log : main log file
- OIAgent*.log : NFA, ADA data collection activity
b) Enable DEBUG logging:
a) OIConnector logging: <OIConnector-HOME>/conf/log4j.xml
Open ./conf/log4j.xml, change logging level from INFO to DEBUG as below:
...
<!-- ***** Root Logger definition ***** -->
<root>
<level value="DEBUG"/>
<appender-ref ref="console"/>
<appender-ref ref="complete" />
</root>
b) OI Agent service logging: <OIConnector-HOME>/conf/agent-wrapper.conf
Uncomment the below line:
#wrapper.app.parameter.2=-Ssupport
You need to restart the OI Agent service:
service caperfcenter_oiagent restart
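The two edits in step b) can also be scripted. This is a hedged sketch, assuming the `<root>` block in log4j.xml is laid out as in the excerpt above (the level element on its own line, currently set to INFO) and that the wrapper line is commented out exactly as shown. The function names are hypothetical; each edit keeps a `.bak` copy of the original file.

```shell
#!/usr/bin/env bash
# Sketch: switch the OIConnector root logger from INFO to DEBUG.
# Assumption: <level value="INFO"/> sits between <root> and </root>.
enable_root_debug() {
  local log4j_xml="$1"
  # -i.bak edits in place and keeps the original as <file>.bak.
  sed -i.bak '/<root>/,/<\/root>/ s/<level value="INFO"/<level value="DEBUG"/' "$log4j_xml"
}

# Sketch: uncomment the agent-wrapper.conf line shown above.
enable_agent_support_logging() {
  local wrapper_conf="$1"
  sed -i.bak 's/^#\(wrapper\.app\.parameter\.2=-Ssupport\)/\1/' "$wrapper_conf"
}
```

For example, `enable_root_debug <OIConnector-HOME>/conf/log4j.xml` followed by `enable_agent_support_logging <OIConnector-HOME>/conf/agent-wrapper.conf`, then restart the OI Agent service as described above.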
c) Example of common errors or exceptions :
USE-CASE #1 : Problem with apmservices-gateway endpoint
ERROR [pool-2-thread-3] [TASGroupTask] - [EVENT UNSPECIFIED Anonymous:null@unknown -> /com.ca.im.oinet.connector.task.group.TASGroupTask] Failed ingesting groups to TAS for CAPC tenant id : <example> Error: 503
Recommendation:
Verify that the apm-gateway endpoint is correct and reachable, and that the token is valid.
USE-CASE #2 : CAPM user password expired, changed, or is no longer valid.
ERROR [] [WrapperSimpleAppMain] [OIIntegration] - [EVENT UNSPECIFIED Anonymous:<user>@example -> /com.ca.im.oinet.connector.OIIntegration] No response from webservice - unable to configure data sources
WARN [WrapperSimpleAppMain] [OIIntegration] - [EVENT UNSPECIFIED Anonymous:null@unknown -> /com.ca.im.oinet.connector.OIIntegration] Unable to determine CA Performance Center version
Recommendation:
Update the <OIConnector-HOME>/conf/config.xml with the new encoded password, see: https://knowledge.broadcom.com/external/article/204144/dx-oi-oiconnector-not-connecting-when-c.html
d) Search for common keywords : "Successfully", "Started", "CLIENT_SUMMARY_NASS"
NOTE: CLIENT_SUMMARY_NASS entries are emitted every five minutes.
Below are some examples:
...
[INFO] [Thread-9] NASSClient - <number>: Started NASS Client.
...
[INFO] [Thread-9] [PersistentRegistrationCache] - Successfully loaded 47762 metric registrations from /opt/CA/OIConnector/conf/MetricRegistrationCache-<number>.ser
[INFO ] [pool-3-thread-38] RemoteDataConnectionImpl - [EVENT UNSPECIFIED <user>:@<host> -> /NetOps OI Connector/com.ca.im.oinet.connector.sources.RemoteDataConnectionImpl] JARVIS_INGEST_RECORD_COUNT : 471
..
[INFO ] [pool-3-thread-39] TASGroupTask - [EVENT SUCCESS <user>:@<host>n -> /NetOps OI Connector/com.ca.im.oinet.connector.task.group.TASGroupTask] Successfully ingested groups to TAS for CAPC tenant id: _default_
..
[INFO ] [NASSClientStats] NASSClient - <number>: CLIENT_SUMMARY_NASS_INGEST_SUCCESS_COUNT: 45449
[INFO ] [NASSClientStats] NASSClient - <number>: CLIENT_SUMMARY_NASS_INGEST_FAILED_COUNT: 0
[INFO ] [NASSClientStats] NASSClient - <number>: CLIENT_SUMMARY_NASS_INGEST_RETRIED_COUNT: 0
[INFO ] [NASSClientStats] NASSClient - <number>: CLIENT_SUMMARY_NASS_REGISTRATION_SUCCESS_COUNT: 662
[INFO ] [NASSClientStats] NASSClient - <number>: CLIENT_SUMMARY_NASS_REGISTRATION_FAILED_COUNT: 0
...
[INFO ] [pool-3-thread-4] InventoryTaskImpl - [EVENT SUCCESS <user>:@<host> -> /NetOps OI Connector/com.ca.im.oinet.connector.task.inventory.InventoryTaskImpl] Successfully ingested inventory (268 vertices) in 0 batches TAS for CAPC tenant id : _default_
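The keyword search in step d) can be narrowed to the counters that matter most. The sketch below assumes the counter lines end with `<COUNTER_NAME>: <integer>` as in the excerpt above; it prints the most recent value of each CLIENT_SUMMARY_NASS counter and marks any FAILED counter above zero. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: print the latest value of each CLIENT_SUMMARY_NASS_* counter
# and flag any FAILED counter that is non-zero.
# Assumption: lines end with "<COUNTER_NAME>: <integer>" as in the excerpt.
summarize_nass_counters() {
  local log_file="$1"
  grep -o 'CLIENT_SUMMARY_NASS_[A-Z_]*: [0-9][0-9]*' "$log_file" |
    awk -F': ' '
      { last[$1] = $2 }   # later lines overwrite earlier ones
      END {
        for (name in last) {
          flag = (name ~ /FAILED/ && last[name] + 0 > 0) ? "  <-- check" : ""
          printf "%s=%s%s\n", name, last[name], flag
        }
      }' | sort
}
```

For example: `summarize_nass_counters <OIConnector-HOME>/logs/OIConnector.log`.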
1) A quick way to confirm that metrics are being ingested into DX OI is to check that MetricRegistrationCache-<Tenant-ID>.ser exists in the conf folder
Check for the file creation in the OIConnector log:
..
INFO [Thread-9] [PersistentRegistrationCache] - Successfully loaded 47762 metric registrations from /opt/CA/OIConnector/conf/MetricRegistrationCache-<number>.ser
2) Use the below steps to debug a metric ingestion problem from Data Aggregator to DX OI :
2.1) Go to Data Aggregator (DA): check the settings in $KARAF_HOME/etc/kafkaexport.producer.cfg file are correct:
feature.enabled=on
producer.bootstrap.servers=<kafkabroker:port>
2.2) Check for ‘ProducerStatisticsMonitor’ in DA’s $KARAF_HOME/data/log/karaf.log file. These are emitted every five minutes by default.
If ProducerStatisticsMonitor shows that messages are being dropped, look in DA’s $KARAF_HOME/data/log/KafkaClient.log file for errors/hints to the problem.
If ProducerStatisticsMonitor logs are not seen, check whether export configuration has been set up and applied to devices:
a) In a browser, open http(s)://<DAHOST>:<DAPORT>/debug
b) Click on Available Spring Containers (by bundle)
c) Provide NetOps Portal admin credentials if prompted
d) Click on com.ca.im.data-manager.core.aggregator.loader.integrator bundle link
e) Click on exportProfileCache link
f) Verify that there is an ExportProfileConfig defined and it has the expected exportedMetricFamilyQNames.
g) Verify that the exportedDeviceCout (sic) is non-zero.
Here is an example illustrating a problem during installation, where the export configuration was not set up correctly:
h) If there is no ExportProfileConfig or there are no exported devices associated
- Check the Data Aggregator (DA) karaf.log for possible data corruption; if possible, restart the DA
- Check OIConnector logs for failures in creating the config or associating it with collections
2.3) Verify that the messages are truly getting to Kafka topic. On the Kafka broker (based on default standalone kafka):
cd <oi-connector-kafka>/kafkadisk/bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metric-export
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metric-export --from-beginning
If no data is flowing, then check kafka/zookeeper logs for potential problems.
If data is flowing, then check the OIConnector.log file for entries containing "CLIENT_SUMMARY", which are emitted every five minutes.
If any show failures, enable DEBUG logging for more details.
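Steps 2.1 and 2.3 lend themselves to a quick scripted check. The sketch below is illustrative only: `check_producer_cfg` tests the two settings listed in step 2.1, and `sample_metric_export` reads a handful of messages and then exits instead of waiting indefinitely. The `--max-messages` and `--timeout-ms` options belong to the stock Kafka console consumer; confirm they are available in the installed Kafka version. The function names, the sample size of 5, and the 10-second timeout are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: verify the two kafkaexport.producer.cfg settings from step 2.1.
check_producer_cfg() {
  local cfg="$1" rc=0
  if ! grep -q '^feature\.enabled=on[[:space:]]*$' "$cfg"; then
    echo "feature.enabled is not set to on"
    rc=1
  fi
  if ! grep -q '^producer\.bootstrap\.servers=[^[:space:]]' "$cfg"; then
    echo "producer.bootstrap.servers is missing or empty"
    rc=1
  fi
  return "$rc"
}

# Sketch: read up to 5 messages from the topic, giving up after 10 seconds.
# Argument: the Kafka bin directory, e.g. <oi-connector-kafka>/kafkadisk/bin.
sample_metric_export() {
  local kafka_bin="$1"
  "$kafka_bin/kafka-console-consumer.sh" --bootstrap-server localhost:9092 \
    --topic metric-export --from-beginning --max-messages 5 --timeout-ms 10000
}
```

For example, run `check_producer_cfg "$KARAF_HOME/etc/kafkaexport.producer.cfg"` on the Data Aggregator and `sample_metric_export <oi-connector-kafka>/kafkadisk/bin` on the Kafka broker.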
a) Metrics (NASS)
Go to Performance:
b) Inventory and Topology (TAS)
Go to DX OI > Services > Create a new Service
From Add Elements, select Network > Device Names; you should be able to see your NetOps devices.
** This section is valid for DX On Premise only, if you are using DX OI SaaS, contact Broadcom Support for assistance **
a) Alarms(ElasticSearch)
For details on how to query Elasticsearch, refer to: https://knowledge.broadcom.com/external/article/207215
1) List all the NetOps (capm) product indices:
http://<servername>/_cat/indices/*capm*?v
For example:
http://<host>/_cat/indices/*capm*?v
Check that the values in the docs.count and store.size columns increase over time.
2) Check the content of a specific index:
http://<servername>/<index-name>/_search?pretty&sort=@timestamp:desc&size=500
For example:
http://<host>/ao_itoa_groups_capm_1_1/_search?pretty&sort=@timestamp:desc&size=500
You can use https://www.epochconverter.com/ to convert values from the @timestamp field to a human-readable format.
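The same conversion can be done in a terminal. This is a minimal sketch, assuming GNU date (for the `-d "@<seconds>"` form) and assuming that values longer than 10 digits are epoch milliseconds rather than seconds. The function name `epoch_to_utc` is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: convert an epoch value to a UTC timestamp.
# Assumptions: GNU date; values longer than 10 digits are milliseconds.
epoch_to_utc() {
  local value="$1"
  if [ "${#value}" -gt 10 ]; then
    value=$(( value / 1000 ))   # drop the millisecond part
  fi
  date -u -d "@${value}" '+%Y-%m-%dT%H:%M:%SZ'
}
```

For example, `epoch_to_utc 1700000000000` prints `2023-11-14T22:13:20Z`.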
b) Inventory and Topology (TAS)
Option 1: Use DX Dashboard > AIOps Inventory source, see: https://knowledge.broadcom.com/external/article/226599
Option 2: Use REST APIs:
Open Postman (you can download postman from https://www.postman.com/downloads/)
POST API endpoint to check TAS data for NetOps (CAPC) inventory:
http://<APMServices Gateway Host>/tas/graph/query
For example:
http://apmservices-gateway.<host>/tas/graph/query
Headers:
Content-Type: application/json
Authorization: Bearer <Tenant Token>
Body:
{
"filter": {
"op": "JOIN",
"input": {
"op": "AND",
"input": [
{
"op": "ATTRIBUTE",
"expressions": [
{
"name": "Product",
"values": [
"CAPC"
]
}
]
}
]
}
},
"universe": null,
"version": null,
"time": 0,
"stitchingEnabled": true,
"includeStatus": true
}
Expected Result: you should see all new vertices added to TAS
c) Metrics(NASS)
Option 1: Use DX Dashboard > AIOps Metadata source
Option 2: Use REST APIs:
Open Postman (you can download postman from https://www.postman.com/downloads/)
POST API End Point to check NASS Metric Metadata matching a pattern
http://<APM Service Gateway Host>/metadata/queryMetric
For example:
http://apmservices-gateway.<host>/metadata/queryMetric
Headers:
Content-Type: application/json
Authorization: Bearer <Tenant Token>
Body:
{
"size": 10000,
"specifier": {
"op": "SPEC",
"sourceNameSpecifier": {
"op": "REGEX",
"pattern": "(.*)NetOps\\|CAPM(.*)|(.*)NetOps\\|ADA(.*)|(.*)NetOps\\|NFA(.*)"
},
"attributeNameSpecifier": {
"op": "ALL"
}
}
}
Expected Result: you should see the NetOps metric metadata registered in NASS
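If Postman is not available, the two POST requests above can be sent with curl. The sketch below assumes the JSON bodies have been saved to local files. `GATEWAY_URL` and `TENANT_TOKEN` are placeholder environment variables for the APM Services Gateway base URL and the tenant token; they and the function name are not defined by the product.

```shell
#!/usr/bin/env bash
# Sketch: POST a saved JSON body to the APM Services Gateway.
# GATEWAY_URL and TENANT_TOKEN are placeholder environment variables.
gateway_post() {
  local path="$1" body_file="$2"
  # ${GATEWAY_URL%/} strips one trailing slash so the path joins cleanly.
  curl -sS -X POST "${GATEWAY_URL%/}${path}" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${TENANT_TOKEN}" \
    --data-binary "@${body_file}"
}
```

For example, `gateway_post /tas/graph/query tas-query.json` for the TAS check and `gateway_post /metadata/queryMetric nass-query.json` for the NASS check, where the two file names are whatever you saved the bodies as.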
** This section is valid for DX On Premise only, if you are using DX OI SaaS, contact Broadcom Support for assistance **
AIOps - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting
C) WHAT FILES SHOULD I COLLECT FOR BROADCOM SUPPORT?
If you still need assistance, contact Broadcom Support (https://support.broadcom.com/) and provide the below information:
a) DEBUG oi_connector logs
<OIConnector>/logs/*
<OIConnector>/conf/config.xml
b) services status:
service caperfcenter_oiconnector status
service caperfcenter_oiagent status
c) from data aggregator
$KARAF_HOME/etc/kafkaexport.producer.cfg
$KARAF_HOME/data/log/karaf.log file
$KARAF_HOME/data/log/KafkaClient.log
screenshot of exportProfileCache content
d) from kafka
Result of:
cd <oi-connector-kafka>/kafkadisk/bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic metric-export
If you are using DX OI On Premise :
a) collect cluster and pods status:
kubectl get pods -n<namespace>
kubectl describe nodes -n<namespace>
kubectl get events -n<namespace>
b) collect result of ElasticSearch queries:
- collect result of below queries:
http(s)://{es_endpoint}/_cat/indices/*capm*?v
http(s)://{es_endpoint}/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,store.size,pri.store.size,docs.count,docs.deleted,index,cds
http(s)://{es_endpoint}/_cluster/health?pretty&human
- result of : df -h
c) from NFS server
- result of : df -h
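The OI Connector items in lists a) and b) above can be gathered in one step. This is a sketch under stated assumptions: it takes the OI Connector home directory and an output directory as arguments (both illustrative), copies the logs and config.xml, records the output of the two status commands, and packs the result into a tar.gz archive next to the output directory. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: gather the OI Connector logs, config.xml, and service status.
# Arguments: <OIConnector home> <output directory> (both illustrative).
collect_oi_support_files() {
  local oi_home="$1" out_dir="$2" svc
  mkdir -p "$out_dir/logs" "$out_dir/conf"
  cp -r "$oi_home"/logs/. "$out_dir/logs/"
  cp "$oi_home/conf/config.xml" "$out_dir/conf/"
  for svc in caperfcenter_oiconnector caperfcenter_oiagent; do
    # Keep the status text even when the service reports a failure.
    service "$svc" status > "$out_dir/${svc}.status.txt" 2>&1 || true
  done
  # Creates <output directory>.tar.gz alongside the output directory.
  tar -czf "${out_dir%/}.tar.gz" -C "$(dirname "$out_dir")" "$(basename "$out_dir")"
}
```

For example, `collect_oi_support_files /opt/CA/OIConnector /tmp/oi-support` produces `/tmp/oi-support.tar.gz`; the Data Aggregator, Kafka, and Kubernetes items still need to be collected on their own hosts.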