DX AIOps - ServiceNow Integration Troubleshooting

Products

DX Operational Intelligence

Issue/Introduction

The following is a high-level list of techniques and suggestions for troubleshooting ServiceNow integration issues:

A) Checklist

B) What files should I collect for Broadcom Support?

Environment

DX Operational Intelligence 1.3.x, 20.x

Resolution


A) CHECKLIST

STEP #1 : Check that all relevant pods and services are up and running
STEP #2 : Check the ServiceNow configuration in AIOps
STEP #3 : Check whether manual ticket creation works, using the browser Developer Tools
STEP #4 : Check Jarvis (Kafka, ZooKeeper, Elasticsearch) health
STEP #5 : Check the alarm in the Elasticsearch indices
STEP #6 : Check the incidentmanagement log
STEP #7 : Check Nim
STEP #8 : Check doireadserver for incident display issues

STEP #1 : Check that all relevant pods and services are up and running

To integrate with ServiceNow, ensure that the following pods are running:

• doi-incidentmanagement
• doi-nim
• doi-integrationgateway
• doi-tenantmanagement
• doireadserver
• axaservices-notify-filter

Example output of kubectl get pods:

axaservices-notify-filter-756ddc876f-2v7km 1/1 Running 0 14d
doi-incidentmanagement-56b66b6bff-5qtqv 1/1 Running 0 14d
doi-incidentmanagementpollingengine-7cf4bcfb4f-v92wl 1/1 Running 0 14d
doi-integrationgateway-7df8cd5d94-xvft8 1/1 Running 0 6d
doi-nim-7df49f5f89-vb7bq 1/1 Running 0 14d
doi-tenantmanagement-5fb7c64b96-cxj8t 1/1 Running 0 14d
doireadserver-6cb959d664-lv42v 1/1 Running 0 14d

STEP #2 : Check the ServiceNow configuration in AIOps

1) Go to Launch Pad | Settings | Channels | <your ServiceNow configuration>

Use the TEST button to verify connectivity.

If you get an error:

a) Check ServiceNow connectivity; it could be a firewall issue.
b) Use the browser Developer Tools to further diagnose the issue.
c) Check Nim to ServiceNow connectivity, see:

DX OI - Nim Troubleshooting
https://knowledge.broadcom.com/external/article/206252
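To separate a network or firewall problem from a credentials problem, the ServiceNow instance can also be probed directly from a host on the same network path as Nim. The Python sketch below is an illustration under assumptions, not the product's own TEST logic: it reads one record through ServiceNow's standard REST Table API (/api/now/table/incident) with basic authentication. The instance URL and account are placeholders, and the account is assumed to have read access to the incident table.

```python
import base64
import urllib.error
import urllib.request
from urllib.parse import urlencode


def incident_probe_url(instance_url: str) -> str:
    """Build a one-record read of the incident table via the ServiceNow Table API."""
    query = urlencode({"sysparm_limit": 1, "sysparm_fields": "number"})
    return f"{instance_url.rstrip('/')}/api/now/table/incident?{query}"


def probe(instance_url: str, user: str, password: str, timeout: float = 10.0) -> str:
    """Return a one-line verdict separating network failures from HTTP/auth failures."""
    request = urllib.request.Request(incident_probe_url(instance_url))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"OK: HTTP {response.status}"
    except urllib.error.HTTPError as err:
        # The instance answered, so the network path is open; check credentials/roles.
        return f"Reachable, but HTTP {err.code}"
    except OSError as err:
        # No HTTP answer (DNS, proxy, firewall, TLS or timeout): check the network path.
        return f"Not reachable: {err}"
```

An HTTP 401 or 403 typically means the network path is open and the credentials or roles need checking, while a connection error or timeout points toward DNS, proxy, or firewall settings.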

2) Go to Launch Pad | Settings | Policies | <your Policy configuration>

Ensure the policy filter matches the service/raw alarm.

STEP #3 : Check whether manual ticket creation works, using the browser Developer Tools

Create a ServiceNow incident manually and check the browser Developer Tools (Console and Network tabs) for any error or exception.
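When the Network tab shows many requests, it can help to save them as a HAR file (Chrome and Firefox DevTools can export the Network tab in this JSON format) and list only the failing calls. A minimal Python sketch, assuming a standard HAR 1.2 file; the function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple


def failed_requests(har: Dict[str, Any], min_status: int = 400) -> List[Tuple[int, str, str]]:
    """List (status, method, url) for HAR entries that failed or got no response."""
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry.get("response", {}).get("status", 0)
        # Status 0 usually means the request was blocked or never completed.
        if status == 0 or status >= min_status:
            request = entry.get("request", {})
            failures.append((status, request.get("method", "?"), request.get("url", "?")))
    return failures


def load_har(path: str) -> Dict[str, Any]:
    """Read an exported .har file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

For example, failed_requests(load_har("ticket.har")) lists only the calls worth looking at; the same HAR file can also be attached to a support case.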


STEP #4 : Check Jarvis (Kafka, ZooKeeper, Elasticsearch) health

If ServiceNow incident creation still does not work, check for a possible issue with Jarvis (Kafka, ZooKeeper, Elasticsearch). See:

DX OI - Jarvis / Kafka Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/189119
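Of the Jarvis components, Elasticsearch health is the quickest to check over HTTP. A minimal Python sketch, assuming the Elasticsearch endpoint is reachable without authentication from where it runs (add credentials or TLS settings as your deployment requires). It reads the standard _cluster/health API and turns the response into a list of warnings. It does not cover Kafka or ZooKeeper; see the article above for those.

```python
import json
import urllib.request
from typing import Any, Dict, List


def cluster_health(es_endpoint: str, timeout: float = 10.0) -> Dict[str, Any]:
    """Fetch GET <es_endpoint>/_cluster/health as a dict."""
    url = f"{es_endpoint.rstrip('/')}/_cluster/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def health_warnings(health: Dict[str, Any]) -> List[str]:
    """Summarize a _cluster/health response; an empty list means green and settled."""
    warnings = []
    if health.get("status") != "green":
        warnings.append(f"cluster status is {health.get('status')}")
    for field in ("unassigned_shards", "initializing_shards", "relocating_shards"):
        if health.get(field, 0) > 0:
            warnings.append(f"{field}={health[field]}")
    return warnings
```

A yellow or red status, or a persistent count of unassigned shards, is a reason to follow the Jarvis article before looking further at the ServiceNow side.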

STEP #5 : Check the alarm in the Elasticsearch indices

1) Check alarm ingestion:

Check that the alarm appears in the product alarm index, ao_itoa_alarms_<product>.

For example, use *alarms_uim* for UIM alarms and adjust the query for your product. See:

DX AIOps - ElasticSearch Queries
https://knowledge.broadcom.com/external/article/207215

{ES-Endpoint}/*alarms_uim*/_search?pretty&size=100&sort=timestamp:desc&q=nimid:<alarm-ID>

For example:
{ES-Endpoint}/*alarms_uim*/_search?pretty&size=100&sort=timestamp:desc&q=nimid:PG85096332-05512
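If the alarm ID contains characters that are special in URLs (for example # or spaces), a hand-typed query can silently return nothing. A small Python sketch that rebuilds the query above with the query string URL-encoded; the function names are hypothetical, and the index pattern and nimid field are taken from the example.

```python
import json
import urllib.request
from urllib.parse import quote


def alarm_search_url(es_endpoint: str, product: str, nimid: str, size: int = 100) -> str:
    """Build the *alarms_<product>* lookup by nimid shown above, URL-encoding the ID."""
    query = quote(f"nimid:{nimid}", safe=":")
    return (
        f"{es_endpoint.rstrip('/')}/*alarms_{product}*/_search"
        f"?pretty&size={size}&sort=timestamp:desc&q={query}"
    )


def count_hits(url: str, timeout: float = 10.0) -> int:
    """Run the search and return the number of hits (0 suggests the alarm was not ingested)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(json.load(response)["hits"]["hits"])
```

With product "uim" and nimid "PG85096332-05512", alarm_search_url produces the example query above with {ES-Endpoint} filled in.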

2) Check incident creation:

Check that the alarm appears in the following Elasticsearch indices, in this order:

For raw alarms: ao_itoa_alarms_all -> ao_itoa_channels -> ao_itoa_incidents
For situation alarms: ao_itoa_alarms_service_sa -> ao_itoa_channels -> ao_itoa_incidents

Below is an example illustrating the troubleshooting process for raw alarms:

a) Check that the alarm appears in the "alarms_all" index:

http://<elastic-endpoint>/*alarms_all*/_search?pretty&q=alarm_unique_id:<your alarm id>

b) Check that the alarm appears in the "channels" index:

Check that the channels attribute contains <channelName>#OnPrem_ITSM.

c) Check that the alarm appears in the "incidents" index:

http://<elastic-endpoint>/*incidents*/_search?pretty&q=alarm_unique_id:<your alarm>

d) Check that the "alarms_all" index is updated with the ServiceNow incident number.
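Steps a) to d) amount to walking a fixed chain of indices and stopping at the first one where the alarm is missing; that index indicates which stage of the flow to investigate. A Python sketch of that walk, assuming that the alarm_unique_id field used in the alarms_all and incidents queries above is also present in the channels and alarms_service_sa indices (the field name is an assumption for those two). It uses the Elasticsearch _count API, and the lookup function is passed in so the logic can be tried without a cluster. All names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, List, Optional
from urllib.parse import quote

RAW_ALARM_CHAIN = ["alarms_all", "channels", "incidents"]
SITUATION_ALARM_CHAIN = ["alarms_service_sa", "channels", "incidents"]

CountFn = Callable[[str, str], int]


def first_missing_index(alarm_id: str, chain: List[str], count: CountFn) -> Optional[str]:
    """Return the first index in `chain` with no document for the alarm, or None."""
    for index in chain:
        if count(index, alarm_id) == 0:
            return index
    return None


def es_counter(es_endpoint: str, timeout: float = 10.0) -> CountFn:
    """Build a counter that queries <es_endpoint>/*<index>*/_count by alarm_unique_id."""
    def count(index: str, alarm_id: str) -> int:
        query = quote(f'alarm_unique_id:"{alarm_id}"')
        url = f"{es_endpoint.rstrip('/')}/*{index}*/_count?q={query}"
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)["count"]
    return count
```

For example, first_missing_index(alarm_id, RAW_ALARM_CHAIN, es_counter(endpoint)) returning "channels" suggests the alarm was ingested but not picked up by a channel, so the policy filter from STEP #2 is the next thing to review.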

STEP #6 : Check the incidentmanagement log

Go to OpenShift console | Applications | Pods

If 20.2:
Open the "doi-incidentmanagement" pod > Terminal
cd /incidentmanagement/incidentmanager/logs/<doi-incidentmanagement-pod>

If 1.3.x:
Open the "incidentmanagement" pod > Terminal
cd /incidentmanagement/incidentmanager/logs

Or open a shell in the pod with kubectl:

If 20.2:
kubectl get pods | grep doi-incidentmanagement
kubectl exec -ti <doi-incidentmanagement pod> -- sh
tail -f /incidentmanagement/incidentmanager/logs/<doi-incidentmanagement-pod>/incidentmanager.log

If 1.3.x:
kubectl get pods | grep incidentmanagement
kubectl exec -ti <incidentmanagement pod> -- sh
tail -f /incidentmanagement/incidentmanager/logs/incidentmanager.log

Search the log for "troubleTicket" or "AlarmNotificationHandler" entries related to your alarm and check them for errors or exceptions.
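On a large log, filtering for the two keywords and, optionally, the alarm ID narrows the output quickly (grep -E "troubleTicket|AlarmNotificationHandler" does the same on the pod itself). A minimal Python sketch for a log file copied off the pod; the function names are hypothetical and nothing is assumed about the log line format beyond plain text.

```python
import re
from typing import Iterable, List, Optional

KEYWORDS = re.compile(r"troubleTicket|AlarmNotificationHandler")


def relevant_lines(lines: Iterable[str], alarm_id: Optional[str] = None) -> List[str]:
    """Keep lines mentioning either keyword and, if given, the alarm ID."""
    hits = []
    for line in lines:
        if KEYWORDS.search(line) and (alarm_id is None or alarm_id in line):
            hits.append(line.rstrip("\n"))
    return hits


def scan_file(path: str, alarm_id: Optional[str] = None) -> List[str]:
    """Apply relevant_lines() to a log file, replacing undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return relevant_lines(fh, alarm_id)
```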

STEP #7 : Check Nim

DX AIOps - Nim Troubleshooting
https://knowledge.broadcom.com/external/article/206252

STEP #8 : Check doireadserver for Incident display issues

See:

DX OI - Alarms do not display the generated ServiceNow incidents
https://knowledge.broadcom.com/external/article/206248

B) WHAT FILES SHOULD I COLLECT FOR BROADCOM SUPPORT?

If you still need assistance, contact Broadcom Support (https://support.broadcom.com/) and provide the following information:

a) Details of the problematic alarm(s) (if possible, provide screenshots)

b) The browser Developer Tools > Network tab output captured while creating a ServiceNow incident manually.

If you are using DX OI On-Premise (20.x), also collect the following:

c) From OpenShift or Kubernetes:

d) From Elasticsearch:

Collect the output of the following queries:

Elastic Health:
http(s)://<ELASTIC_URL>/_cluster/health?pretty&human
http(s)://<ELASTIC_URL>/_nodes/stats/fs?pretty
http(s)://<ELASTIC_URL>/_nodes/stats/indices?pretty
http(s)://<ELASTIC_URL>/_cat/health?v
http(s)://<ELASTIC_URL>/_cat/nodes?v

Elastic indices:
http(s)://<ELASTIC_URL>/_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,docs.count,docs.deleted,index,cds
http(s)://<ELASTIC_URL>/*alarms_all*/_search?pretty&sort=@timestamp:desc&size=200
http(s)://<ELASTIC_URL>/*alarms_service_sa*/_search?pretty&sort=@timestamp:desc&size=200
http(s)://<ELASTIC_URL>/*channels*/_search?pretty&sort=@timestamp:desc&size=200
http(s)://<ELASTIC_URL>/*incidents*/_search?pretty&sort=@timestamp:desc&size=200
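Running these queries one by one is tedious. The Python sketch below fetches each one and saves the response to its own file, ready to attach to the support case. It assumes the endpoint is reachable without authentication (add headers or TLS settings if yours is secured); the output directory and file naming are arbitrary choices, not something Support requires, and the function names are hypothetical.

```python
import re
import urllib.request
from pathlib import Path
from typing import List

# Query paths based on the Elastic health and indices lists above.
DIAGNOSTIC_PATHS = [
    "_cluster/health?pretty&human",
    "_nodes/stats/fs?pretty",
    "_nodes/stats/indices?pretty",
    "_cat/health?v",
    "_cat/nodes?v",
    "_cat/indices/?v&s=ss:desc&h=health,store.size,pri.store.size,pri,rep,docs.count,docs.deleted,index,cds",
    "*alarms_all*/_search?pretty&sort=@timestamp:desc&size=200",
    "*alarms_service_sa*/_search?pretty&sort=@timestamp:desc&size=200",
    "*channels*/_search?pretty&sort=@timestamp:desc&size=200",
    "*incidents*/_search?pretty&sort=@timestamp:desc&size=200",
]


def output_name(path: str) -> str:
    """Turn a query path into a file name, e.g. '_cat/health?v' -> '_cat_health_v.txt'."""
    return re.sub(r"[^A-Za-z0-9_.-]+", "_", path) + ".txt"


def collect(es_endpoint: str, out_dir: str = "es-diagnostics", timeout: float = 30.0) -> List[Path]:
    """Fetch every diagnostic path and write each response (or the error) to a file."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for path in DIAGNOSTIC_PATHS:
        url = f"{es_endpoint.rstrip('/')}/{path}"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                body = response.read()
        except OSError as err:  # URLError, HTTPError and timeouts all derive from OSError
            body = f"FAILED: {url}: {err}\n".encode()
        out_file = target / output_name(path)
        out_file.write_bytes(body)
        written.append(out_file)
    return written
```

Failures are written to the corresponding file instead of stopping the run, so a partially unhealthy cluster still produces a complete set of files.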

e) Logs:

doi-incidentmanagement service:
<NFS>/doiservices/incidentmanagement/<doi-incidentmanagement-pod>/incidentmanager.log
<NFS>/doiservices/incidentmanagement/restservices/restservices.log

doi-incidentmanagementpollingengine service (polls ServiceNow for updates and synchronizes them to OI):
<NFS>/doiservices/incidentmanagementpollingengine/logs/itsm_pollingengine.log

Nim service:
kubectl cp <doi-nim-pod>:webapps/ca-nim-sm/WEB-INF/logs/Nim.log /tmp/Nim.log

doireadserver service:
- kubectl logs <doireadserver-pod>
- <NFS>/doiservices/readserver/logs/ca-doi-server-log.txt

doi-integrationgateway service:
- kubectl logs <doi-integrationgateway-pod>
- <NFS>/doiservices/integrationgateway/logs/integrationGateway.log

axaservices-notify-filter service:
- kubectl logs <axaservices-notify-filter-pod>
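For the services whose logs are read with kubectl logs, collection can be scripted. A Python sketch, assuming kubectl is configured for the DX OI namespace: it lists pods with kubectl get pods -o name, keeps those whose names start with the service names above, and writes each pod's kubectl logs output to a file. The function names are hypothetical, and the NFS log files listed above still need to be copied separately.

```python
import subprocess
from pathlib import Path
from typing import List

# Services above whose logs are collected with `kubectl logs`.
LOG_PREFIXES = ["doireadserver", "doi-integrationgateway", "axaservices-notify-filter"]


def select_pods(entries: List[str], prefixes: List[str]) -> List[str]:
    """From `kubectl get pods -o name` entries ("pod/<name>"), keep matching pod names."""
    selected = []
    for entry in entries:
        name = entry[len("pod/"):] if entry.startswith("pod/") else entry
        if any(name.startswith(prefix + "-") for prefix in prefixes):
            selected.append(name)
    return selected


def collect_pod_logs(out_dir: str = "pod-logs") -> List[Path]:
    """Save `kubectl logs <pod>` output for each selected pod into out_dir."""
    listing = subprocess.run(
        ["kubectl", "get", "pods", "-o", "name"],
        capture_output=True, text=True, check=True,
    ).stdout
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for pod in select_pods(listing.split(), LOG_PREFIXES):
        logs = subprocess.run(
            ["kubectl", "logs", pod], capture_output=True, text=True, check=True
        ).stdout
        out_file = target / f"{pod}.log"
        out_file.write_text(logs)
        written.append(out_file)
    return written
```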

f) If the problem is related to Jarvis, Kafka or Elasticsearch, collect the respective logs and evidence, see:

DX AIOps - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting
https://knowledge.broadcom.com/external/article/189119

Additional Information

DX AIOps - Troubleshooting, Common Issues and Best Practices
https://knowledge.broadcom.com/external/article/190815