DX OI - New alarms don't appear in OI Console and ServiceNow incident creation is not working

Article ID: 221752

Updated On:

Products

DX Operational Intelligence

Issue/Introduction

Symptoms:

- New alarms don't appear in OI Console
- ServiceNow ticket creation is not working

Cause

1) ZooKeeper communication issues with the Kafka brokers, with slow write-ahead log syncs reported in the ZooKeeper log:

WARN  [SyncThread:3:[email protected]] - fsync-ing the write ahead log in SyncThread:3 took 16313ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide

2) Several Jarvis services in red

3) Unassigned Elasticsearch shards

4) Kafka consumers disconnecting

- Consumer lag (LAG) building up
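
One way to confirm the consumer lag is to total the LAG column reported by the standard Kafka tool kafka-consumer-groups.sh. The helper below is a minimal sketch, not part of the product: the function name sum_lag is hypothetical, and the broker address, consumer group name and location of the Kafka scripts are placeholders that depend on your installation.

```shell
#!/bin/sh
# Sum the LAG column of `kafka-consumer-groups.sh --describe` output read on stdin.
# The column is located by its header name because the column layout
# differs between Kafka versions.
sum_lag() {
  awk '
    # Until the header row is seen, look for the column titled LAG.
    lag_col == 0 {
      for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i
      next
    }
    # Data rows: add numeric lag values; "-" (no committed offset) is skipped.
    $lag_col ~ /^[0-9]+$/ { total += $lag_col }
    END { print total + 0 }
  '
}

# Example (placeholders, adjust to your environment):
#   kafka-consumer-groups.sh --bootstrap-server <broker:port> \
#     --describe --group <consumer-group> | sum_lag
```

A total that keeps growing between runs suggests the consumers are not keeping up or have disconnected.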

Environment

DX Operational Intelligence 20.x
DX Application Performance Management 20.x
DX AXA 20.x

Resolution

Applied the recommendations below, as described in the KB article: DX Platform - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting

1) Fixed the unassigned Elasticsearch shards and the Jarvis services in red by calling the Elasticsearch cluster reroute API (POST request):

_cluster/reroute?retry_failed=true
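
The reroute call above can be issued with curl. The following is a rough sketch under stated assumptions: ES_URL is a placeholder for your Elasticsearch endpoint, the function names are hypothetical, and authentication or TLS options may be required in your environment.

```shell
#!/bin/sh
# ES_URL is an assumption: replace it with your Elasticsearch endpoint.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the reroute URL that retries shards whose allocation previously failed.
# A trailing slash on the base URL is removed first.
reroute_url() {
  printf '%s/_cluster/reroute?retry_failed=true\n' "${1%/}"
}

# List shards that are currently UNASSIGNED, together with the reason.
list_unassigned() {
  curl -s "${1%/}/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED
}

# Ask Elasticsearch to retry the allocation of failed shards.
retry_failed_shards() {
  curl -s -X POST "$(reroute_url "$1")"
}

# Example:
#   list_unassigned "$ES_URL"       # before
#   retry_failed_shards "$ES_URL"
#   list_unassigned "$ES_URL"       # after; ideally prints nothing
```

Listing the unassigned shards before and after the call shows whether the retry actually cleared them.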

2) Fixed the consumer lag (LAG) in Kafka by performing the steps below:

Restart the Jarvis services as follows:

Scale down:
- jarvis-verifier                
- jarvis-lean-jarvis-indexer 
- jarvis-indexer

Scale up:
- jarvis-verifier                
- jarvis-lean-jarvis-indexer 
- jarvis-indexer

The corresponding kubectl commands are:

a) Scale down the following deployments:

kubectl scale --replicas=0 deployment jarvis-verifier -n<namespace>
kubectl scale --replicas=0 deployment jarvis-lean-jarvis-indexer  -n<namespace>
kubectl scale --replicas=0 deployment jarvis-indexer -n<namespace>

b) Verify that all pods are down:

kubectl get pods -n<namespace> | egrep "jarvis-verifier|jarvis-lean|jarvis-indexer"

c) Scale up the deployments:

kubectl scale --replicas=1 deployment jarvis-verifier -n<namespace>
kubectl scale --replicas=1 deployment jarvis-lean-jarvis-indexer -n<namespace>
kubectl scale --replicas=1 deployment jarvis-indexer -n<namespace>

d) Verify that all pods are up and running:

kubectl get pods -n<namespace> | egrep "jarvis-verifier|jarvis-lean|jarvis-indexer"

e) Verify that alarms and ServiceNow incidents are reported as expected
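
Steps a) to d) can be combined into a small script. This is only a sketch: the function names and the DRY_RUN switch are hypothetical, the namespace is passed as an argument, and the wait limits (about 5 minutes each) are arbitrary choices. DRY_RUN only affects the scale commands.

```shell
#!/bin/sh
# Deployment names are taken from the steps above.
DEPLOYMENTS="jarvis-verifier jarvis-lean-jarvis-indexer jarvis-indexer"

# Run a command, or only print it when DRY_RUN is set (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Scale every deployment. $1 = namespace, $2 = replica count.
scale_all() {
  for d in $DEPLOYMENTS; do
    run kubectl scale --replicas="$2" deployment "$d" -n "$1"
  done
}

# Scale down, wait for the pods to disappear, scale up, wait for the rollout.
restart_jarvis() {
  ns="$1"
  scale_all "$ns" 0
  tries=0
  # Poll for up to 60 x 5 seconds until no matching pods remain.
  while [ "$tries" -lt 60 ] && kubectl get pods -n "$ns" --no-headers 2>/dev/null |
      grep -Eq "jarvis-verifier|jarvis-lean|jarvis-indexer"; do
    tries=$((tries + 1))
    sleep 5
  done
  scale_all "$ns" 1
  for d in $DEPLOYMENTS; do
    kubectl rollout status deployment "$d" -n "$ns" --timeout=300s
  done
}

# Example:
#   DRY_RUN=1; scale_all <namespace> 0   # print the commands only
#   restart_jarvis <namespace>
```

Using kubectl rollout status in place of a manual pod listing makes the script stop with an error if a deployment does not become ready within the timeout.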

Additional Information

DX AIOPs Troubleshooting, Common Issues and Best Practices