Symptoms:
- New alarms don't appear in OI Console
- ServiceNow ticket creation is not working
DX Operational Intelligence 20.x
DX Application Performance Management 20.x
DX AXA 20.x
1) ZooKeeper communication issues with Kafka brokers, for example:
WARN [SyncThread:3:FileTxnLog@338] - fsync-ing the write ahead log in SyncThread:3 took 16313ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2) Several Jarvis services in red
3) Unassigned Elasticsearch shards
4) Kafka consumer disconnections
- Consumer LAG (see the lag check example after this list)
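The consumer LAG can be confirmed with the kafka-consumer-groups.sh tool shipped with Kafka. The pod name, bin path, and consumer group below are examples and must be adapted to the environment:
# Open a shell in a Kafka broker pod (pod name is an example)
kubectl exec -it jarvis-kafka-0 -n<namespace> -- bash
# List the consumer groups known to the broker (bin path is an example)
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
# Describe a group and check the LAG column per partition (group name is an example)
/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group jarvis-indexer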
Applied the recommendations below, as described in KB: DX Platform - Jarvis (kafka, zookeeper, elasticSearch) Troubleshooting
1) Fixed unassigned Elasticsearch shards and Jarvis services in red by executing the following call against the Elasticsearch cluster:
_cluster/reroute?retry_failed=true
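The reroute call is an Elasticsearch REST API and can be issued with curl; the service host name and port below are examples and depend on how Elasticsearch is exposed in the cluster:
# List shards and filter for unassigned ones (host/port are examples)
curl -s "http://jarvis-elasticsearch:9200/_cat/shards?v" | grep UNASSIGNED
# Retry allocation of shards whose previous allocation attempts failed
curl -s -XPOST "http://jarvis-elasticsearch:9200/_cluster/reroute?retry_failed=true"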
2) Fixed LAG in Kafka by performing the steps below:
Restart the Jarvis services as follows:
Scale down:
- jarvis-verifier
- jarvis-lean-jarvis-indexer
- jarvis-indexer
Scale up:
- jarvis-verifier
- jarvis-lean-jarvis-indexer
- jarvis-indexer
Below is the list of kubectl commands:
a) Scale down the following deployments:
kubectl scale --replicas=0 deployment jarvis-verifier -n<namespace>
kubectl scale --replicas=0 deployment jarvis-lean-jarvis-indexer -n<namespace>
kubectl scale --replicas=0 deployment jarvis-indexer -n<namespace>
b) Verify that all pods are down:
kubectl get pods -n<namespace> | egrep "jarvis-verifier|jarvis-lean|jarvis-indexer"
c) Scale up deployments
kubectl scale --replicas=1 deployment jarvis-verifier -n<namespace>
kubectl scale --replicas=1 deployment jarvis-lean-jarvis-indexer -n<namespace>
kubectl scale --replicas=1 deployment jarvis-indexer -n<namespace>
d) Verify that all pods are up and running:
kubectl get pods -n<namespace> | egrep "jarvis-verifier|jarvis-lean|jarvis-indexer"
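Optionally, the rollout status of each deployment can also be checked; each command blocks until its deployment reports a successful rollout:
kubectl rollout status deployment/jarvis-verifier -n<namespace>
kubectl rollout status deployment/jarvis-lean-jarvis-indexer -n<namespace>
kubectl rollout status deployment/jarvis-indexer -n<namespace>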
e) Verify alarms and ServiceNow incidents are reported as expected
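One way to confirm that alarm data is flowing again is to watch the document counts of the Elasticsearch indices grow over time; the host and port below are examples:
# List indices with document counts (host/port are examples);
# re-run after a few minutes and confirm the counts of the alarm indices increase
curl -s "http://jarvis-elasticsearch:9200/_cat/indices?v&s=index"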