AIOps - How to restart Jarvis services?

Products

DX Operational Intelligence
CA App Experience Analytics
DX Application Performance Management

Issue/Introduction

If the Jarvis services are not performing as expected, you might be required to restart the core Jarvis services.

Below is a list of some of the symptoms that can help you identify this condition:

- Several Jarvis services are in red status
https://knowledge.broadcom.com/external/article/189119#mcetoc_1f87sn3bfb

- The Kafka kafka-consumer-groups.sh script reports continuous LAG while processing the messages:
https://knowledge.broadcom.com/external/article/189119#mcetoc_1f87sn3bfe

- ZooKeeper disconnects from Kafka
https://knowledge.broadcom.com/external/article/189119#mcetoc_1f87sn3bfd
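
The consumer LAG symptom above can be checked from the command line. The following is a minimal sketch, not an official procedure. It assumes that kafka-consumer-groups.sh is on the PATH inside the Kafka container (otherwise use its full path) and that you know the consumer group name and the bootstrap server address. The function name show_consumer_lag and its arguments are hypothetical.

```shell
# Describe one consumer group from inside a Kafka pod.
# The LAG column of the output shows how many messages are still unprocessed.
# Arguments (all assumptions): namespace, Kafka pod name, bootstrap server (host:port), group name.
show_consumer_lag() {
  namespace="$1"
  kafka_pod="$2"
  bootstrap="$3"
  group="$4"
  kubectl exec "$kafka_pod" -n "$namespace" -- \
    kafka-consumer-groups.sh --bootstrap-server "$bootstrap" --describe --group "$group"
}

# Usage: show_consumer_lag <namespace> <kafka-pod-name> <host:port> <group-name>
```

A LAG value that keeps growing between runs suggests that the consumers are not keeping up.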

Environment

DX Platform 20.x
DX Operational Intelligence 20.x
DX Application Performance Management 20.x
DX AXA 20.x

Resolution

Restart the Jarvis services in the following order:

Scale down:
- all jarvis-kafka
- all zookeeper
- jarvis-verifier
- jarvis-lean-jarvis-indexer
- jarvis-indexer
- jarvis-kron
- jarvis-esutils

Scale up:
- all zookeeper
- all jarvis-kafka
- jarvis-verifier
- jarvis-lean-jarvis-indexer
- jarvis-indexer
- jarvis-kron
- jarvis-esutils

Below is the list of kubectl commands in case you have 3 Elastic Nodes:

1) Scale down the following deployments:

kubectl scale --replicas=0 deployment jarvis-kafka -n<namespace>
kubectl scale --replicas=0 deployment jarvis-kafka-2 -n<namespace>
kubectl scale --replicas=0 deployment jarvis-kafka-3 -n<namespace>
kubectl scale --replicas=0 deployment jarvis-zookeeper -n<namespace>
kubectl scale --replicas=0 deployment jarvis-zookeeper-2 -n<namespace>
kubectl scale --replicas=0 deployment jarvis-zookeeper-3 -n<namespace>
kubectl scale --replicas=0 deployment jarvis-verifier -n<namespace>
kubectl scale --replicas=0 deployment jarvis-lean-jarvis-indexer -n<namespace>
kubectl scale --replicas=0 deployment jarvis-indexer -n<namespace>
kubectl scale --replicas=0 deployment jarvis-kron -n<namespace>
kubectl scale --replicas=0 deployment jarvis-esutils -n<namespace>

2) Verify that all pods are down:
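
One possible way to do this check is sketched below. It assumes that pod names begin with the name of their deployment, which is the Kubernetes default. The function name remaining_jarvis_pods is hypothetical.

```shell
# List pods that still exist for the deployments scaled down in step 1.
# Assumes pod names start with the deployment name (Kubernetes default naming).
remaining_jarvis_pods() {
  namespace="$1"
  kubectl get pods -n "$namespace" --no-headers 2>/dev/null \
    | grep -E '^jarvis-(kafka|zookeeper|verifier|lean-jarvis-indexer|indexer|kron|esutils)' \
    || true
}

# Usage: remaining_jarvis_pods <namespace>
# Empty output means all of the pods have terminated.
```

If any pods are still listed, wait a few seconds and run the check again before continuing.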

3) Start the pods in the following order:

a) Scale up the zookeeper pods, review the pod logs using "kubectl logs <pod-name> -n<namespace>", and verify that no errors are reported:

kubectl scale --replicas=1 deployment jarvis-zookeeper -n<namespace>
kubectl scale --replicas=1 deployment jarvis-zookeeper-2 -n<namespace>
kubectl scale --replicas=1 deployment jarvis-zookeeper-3 -n<namespace>
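
To confirm that each deployment is ready before moving on to the next step, the wait and the log review can be combined. This is a minimal sketch: the function name wait_and_check_logs is hypothetical, the 300 second timeout is an arbitrary choice, and searching for "error" or "exception" is only a rough filter that does not replace reading the logs.

```shell
# Wait for a deployment to become ready, then print log lines that mention errors.
wait_and_check_logs() {
  namespace="$1"
  deployment="$2"
  # Blocks until the rollout completes or the timeout (an assumed value) expires.
  kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=300s || return 1
  # Case-insensitive search; prints nothing if no matching lines are found.
  kubectl logs "deployment/$deployment" -n "$namespace" | grep -iE 'error|exception' || true
}

# Usage:
# for d in jarvis-zookeeper jarvis-zookeeper-2 jarvis-zookeeper-3; do
#   wait_and_check_logs <namespace> "$d"
# done
```

The same function can be reused for the deployments scaled up in the later steps.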

b) Scale up the kafka pods and review the pod logs:

kubectl scale --replicas=1 deployment jarvis-kafka -n<namespace>
kubectl scale --replicas=1 deployment jarvis-kafka-2 -n<namespace>
kubectl scale --replicas=1 deployment jarvis-kafka-3 -n<namespace>

c) Scale up the pods below and review the pod logs:

kubectl scale --replicas=1 deployment jarvis-verifier -n<namespace>
kubectl scale --replicas=1 deployment jarvis-lean-jarvis-indexer -n<namespace>
kubectl scale --replicas=1 deployment jarvis-indexer -n<namespace>
kubectl scale --replicas=1 deployment jarvis-kron -n<namespace>
kubectl scale --replicas=1 deployment jarvis-esutils -n<namespace>

4) Verify that all pods are up and running:
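
A minimal sketch of this check follows. It assumes the default "kubectl get pods" column layout (NAME, READY, STATUS, RESTARTS, AGE) and that pod names begin with the deployment name. The function name not_running_jarvis_pods is hypothetical.

```shell
# List the restarted Jarvis pods whose STATUS column is not "Running".
# Output format: <pod-name> <status>
not_running_jarvis_pods() {
  namespace="$1"
  kubectl get pods -n "$namespace" --no-headers \
    | grep -E '^jarvis-(kafka|zookeeper|verifier|lean-jarvis-indexer|indexer|kron|esutils)' \
    | awk '$3 != "Running" { print $1, $3 }'
}

# Usage: not_running_jarvis_pods <namespace>
# Empty output means every matching pod reports Running.
```

A pod can report Running before its containers are ready, so also check the READY column in the full "kubectl get pods" output.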

5) Verification

Check that all Jarvis services are green: http(s)://<APIS_URL>/#/All/get_health

https://knowledge.broadcom.com/external/article/189119#mcetoc_1f87sn3bfb

Additional Information

DX AIOPs - Troubleshooting, Common Issues and Best Practices