The following is a high-level list of techniques and suggestions to employ when troubleshooting common Jarvis performance and configuration issues.
DX AIOps 2.x
1) Check the Kafka logs and search for ERROR or WARN entries.
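A minimal sketch of how to scan the broker logs, assuming the Kafka brokers run as pods in the Jarvis namespace (the pod name placeholder is an assumption; adjust it to your environment):
kubectl get pods -n<namespace> | grep kafka
kubectl logs <jarvis-kafka-pod> -n<namespace> | egrep "ERROR|WARN"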
List all available topics:
/opt/ca/kafka/bin/kafka-topics.sh --zookeeper jarvis-zookeeper:2181 --list
/opt/ca/kafka/bin/kafka-topics.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --list
Describe all topics:
/opt/ca/kafka/bin/kafka-topics.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe
List all consumer groups:
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --list
Check for a possible LAG in Jarvis. Recommendation: verify that the LAG column is not consistently > 0.
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group jarvis_indexer
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group indexer
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group verifier
/opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group axa.transformer
Here is an example illustrating a LAG condition:
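A sketch of what the output of kafka-consumer-groups.sh --describe looks like when consumers fall behind (topic name, offsets, and consumer IDs are illustrative assumptions; your values will differ):
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG     CONSUMER-ID      HOST         CLIENT-ID
jarvis_indexer  <jarvis-topic>  0          1250300         1498750         248450  consumer-1-<id>  /10.0.0.15   consumer-1
jarvis_indexer  <jarvis-topic>  1          1249800         1497200         247400  consumer-1-<id>  /10.0.0.15   consumer-1
A LAG value that stays high and keeps growing indicates the consumers are not keeping up with the incoming messages.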
Here is an example illustrating a consumer disconnection condition:
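A sketch of the output when the group has no connected consumers (offsets are illustrative; the dashes in the CONSUMER-ID, HOST, and CLIENT-ID columns are the key symptom):
Consumer group 'jarvis_indexer' has no active members.
GROUP           TOPIC           PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG     CONSUMER-ID  HOST  CLIENT-ID
jarvis_indexer  <jarvis-topic>  0          1250300         1498750         248450  -            -     -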
Recommendation: Restart the Jarvis services as described below:
Scale down:
- jarvis-verifier
- jarvis-lean-jarvis-indexer
- jarvis-indexer
Scale up:
- jarvis-verifier
- jarvis-lean-jarvis-indexer
- jarvis-indexer
Below is the list of kubectl commands:
a) Scale down the following deployments:
kubectl scale --replicas=0 deployment jarvis-verifier -n<namespace>
kubectl scale --replicas=0 deployment jarvis-lean-jarvis-indexer -n<namespace>
kubectl scale --replicas=0 deployment jarvis-indexer -n<namespace>
b) Verify that all pods are down:
kubectl get pods -n<namespace> | egrep "jarvis-verifier|jarvis-lean|jarvis-indexer"
c) Scale up the deployments:
kubectl scale --replicas=1 deployment jarvis-verifier -n<namespace>
kubectl scale --replicas=1 deployment jarvis-lean-jarvis-indexer -n<namespace>
kubectl scale --replicas=1 deployment jarvis-indexer -n<namespace>
d) Verify that all pods are up and running:
kubectl get pods -n<namespace> | egrep "jarvis-verifier|jarvis-lean|jarvis-indexer"
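For reference, a healthy state shows each pod with READY 1/1 and STATUS Running; a sketch of the expected filtered output (pod name suffixes, restart counts, and ages are illustrative):
jarvis-indexer-7c9f6d8b5-xk2lp               1/1   Running   0   2m
jarvis-lean-jarvis-indexer-6d4b9c7f4-q8wzn   1/1   Running   0   2m
jarvis-verifier-5f8d7c6b9-m3jtr              1/1   Running   0   2m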
e) Verify that alarms and ServiceNow incidents are reported as expected.
If the problem persists after applying the above checks and recommendations, collect the logs below and contact Broadcom Support:
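A sketch of how the pod logs can be captured with kubectl (output file names are assumptions; adjust the deployment list to whatever Broadcom Support requests):
kubectl logs deployment/jarvis-verifier -n<namespace> > jarvis-verifier.log
kubectl logs deployment/jarvis-lean-jarvis-indexer -n<namespace> > jarvis-lean-jarvis-indexer.log
kubectl logs deployment/jarvis-indexer -n<namespace> > jarvis-indexer.log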