DX AXA - No data available OR data reporting slowly

Products

CA App Experience Analytics

Issue/Introduction

No data appears in the AXA UIs, or data appears but only after a long delay.

Environment

DX AXA 22.x, 23.x

Cause

Kafka consumer lag can delay AXA data processing. Use the steps below to confirm whether lag is affecting AXA data processing:

1) Find the Kafka pod names:

kubectl get pods -n<namespace> | grep kafka

2) Check each consumer group for lag (the LAG column in the output):

kubectl exec -ti -n<namespace> <jarvis-kafka-pod> -- /opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group axa.transformer
kubectl exec -ti -n<namespace> <jarvis-kafka-pod> -- /opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group axa-aggregator_network_perf
kubectl exec -ti -n<namespace> <jarvis-kafka-pod> -- /opt/ca/kafka/bin/kafka-consumer-groups.sh --bootstrap-server jarvis-kafka:9092,jarvis-kafka-2:9092,jarvis-kafka-3:9092 --describe --group axa-aggregator_device_perf
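
To compare consumer groups quickly, the per-partition LAG values can be added up. The following is a minimal sketch and not part of the product: the function name sum_lag is hypothetical, and it assumes the describe output has a header row containing a LAG column, which is located by name because its position can differ between Kafka versions.

```shell
# Hypothetical helper: total the LAG column of
# `kafka-consumer-groups.sh --describe` output read from stdin.
sum_lag() {
  awk '
    # Until the header row is seen, look for the column named LAG.
    col == 0 { for (i = 1; i <= NF; i++) if ($i == "LAG") col = i; next }
    # After the header, add up numeric LAG values; "-" entries are skipped.
    $col ~ /^[0-9]+$/ { total += $col }
    END { print total + 0 }
  '
}

# Usage (fill in the placeholders first):
#   kubectl exec -n<namespace> <jarvis-kafka-pod> -- \
#     /opt/ca/kafka/bin/kafka-consumer-groups.sh \
#     --bootstrap-server jarvis-kafka:9092 --describe --group axa.transformer | sum_lag
```

Running this a few minutes apart gives a rough trend: a total that keeps growing suggests the consumers are not keeping up with incoming data.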

Resolution

Recommendations

1) Increase the CPU and memory of the axaservices-transformer and jarvis-elasticsearch deployments:

- Open the deployments for editing:

kubectl edit deployments axaservices-transformer -n<namespace>

kubectl edit deployments jarvis-elasticsearch -n<namespace>

kubectl edit deployments jarvis-elasticsearch-2 -n<namespace>

kubectl edit deployments jarvis-elasticsearch-3 -n<namespace>

...

kubectl edit deployments jarvis-elasticsearch-n -n<namespace>

- Locate the resources section, double the default values, save, and wait for the new pods to start.
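
As an alternative to editing by hand, the doubling can be scripted. This is only a sketch under stated assumptions: double_quantity is a hypothetical helper, it handles whole-number quantities only, and the values 500m and 2Gi in the usage comment are illustrative, not the product defaults. Read the real values from your own deployment first.

```shell
# Hypothetical helper: double a Kubernetes resource quantity such as 500m, 2 or 4Gi.
# Only whole-number quantities are handled; decimals such as 1.5 are rejected.
double_quantity() {
  num=$(printf '%s' "$1" | sed 's/[^0-9].*$//')   # leading digits
  unit=${1#"$num"}                                  # suffix (m, Mi, Gi, ...), may be empty
  [ -n "$num" ] || return 1
  case $unit in .*) return 1 ;; esac
  printf '%s%s\n' "$((num * 2))" "$unit"
}

# Usage (fill in the placeholders; 500m and 2Gi are illustrative values):
#   kubectl get deployment axaservices-transformer -n <namespace> \
#     -o jsonpath='{.spec.template.spec.containers[*].resources}'
#   kubectl set resources deployment axaservices-transformer -n <namespace> \
#     --requests=cpu=$(double_quantity 500m),memory=$(double_quantity 2Gi)
```

kubectl set resources changes the pod template in the same way as a manual edit, so the deployment rolls out new pods afterwards.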

2) Increase the number of replicas for the following AXA services:

axaservices-transformer
axaservices-indexer
axaservices-kibana-indexer
axaservices-axa-ng-aggregator

For example, to increase the number of replicas to 5, run the following commands:

kubectl scale --replicas=5 deployment axaservices-transformer -n<namespace>
kubectl scale --replicas=5 deployment axaservices-indexer -n<namespace>
kubectl scale --replicas=5 deployment axaservices-kibana-indexer -n<namespace>
kubectl scale --replicas=5 deployment axaservices-axa-ng-aggregator -n<namespace>
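
The four commands above can also be generated from a single list. The sketch below prints the commands instead of running them, so they can be reviewed first; the names AXA_DEPLOYMENTS and scale_commands are hypothetical.

```shell
# Deployments listed in this article.
AXA_DEPLOYMENTS="axaservices-transformer axaservices-indexer axaservices-kibana-indexer axaservices-axa-ng-aggregator"

# Hypothetical helper: print one kubectl scale command per AXA deployment.
# $1 = replica count, $2 = namespace
scale_commands() {
  replicas=$1
  namespace=$2
  for d in $AXA_DEPLOYMENTS; do
    printf 'kubectl scale --replicas=%s deployment %s -n %s\n' "$replicas" "$d" "$namespace"
  done
}

# Usage: review the output, then pipe it to a shell to execute it.
#   scale_commands 5 <namespace>
#   scale_commands 5 <namespace> | sh
```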

NOTES:

a) The maximum number of replicas is 5 pods, because each replica has its own partition. If you need more replicas, first create the additional partitions manually as shown below. After the partitions are created, you can increase the number of replicas:

kafka-topics.sh --alter --zookeeper <ZookeeperHost>:<ZookeeperPort> --topic <TopicName> --partitions <PartitionsCount>

Sample:

kafka-topics.sh --alter --zookeeper jarvis-zookeeper:2181 --topic maaBAAggregator --partitions 8

Verification:

kafka-topics.sh --describe --zookeeper jarvis-zookeeper:2181 --topic maaBAAggregator

b) The above commands are not covered in the product documentation because they belong to Kafka, an open-source technology.
c) The number of partitions must match the number of replicas for the deployments.
d) Kafka partitions can be increased but not decreased.
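
Note c) can be checked before scaling. The sketch below is an illustration only: partition_count and replicas_match_partitions are hypothetical names, and the first function assumes the describe output contains a "PartitionCount:" field, with or without a space after the colon.

```shell
# Hypothetical helper: extract the partition count from
# `kafka-topics.sh --describe` output read from stdin.
partition_count() {
  sed -n 's/.*PartitionCount:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only when replicas ($1) equal partitions ($2).
replicas_match_partitions() {
  [ "$1" -eq "$2" ]
}

# Usage (fill in the placeholders):
#   parts=$(kubectl exec -n<namespace> <jarvis-kafka-pod> -- /opt/ca/kafka/bin/kafka-topics.sh \
#     --describe --zookeeper jarvis-zookeeper:2181 --topic maaBAAggregator | partition_count)
#   reps=$(kubectl get deployment <deployment> -n <namespace> -o jsonpath='{.spec.replicas}')
#   replicas_match_partitions "$reps" "$parts" || echo "replicas=$reps partitions=$parts"
```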

3) Reduce Kafka data retention from 24 hours to 1 hour

If, in addition to the lag in maaBAAggregator, you notice that Kafka data is consuming a large amount of disk space, you can try reducing the data retention.

Verification:

cd /nfs/ca/dxi/jarvis

du -sc ./*

215188          ./apis
3279632784      ./elasticsearch
1315136         ./esutils
250332          ./indexer
0               ./jafservices
2559380580      ./kafka
3395684         ./kafka-logs
635496          ./kron
3328            ./verifier
103040          ./zookeeper
12724           ./zookeeper-logs
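
Raw block counts are hard to compare at a glance. As a rough aid, the sketch below sorts du output by size and converts it to GiB. The name du_top is hypothetical, and the conversion assumes du reports 1 KiB blocks (the GNU default) and that paths contain no spaces.

```shell
# Hypothetical helper: read `du` output (size, then path) from stdin and
# print it largest first, with sizes converted from KiB to GiB.
du_top() {
  sort -rn | awk '{ printf "%.1f GiB\t%s\n", $1 / 1048576, $2 }'
}

# Usage: with du -sc, the "total" line appears first because it is the largest.
#   cd /nfs/ca/dxi/jarvis && du -sc ./* | du_top
```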

For example, you can reduce Kafka data retention to 1 hour as follows:

/opt/ca/kafka/bin/kafka-topics.sh --zookeeper jarvis-zookeeper:2181 --alter --topic maaBAAggregator --config retention.ms=3600000
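
The retention.ms value is expressed in milliseconds, so 1 hour is 60 x 60 x 1000 = 3600000. If a different retention period is needed, a small helper avoids arithmetic mistakes; hours_to_ms is a hypothetical name used only for this sketch.

```shell
# Hypothetical helper: convert whole hours to milliseconds for retention.ms.
hours_to_ms() {
  echo "$(( $1 * 60 * 60 * 1000 ))"
}

# Usage: same command as above, with the value computed.
#   /opt/ca/kafka/bin/kafka-topics.sh --zookeeper jarvis-zookeeper:2181 --alter \
#     --topic maaBAAggregator --config retention.ms=$(hours_to_ms 1)
```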

4) Increase the number of ingress-nginx replicas:

You can scale the Nginx deployment replicas using kubectl:

kubectl scale deployment <nginx-deployment-name> -n <nginx-namespace> --replicas=5

Example:

kubectl scale deployment nginx-ingress-controller -n ingress-nginx --replicas=5

Additional Information

https://knowledge.broadcom.com/external/article/190815/aiops-troubleshooting-common-issues-and.html