1. Checklist
IMPORTANT: Kafka brokers must always be connected to ZooKeeper.
1) Check whether the Kafka brokers are connected to ZooKeeper
Connect to a ZooKeeper pod: kubectl exec -ti jarvis-zookeeper-0 -n<dxi-namespace> -- bash
For example:
kubectl exec -ti jarvis-zookeeper-0 -ndxi -- bash
cd /opt/ca/zookeeper/bin
./zkCli.sh
ls /brokers/ids
Expected result: the command lists the IDs of the Kafka brokers registered in ZooKeeper. The number of IDs is the number of connected brokers.
If you have a medium Elastic deployment, the result should be [0, 1, 2], that is, three brokers.
If you have a single Elastic node deployment, only one broker ID should be listed.
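To run the same check without an interactive session, a minimal sketch like the following counts the broker IDs in the `ls /brokers/ids` output and compares them with the number you expect. The function names `broker_count` and `check_brokers` are hypothetical, and the example invocation assumes that `zkCli.sh` can run a single command passed as arguments and that ZooKeeper is reachable at localhost:2181 inside the pod.

```shell
# Hypothetical helpers (not part of DX OI): count the broker IDs printed by
# zkCli.sh "ls /brokers/ids", e.g. "[0, 1, 2]", and compare with an expected count.

# Reads zkCli.sh output on stdin and prints the number of IDs found in the
# last line that looks like an ID list. "[]" or no list at all gives 0.
broker_count() {
  ids=$(grep -E '^\[[0-9, ]*\]$' | tail -n 1 | sed 's/[][ ]//g')
  if [ -z "$ids" ]; then
    echo 0
  else
    printf '%s\n' "$ids" | tr ',' '\n' | grep -c .
  fi
}

# Usage: <zkCli.sh output> | check_brokers <expected-count>
# Returns 0 when the count matches, 1 otherwise.
check_brokers() {
  expected="$1"
  actual=$(broker_count)
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $actual of $expected brokers registered in ZooKeeper"
  else
    echo "PROBLEM: $actual of $expected brokers registered in ZooKeeper" >&2
    return 1
  fi
}

# Example (medium deployment, namespace dxi, ZooKeeper assumed on localhost:2181):
#   kubectl exec jarvis-zookeeper-0 -ndxi -- \
#     /opt/ca/zookeeper/bin/zkCli.sh -server localhost:2181 ls /brokers/ids | check_brokers 3
```

zkCli.sh may print connection messages before the result, which is why only the last line shaped like `[..]` is used.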
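Recommendations (if fewer brokers are listed than expected):
a) Check that all Kafka pods are up and running. The number of Kafka pods should match the number of Elastic nodes; for example, with 3 Elastic nodes you should have 3 Kafka pods.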
kubectl get pods -n<namespace> | grep kafka
b) Restart any Kafka pod that is not running, or whose broker ID is missing from the list, by deleting it so that it is recreated:
kubectl delete po <kafka pod> -n<namespace>
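Checks a) and b) can be scripted along these lines. This sketch parses the default `kubectl get pods` columns (NAME, READY, STATUS, ...) and treats a pod as healthy only when its STATUS is Running and all its containers are ready. The function names are hypothetical; the functions only report, and the commented example shows how the restart in step b) could be applied to the listed pods.

```shell
# Hypothetical helpers: read "kubectl get pods" output (with or without the
# header line) on stdin and look only at pods whose name contains "kafka".

# Print the number of Kafka pods that are Running with all containers ready.
ready_kafka_pod_count() {
  awk '$1 != "NAME" && $1 ~ /kafka/ {
    split($2, ready, "/")   # READY column, e.g. "1/1"
    if ($3 == "Running" && ready[1] == ready[2]) n++
  } END { print n + 0 }'
}

# Print the names of Kafka pods that are not Running or not fully ready;
# these are the candidates for "kubectl delete po" in step b).
unhealthy_kafka_pods() {
  awk '$1 != "NAME" && $1 ~ /kafka/ {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Example (namespace dxi, 3 Elastic nodes, so 3 Kafka pods expected):
#   kubectl get pods -ndxi | ready_kafka_pod_count
#   for p in $(kubectl get pods -ndxi | unhealthy_kafka_pods); do
#     kubectl delete po "$p" -ndxi
#   done
```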
2) Check the ZooKeeper logs for ERROR or WARN messages:
kubectl logs <zookeeper-pod> -n<namespace>
For example:
kubectl logs jarvis-zookeeper-0 -ndxi
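To keep only the relevant entries, the log output can be filtered for these two levels. This is a minimal sketch assuming the level appears as a separate word followed by a space, as in the WARN example in this article; `zk_errors_warnings` is a hypothetical name.

```shell
# Hypothetical filter: keep only ERROR and WARN entries from ZooKeeper log
# lines read on stdin. The level must be a separate word followed by a space,
# either at the start of the line or after a space (e.g. "... - WARN  [...").
zk_errors_warnings() {
  grep -E '(^| )(ERROR|WARN) '
}

# Examples (namespace dxi):
#   kubectl logs jarvis-zookeeper-0 -ndxi | zk_errors_warnings
#   kubectl logs jarvis-zookeeper-0 -ndxi --since=24h | zk_errors_warnings | wc -l
```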
The following example shows the warning logged when a ZooKeeper disk write (an fsync of the write-ahead log) takes longer than 1 second:
WARN [SyncThread:3:FileTxnLog@338] - fsync-ing the write ahead log in SyncThread:3 took 16313ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
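ZooKeeper logs this warning when a write-ahead-log fsync exceeds its warning threshold (the `zookeeper.fsync.warningthresholdms` system property, 1000 ms by default). The sketch below extracts the durations from such lines so you can see how often and how badly disk writes stall. It assumes the message text matches the example above, and `zk_slow_fsyncs` is a hypothetical name.

```shell
# Hypothetical helper: read ZooKeeper log lines on stdin and print, one per
# line, the fsync durations (in ms) that exceed a threshold (default 1000 ms).
# grep keeps the fsync warnings, sed keeps only the number before "ms",
# and awk applies the threshold.
zk_slow_fsyncs() {
  threshold="${1:-1000}"
  grep -F 'fsync-ing the write ahead log' |
    sed -n 's/.* took \([0-9][0-9]*\)ms.*/\1/p' |
    awk -v t="$threshold" '$1 + 0 > t + 0'
}

# Examples (namespace dxi):
#   kubectl logs jarvis-zookeeper-0 -ndxi | zk_slow_fsyncs | wc -l             # how many
#   kubectl logs jarvis-zookeeper-0 -ndxi | zk_slow_fsyncs | sort -n | tail -1  # worst case
```

Frequent or multi-second values point to slow storage behind the ZooKeeper data directory rather than to Kafka itself.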
2. What to collect if the problem persists
If the problem persists after applying the checks and recommendations above, collect the following logs and contact Broadcom Support:
<NFS>/jarvis/api/<jarvis-apis-pod>/*.log
<NFS>/jarvis/indexer/<jarvis-lean-indexer-pod>/*.log
<NFS>/jarvis/kafka-logs/kafka-<#>/*.log
<NFS>/jarvis/esutils/*.log
Output of: kubectl logs jarvis-zookeeper-0 -n<namespace>
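Gathering these files by hand is error-prone, so a sketch like the following copies them into one archive. It assumes `<NFS>` is mounted at the path passed as the first argument and that kubectl, if installed on the host, is configured for the cluster; the function name `collect_dxi_logs` and the archive naming are hypothetical. It copies only the `*.log` files directly inside the listed directories, matching the patterns above, so it does not pick up other `.log` files from deeper subdirectories (for example Kafka data segment files, which also use the `.log` extension, if they are stored there).

```shell
# Hypothetical helper: copy the log files listed above into a timestamped
# directory and pack it as <dir>.tar.gz. Prints the archive path.
# Usage: collect_dxi_logs <nfs-root> [namespace] [output-dir]
collect_dxi_logs() {
  nfs_root="$1"
  ns="${2:-dxi}"
  out_dir="${3:-./dxi-support-$(date +%Y%m%d%H%M%S)}"
  [ -d "$nfs_root" ] || { echo "NFS root not found: $nfs_root" >&2; return 1; }
  mkdir -p "$out_dir" && out_dir=$(cd "$out_dir" && pwd)   # make the path absolute

  # Same patterns as the list above; relative paths are kept so that logs
  # from different pods do not overwrite each other.
  (
    cd "$nfs_root" || exit 1
    for f in jarvis/api/*/*.log jarvis/indexer/*/*.log \
             jarvis/kafka-logs/kafka-*/*.log jarvis/esutils/*.log; do
      [ -f "$f" ] || continue   # an unmatched pattern stays literal; skip it
      mkdir -p "$out_dir/$(dirname "$f")"
      cp "$f" "$out_dir/$f"
    done
  )

  # ZooKeeper pod log, only if kubectl is available on this host.
  if command -v kubectl >/dev/null 2>&1; then
    kubectl logs jarvis-zookeeper-0 -n"$ns" > "$out_dir/jarvis-zookeeper-0.log" 2>&1 || true
  fi

  tar -czf "$out_dir.tar.gz" -C "$(dirname "$out_dir")" "$(basename "$out_dir")"
  echo "$out_dir.tar.gz"
}

# Example (assuming <NFS> is mounted at /nfs and the namespace is dxi):
#   collect_dxi_logs /nfs dxi
```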