Kafka service is failing on platform node 1.
When checking the service health on platform node 1 using the command:
./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d
All services report "running" or "running and healthy" except for the Kafka service, which displays the status:
Problem: Kafka javaservice is not running.
instead of:
Kafka is running
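When the health report is long, it can help to reduce it to the lines that flag a problem. The following is a minimal sketch, not part of the product tooling: filter_problems is a hypothetical helper name, and it assumes only that failing services are reported with the "Problem:" prefix shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the appliance): keep only the
# lines of a health report that contain the "Problem:" marker.
# The report is read from standard input.
filter_problems() {
  # grep exits non-zero when nothing matches; "|| true" keeps a clean
  # report from being treated as a failure of the helper itself.
  grep -F 'Problem:' || true
}
```

For example, the output of the health check command above could be piped into it: ./run_all.sh sudo /home/ubuntu/check-service-health.sh -p -d | filter_problems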
Note: VCF Operations for Networks was formerly named Aria Operations for Networks (AON), and prior to that was named vRealize Network Insight (vRNI).
Aria Operations for Networks 6.13
Aria Operations for Networks 6.14
Aria Operations for Networks 6.14.1
The replication offset checkpoint file used by the Kafka service is malformed.
Upon examination of the platform node 1 /var/log/arkin/kafka/kafka.log, we see the following:
java.io.IOException: Malformed line in checkpoint file (/var/lib/kafka/kafka-logs/replication-offset-checkpoint): '52 197063175'
indicating that the replication offset checkpoint is malformed.
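To confirm the cause without reading the whole log, the error can be searched for directly. This is a minimal sketch: malformed_checkpoint_entries is a hypothetical helper name, and the search string is taken from the log line quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every log line that reports a malformed
# checkpoint file. The log path is passed as the first argument.
malformed_checkpoint_entries() {
  local log_file="$1"
  # -F treats the pattern as a fixed string, so the parentheses and
  # quotes in the surrounding log text need no escaping.
  grep -F 'Malformed line in checkpoint file' "$log_file"
}
```

On platform node 1 this could be called as: malformed_checkpoint_entries /var/log/arkin/kafka/kafka.log. If nothing is printed, the log contains no such error and the cause described here may not apply.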
Deleting the replication offset checkpoint file will allow the kafka service to recreate the file upon restart, allowing the service to resume functioning correctly.
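Before deleting the file, its contents can be checked for entries like the one in the error. The sketch below rests on an assumption that this article does not state: that the checkpoint file consists of a numeric version line, a numeric entry-count line, and then one "topic partition offset" entry per line. check_checkpoint_format is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper. ASSUMED file layout (not confirmed by this article):
#   line 1: numeric version
#   line 2: numeric entry count
#   rest:   "<topic> <partition> <offset>", three whitespace-separated fields
# Prints each offending line and returns non-zero if any was found.
check_checkpoint_format() {
  local file="$1"
  awk '
    NR == 1 {
      if ($0 !~ /^[0-9]+$/) { print "line 1: bad version: " $0; bad = 1 }
      next
    }
    NR == 2 {
      if ($0 !~ /^[0-9]+$/) { print "line 2: bad entry count: " $0; bad = 1 }
      next
    }
    NF != 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
      print "line " NR ": malformed entry: " $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$file"
}
```

Under that assumed layout, an entry such as '52 197063175' has only two fields and would be reported as malformed. The check is read-only and does not modify the file.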
Log in to platform node 1 as the support user, then switch to the ubuntu user:
ub
./run_all.sh sudo /home/ubuntu/check-service-health.sh --uptime
You should see the following line in the results:
Problem: Kafka javaservice is not running.
cd /var/lib/kafka/kafka-logs
ls -lrth
You should see a line similar to the following on the list for "replication-offset-checkpoint":
-rw-r--r-- 1 ubuntu ubuntu 17 Mar 15 04:39 replication-offset-checkpoint
cp replication-offset-checkpoint replication-offset-checkpoint.bak
ls -lrth
You should now see the original file named "replication-offset-checkpoint" and a new file named "replication-offset-checkpoint.bak":
-rw-r--r-- 1 ubuntu ubuntu 17 Mar 15 04:39 replication-offset-checkpoint
-rw-r--r-- 1 ubuntu ubuntu 17 Mar 15 04:39 replication-offset-checkpoint.bak
sudo rm -rf replication-offset-checkpoint
ls -lrth
You should now only see the "replication-offset-checkpoint.bak" file listed:
-rw-r--r-- 1 ubuntu ubuntu 17 Mar 15 04:39 replication-offset-checkpoint.bak
./run_all.sh sudo systemctl stop kafka.service
You may only see a list of your platform nodes, or you may see more details showing the kafka service has been stopped.
./run_all.sh sudo systemctl start kafka.service
You should see a list of your platform nodes.
ls -lrth
You should again see two files on the list, similar to:
-rw-r--r-- 1 ubuntu ubuntu 17 Mar 15 04:39 replication-offset-checkpoint
-rw-r--r-- 1 ubuntu ubuntu 17 Mar 15 04:39 replication-offset-checkpoint.bak
Running the service health check with --uptime again should now report:
Kafka is running
Uptime: 00:01:05
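The backup and delete steps of this procedure could be combined into one guarded helper so that the file is removed only after a verified copy exists. This is a minimal sketch, not a replacement for the documented steps: backup_and_remove_checkpoint is a hypothetical name, the default directory is the one used above, and the function does not call sudo or restart the Kafka service, so those steps still have to be run as shown in the procedure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up replication-offset-checkpoint, verify the
# copy, then remove the original. Does NOT stop or start kafka.service.
backup_and_remove_checkpoint() {
  # Directory defaults to the path used in this article.
  local dir="${1:-/var/lib/kafka/kafka-logs}"
  local file="$dir/replication-offset-checkpoint"

  if [ ! -f "$file" ]; then
    echo "not found: $file" >&2
    return 1
  fi

  # -p preserves mode and timestamps on the backup copy.
  cp -p "$file" "$file.bak" || return 1

  # Refuse to delete the original unless the backup is byte-identical.
  if ! cmp -s "$file" "$file.bak"; then
    echo "backup does not match original, nothing deleted" >&2
    return 1
  fi

  rm -f "$file"
}
```

Called without arguments it operates on /var/lib/kafka/kafka-logs; if the original file is not writable by the current user, the removal fails and the procedure's sudo rm step applies instead.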
If you see the error "Grid processing stopped since kafka cluster is not available" in the GUI after you have completed the repair of the kafka service, please open a support case with Broadcom Support and refer to this KB article. For more information, see Creating and managing Broadcom support cases.