Jarvis Schema Registry and Kafka processes are not running because the Zookeeper process is down

Article ID: 194275

Products

CA App Experience Analytics

Issue/Introduction

The Jarvis Schema Registry and Kafka are not running. Restarting Jarvis with ./startServices -j starts the Jarvis Schema Registry, but it dies after only a few seconds.
Restarting Kafka with ./startServices -k has the same result: Kafka comes up for only a few seconds.

./healthCheck.sh reports the following processes as X--DOWN--X:

Jarvis Schema Registry
Kafka Server

The Zookeeper Server is reported as --RUNNING-> but shows no PID, which indicates the process is not actually alive.

All other processes are running.
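Because the health check can report ZooKeeper as --RUNNING-> without a PID, it may help to test directly whether anything accepts connections on ZooKeeper's client port. The sketch below assumes the default client port 2181 (the port that appears in the schema registry error in this article); the function name zookeeper_port_open is hypothetical and is not part of the AXA scripts.

```python
import socket


def zookeeper_port_open(host: str, port: int = 2181, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A successful connect only shows that something is listening on the port;
    it does not prove that ZooKeeper itself is healthy.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved/reached.
        return False
```

For example, zookeeper_port_open("vPIC481S0135") returning False would be consistent with the ZkTimeoutException errors that the Jarvis Schema Registry and Kafka log when ZooKeeper is down.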

The schema-registry.log shows a timeout while trying to connect to the ZooKeeper server:

[2020-06-25 14:30:57,297] ERROR Server died unexpectedly:  (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain:51)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server 'vPIC481S0135:2181' with timeout of 30000 ms

The Kafka log shows a similar error:

[2020-06-25 14:30:59,447] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000

The zookeeper.log shows that the ZooKeeper process itself has failed:

[2020-06-25 14:30:42,530] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
java.io.EOFException
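When checking other nodes or later restarts for the same failure pattern, a small scan for the messages quoted above may save some time. This is a minimal sketch: the patterns are copied from the excerpts in this article (other versions may word them differently), find_signatures is a hypothetical helper, and the log file locations are not given here, so pass in the lines of whichever schema-registry.log, Kafka log, or zookeeper.log you want to scan.

```python
import re
from typing import Dict, Iterable, List

# Patterns taken from the log excerpts in this article.
SIGNATURES = {
    "zk_timeout": re.compile(r"ZkTimeoutException: Unable to connect to zookeeper server"),
    "zk_exit": re.compile(r"Unexpected exception, exiting abnormally"),
    "eof": re.compile(r"java\.io\.EOFException"),
}


def find_signatures(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each signature name to the 1-based line numbers where it matches."""
    hits: Dict[str, List[int]] = {name: [] for name in SIGNATURES}
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits
```

An open file object can be passed directly, since iterating over it yields lines.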

Cause

This is most likely a known Apache ZooKeeper bug:
https://stackoverflow.com/questions/44217654/how-to-recover-zookeeper-from-java-io-eofexception-after-a-server-crash

The root cause is that a transaction log file from a prior run of ZooKeeper was written with an incomplete header, so ZooKeeper hits a java.io.EOFException when it reads that file at startup.
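The sketch below tests whether a transaction log file begins with a complete header. It rests on an assumption drawn from our reading of the ZooKeeper 3.4 source (FileTxnLog), not from this article: the header is 16 bytes, big-endian, holding the 4-byte magic "ZKLG", a 4-byte version, and an 8-byte database ID. A 0-byte file, which is what the resolution below looks for, fails the check. The name txnlog_header_ok is hypothetical.

```python
import struct
from pathlib import Path

# Assumed header layout (ZooKeeper 3.4 FileTxnLog): magic, int version, long dbid.
TXNLOG_MAGIC = b"ZKLG"
HEADER = struct.Struct(">4siq")  # 16 bytes, big-endian, no padding


def txnlog_header_ok(path: Path) -> bool:
    """Return True if the file starts with a complete header carrying the ZKLG magic."""
    with open(path, "rb") as f:
        raw = f.read(HEADER.size)
    if len(raw) < HEADER.size:
        # 0-byte or truncated file: the header was never fully written.
        return False
    magic, _version, _dbid = HEADER.unpack(raw)
    return magic == TXNLOG_MAGIC
```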

Environment

Application Experience Analytics (AXA) 17.3.2

Resolution

The workaround is to delete the offending log file:

1. Check the log.xxx files in the ZooKeeper data folder, /opt/ca/aoPlatform/jarvis/kafka_2.11-0.10.1.0/kafka_data/zookeeper/version-2, for any file with a size of 0 bytes.
2. If there is one, delete it.
3. Restart the AXA processes.
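The check above can be scripted as a minimal sketch like the following. It assumes the data folder path given in this article (adjust ZK_DATA_DIR if your installation differs) and only considers files named log.*; the helper names are hypothetical. Deletion is off by default (dry_run=True) so the matching files can be reviewed first. Restart the AXA processes afterward as described above.

```python
from pathlib import Path
from typing import List

# Path taken from this article; adjust if your installation lives elsewhere.
ZK_DATA_DIR = Path(
    "/opt/ca/aoPlatform/jarvis/kafka_2.11-0.10.1.0/kafka_data/zookeeper/version-2"
)


def find_empty_txn_logs(data_dir: Path = ZK_DATA_DIR) -> List[Path]:
    """Return the log.* files in data_dir whose size is 0 bytes, sorted by name."""
    return sorted(
        p for p in data_dir.glob("log.*") if p.is_file() and p.stat().st_size == 0
    )


def remove_empty_txn_logs(data_dir: Path = ZK_DATA_DIR, dry_run: bool = True) -> List[Path]:
    """Delete the 0-byte log.* files, or only report them when dry_run is True."""
    empty = find_empty_txn_logs(data_dir)
    for p in empty:
        if dry_run:
            print(f"would delete {p}")
        else:
            p.unlink()
            print(f"deleted {p}")
    return empty
```

Running remove_empty_txn_logs() first lists the candidates; remove_empty_txn_logs(dry_run=False) then deletes them. Snapshot files and non-empty logs are left untouched.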