Jarvis Schema Registry and Kafka processes are not running because the Zookeeper process is down

Article ID: 194275

Updated On:

Products

CA App Experience Analytics

Issue/Introduction

The Jarvis Schema Registry and Kafka are not running. Restarting Jarvis with ./startServices -j starts the Jarvis Schema Registry, but it dies after only a few seconds.
Restarting Kafka with ./startServices -k brings Kafka up, but it also stops after only a few seconds.

./healthCheck.sh shows that the following processes are X--DOWN--X:

Jarvis Schema Registry
Kafka Server

The Zookeeper Server shows --RUNNING-> but has no pid.

All other processes are running.

The schema-registry.log shows a timeout while trying to connect to the ZooKeeper process:

[2020-06-25 14:30:57,297] ERROR Server died unexpectedly:  (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain:51)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server 'vPIC481S0135:2181' with timeout of 30000 ms

The Kafka log shows a similar error:

[2020-06-25 14:30:59,447] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000

The zookeeper.log shows that the process has failed:

[2020-06-25 14:30:42,530] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)
java.io.EOFException

Environment

Application Experience Analytics (AXA) 17.3.2

Cause

This is most likely a known Apache ZooKeeper bug:
https://stackoverflow.com/questions/44217654/how-to-recover-zookeeper-from-java-io-eofexception-after-a-server-crash

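The root cause of the problem is that a log file from a prior run of ZooKeeper was written with an incomplete header. When ZooKeeper starts, it tries to read that file, reaches the end of the file before the header is complete, and exits with the java.io.EOFException seen in zookeeper.log.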

Resolution

The workaround is to delete the offending log file.
Check the log.xxx files under the ZooKeeper data folder, /opt/ca/aoPlatform/jarvis/kafka_2.11-0.10.1.0/kafka_data/zookeeper/version-2, for any file with a size of 0 bytes. If there is one, delete it and restart the AXA processes.
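
The check described above can be sketched as a small shell snippet. This is only an illustration, not an official AXA tool: the helper name find_empty_zk_logs and the ZK_DATA_DIR variable are made up for this example, and the default path is the one quoted in this article, so adjust it if your installation differs. The snippet only lists candidate files; it does not delete anything.

```shell
#!/bin/sh
# Data folder quoted in this article; override ZK_DATA_DIR if your path differs.
ZK_DATA_DIR="${ZK_DATA_DIR:-/opt/ca/aoPlatform/jarvis/kafka_2.11-0.10.1.0/kafka_data/zookeeper/version-2}"

# Hypothetical helper: print every log.* file directly inside the given
# folder whose size is 0 bytes. Snapshot files and non-empty logs are ignored.
find_empty_zk_logs() {
    find "$1" -maxdepth 1 -type f -name 'log.*' -size 0
}

# Only list candidates when the folder exists; nothing is removed here.
if [ -d "$ZK_DATA_DIR" ]; then
    find_empty_zk_logs "$ZK_DATA_DIR"
fi
```

If the snippet prints a file, review it, remove it manually (for example with rm), and then restart the AXA processes. If it prints nothing, no 0-byte log file is present in that folder and the failure likely has a different cause.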