Jarvis Schema Registry and Kafka processes are not running because the Zookeeper process is down


Article ID: 194275


Updated On:


CA App Experience Analytics


The Jarvis Schema Registry and Kafka are not running. Attempting to restart Jarvis using ./startServices -j starts the Jarvis Schema Registry, but it dies after only a few seconds.
Starting Kafka using ./startServices -k restarts Kafka, but it also dies after only a few seconds.

./healthCheck.sh shows that the following processes are X--DOWN--X:

Jarvis Schema Registry
Kafka Server

The Zookeeper Server shows --RUNNING-> but has no PID.

All other processes are running.

The schema-registry.log shows a timeout trying to connect to the ZooKeeper process:

[2020-06-25 14:30:57,297] ERROR Server died unexpectedly:  (io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain:51)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server 'vPIC481S0135:2181' with timeout of 30000 ms

The kafka.log shows a similar error:

[2020-06-25 14:30:59,447] FATAL Fatal error during KafkaServerStartable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to zookeeper server within timeout: 6000

The zookeeper.log shows that the process has failed:

[2020-06-25 14:30:42,530] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.ZooKeeperServerMain)


This is most likely a known Apache ZooKeeper bug:

The root cause of the problem is that a log file from a prior run of ZooKeeper was written with an incomplete header.
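A healthy ZooKeeper transaction log begins with the four-byte magic "ZKLG"; a file written with an incomplete header is missing it (in practice the file is often simply 0 bytes). As a quick hedged check, assuming shell access on the ZooKeeper host and using LOG_FILE as a placeholder for the actual log.xxx file path:

```shell
# Inspect the first four bytes of a ZooKeeper transaction log.
# LOG_FILE is a placeholder -- substitute the path to the actual log.xxx file.
LOG_FILE="${LOG_FILE:-/tmp/log.example}"

if [ -s "$LOG_FILE" ]; then
  # A healthy transaction log starts with the magic string "ZKLG".
  head -c 4 "$LOG_FILE"
else
  echo "file is empty or missing - header was never written"
fi
```

A zero-length or truncated file here is consistent with the "incomplete header" failure described above.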


Application Experience Analytics (AXA) 17.3.2


The workaround is to delete the offending log file:
Check the log.xxx files under the ZooKeeper data folder, /opt/ca/aoPlatform/jarvis/kafka_2.11-, for any file with a size of 0 bytes. If one exists, delete it and restart the AXA processes.
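The check-and-delete step above can be sketched as follows. This is a hedged example, not the official AXA tooling: ZK_DATA_DIR is an assumption and must be pointed at the actual ZooKeeper data folder on your install, and the find pattern assumes the transaction logs are named log.xxx as described above.

```shell
# ZK_DATA_DIR is an assumption -- set it to the ZooKeeper data folder
# for your AXA installation before running.
ZK_DATA_DIR="${ZK_DATA_DIR:-/opt/ca/aoPlatform/jarvis/zookeeper-data}"

if [ -d "$ZK_DATA_DIR" ]; then
  # List any zero-byte log.xxx files (the candidates with incomplete headers)
  find "$ZK_DATA_DIR" -name 'log.*' -size 0 -print

  # Once confirmed, delete them, then restart the AXA processes
  find "$ZK_DATA_DIR" -name 'log.*' -size 0 -delete
fi
```

After the zero-byte file is removed, restart the services (for example with ./startServices -j and ./startServices -k) and re-run ./healthCheck.sh to confirm the Jarvis Schema Registry, Kafka Server, and Zookeeper Server stay up.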