The QuorumPeerMain javaservice fails to start on Platform Node 3 in a multi-node cluster.
After you run the commands ./run_all.sh df -h and sudo /home/ubuntu/check-service-health.sh -p -d, the output shows that all other services are running and healthy, as seen in the example below:
--platform3--
ElasticSearch is running and healthy.
ElasticSearch statistics:
Uptime:5-21:26:25
HRegionServer is running and healthy.
Uptime:19:15
Kafka is running and healthy.
Kafka statistics:
Uptime:5-21:25:58
NodeManager is running and healthy.
Uptime:5-21:26:19
SaasListener is running
Uptime:22:30:40
Restapilayer is running and healthy.
Restapilayer statistics:
Uptime:22:30:50
TSDB is running
TSDB statistics:
Uptime:22:30:51
DataNode is running and healthy.
Uptime:5-21:26:22
Launcher is running
Uptime:22:30:34
VIPService is running and healthy.
VIPService statistics:
Uptime:5-21:24:09
DatabusGateway is running and healthy.
DatabusGateway statistics:
Uptime:22:28:31
FlinkContainer is running and healthy.
FlinkContainer statistics:
Uptime:04:47
04:46
HMaster is running and healthy.
Is Master:False
Uptime:18:54
Problem: QuorumPeerMain javaservice is not running.
JournalNode is running and healthy.
Uptime:5-21:26:49
Nginx is running and healthy.
Nginx statistics:
Uptime:5-21:27:11
ExpressJSApp is running
Uptime:5-21:27:11
NTPSEC is running and healthy.
Uptime:5-21:27:10
FoundationDB is running and healthy.
FoundationDB statistics:
Uptime:22:28:58
22:28:58
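On a long report, the failing service is easy to miss. The following is a minimal sketch, not part of the product tooling, that scans saved health-check output for "Problem:" entries. It assumes the failure wording matches the example above ("Problem: <service> is not running."). The function name find_problem_services and the shortened sample text are hypothetical.

```python
import re


def find_problem_services(report):
    """Return the service names reported as 'Problem: <name> is not running'."""
    # Non-greedy match so each 'Problem:' entry yields only its own service name.
    return re.findall(r"Problem:\s*(.+?)\s+is not running", report)


# Shortened sample based on the example output in this article.
sample = """--platform3--
Kafka is running and healthy.
Problem: QuorumPeerMain javaservice is not running.
JournalNode is running and healthy.
"""

print(find_problem_services(sample))
```

The pattern does not depend on line breaks, so it also works if the output was captured as a single run of text.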
Reviewing zookeeper-platform3.log reveals the following error: Unable to load database on disk. The log also indicates an epoch mismatch (the current epoch is older than the epoch of the last zxid), as seen in the example below:
2026-04-30 11:44:00,253 [myid:3] - ERROR [main:QuorumPeer@940] - Unable to load database on disk
java.io.IOException: The current epoch, 3dd, is older than the last zxid, 4252017623322
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:922)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:890)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:205)
zookeeper-zookeeper-server-platform3.log
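To see why ZooKeeper rejects the on-disk state, it can help to decode the two values in the error message. A ZooKeeper zxid is a 64-bit number whose upper 32 bits hold the epoch and whose lower 32 bits hold a counter. The sketch below assumes the message prints the current epoch in hexadecimal (3dd) and the last zxid in decimal, which is consistent with the values shown. The helper name split_zxid is hypothetical.

```python
def split_zxid(zxid):
    """Split a 64-bit ZooKeeper zxid into (epoch, counter)."""
    epoch = zxid >> 32            # upper 32 bits: leader epoch
    counter = zxid & 0xFFFFFFFF   # lower 32 bits: transaction counter
    return epoch, counter


# Values taken from the log excerpt above.
current_epoch = int("3dd", 16)    # assumed to be printed in hexadecimal
last_zxid = 4252017623322         # assumed to be printed in decimal

zxid_epoch, zxid_counter = split_zxid(last_zxid)

print(f"current epoch:   {current_epoch} (0x{current_epoch:x})")
print(f"last zxid epoch: {zxid_epoch} (0x{zxid_epoch:x}), counter {zxid_counter}")
print("mismatch" if zxid_epoch > current_epoch else "consistent")
```

Under these assumptions the stored current epoch (0x3dd) is one behind the epoch encoded in the last zxid (0x3de), which matches the "older than the last zxid" condition in the exception.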
VCF Operations for Networks 6.14.0
The ZooKeeper metadata and snapshots in the /var/lib/zookeeper/version-2 directory on the affected node are corrupted. The corruption was caused either by an improper manual cluster reboot performed in reverse order or by an abrupt, ungraceful shutdown of the VM.
This is a known issue that requires intervention under the guidance of Broadcom Support.
If this issue is encountered, open a support case with Broadcom Support and refer to this KB article. For more information, see Creating and managing Broadcom support cases.