Data Collector showing down and unable to connect

Article ID: 199943

Products

CA Infrastructure Management

CA Performance Management - Usage and Administration

DX NetOps

Issue/Introduction

The Data Collector shows as down and cannot connect, and the following errors appear in the Data Collector logs:

2020-09-21 10:01:04,402 | ERROR | Checkpoint failed | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
java.lang.OutOfMemoryError: Java heap space

2020-09-21 10:01:04,402 | INFO  | Ignoring no space left exception, java.io.IOException: Java heap space | org.apache.activemq.util.DefaultIOExceptionHandler | ActiveMQ Journal Checkpoint Worker
java.io.IOException: Java heap space
 at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:40)[activemq-client-5.15.8.jar:5.15.8]
 at org.apache.activemq.store.kahadb.MessageDatabase$CheckpointRunner.run(MessageDatabase.java:449)[activemq-kahadb-store-5.15.8.jar:5.15.8]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_222]
 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)[:1.8.0_222]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)[:1.8.0_222]
 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)[:1.8.0_222]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)[:1.8.0_222]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)[:1.8.0_222]
 at java.lang.Thread.run(Thread.java:748)[:1.8.0_222]

2020-09-20 11:51:21,497 | ERROR | raf-2.4.3/deploy | fileinstall                      | ?                                   ? | 6 - org.apache.felix.fileinstall - 3.5.0 |  | In main loop, we have serious trouble
java.lang.OutOfMemoryError: Java heap space
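To confirm that a given log file contains the heap-space error shown above, a small helper along these lines may be useful. This is only a sketch: the function name count_heap_errors is hypothetical, and the log file path is passed as an argument because the log location varies by installation.

```shell
#!/bin/sh
# Count lines in a log file that contain the heap-space error.
# $1: path to the log file to inspect (supplied by the caller).
count_heap_errors() {
    # -F treats the pattern as a fixed string; -c prints the number of matching lines.
    grep -F -c 'java.lang.OutOfMemoryError: Java heap space' "$1"
}
```

A non-zero count for a Data Collector log suggests the symptoms described in this article.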

Cause

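The Java processes on the Data Collector have run out of heap space, as shown by the java.lang.OutOfMemoryError: Java heap space entries in the logs.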

Environment

Release : 3.7

Component : IM Reporting / Admin / Configuration

Resolution

In the short term:

1. Stop DCMD: systemctl stop dcmd

2. Stop ActiveMQ: systemctl stop activemq

3. Ensure the processes are no longer running: ps -ef | grep apache

4. Start DCMD: systemctl start dcmd
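The four steps above can be sketched as a single shell function. This is an illustrative sketch rather than a supported script: the names run, apache_still_running, restart_data_collector, and the DRY_RUN variable are assumptions introduced here, and only the systemctl and ps commands come from the steps themselves. It refuses to start DCMD if matching processes are still running.

```shell
#!/bin/sh
# Sketch of the short-term restart sequence.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Succeeds if any process whose command line mentions "apache" is running.
# The [a] bracket keeps grep from matching its own command line.
apache_still_running() {
    ps -ef | grep -q '[a]pache'
}

restart_data_collector() {
    run systemctl stop dcmd || return 1
    run systemctl stop activemq || return 1
    # Skip the process check in dry-run mode, since nothing was stopped.
    if [ "${DRY_RUN:-0}" != "1" ] && apache_still_running; then
        echo "Processes matching 'apache' are still running; dcmd not started." >&2
        return 1
    fi
    run systemctl start dcmd
}
```

Setting DRY_RUN=1 in the shell before calling restart_data_collector prints the three systemctl commands without running them. A real run needs sufficient privileges to manage the services.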

In the longer term, if the issue recurs shortly after the restart:

1. Add more memory to the Data Collector.
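Before and after adding memory, it can help to check how much physical memory the host reports. The following sketch assumes a Linux host that exposes /proc/meminfo. The function name mem_total_mb is hypothetical, and the optional argument exists only so that a different file can be read.

```shell
#!/bin/sh
# Print total physical memory in MB, read from the MemTotal line (reported in kB).
# $1: optional path to a meminfo-style file; defaults to /proc/meminfo.
mem_total_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}
```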