We noticed that the Data Collector was down. When we try to start the services, the dcmd service goes down again shortly afterward, unable to connect to the Data Aggregator.
All Supported Releases of DX NetOps Performance Management
One or more queues on the Data Aggregator are backed up, so the Data Aggregator rejects the Data Collector's attempts to connect.
1. Check service status on the Data Collector:
systemctl status dcmd
systemctl status activemq
ps -ef | grep IMData
2. Ensure all processes are stopped. If the ICMPDaemon is still running after dcmd and activemq have been stopped, kill it:
kill -9 <pid>
where <pid> is the process ID of the running ICMPDaemon.
3. Clean the dcmd caches:
/opt/IMDataCollector/scripts/dcmd clean
4. On the DA, check the queues of the downed Data Collector:
/opt/IMDataAggregator/scripts/activemqstat | grep -i <Hostname of DC>
5. Look at the first column of the results and use purgeOneQueue to purge every queue that has a value greater than 0:
/opt/IMDataAggregator/scripts/purgeOneQueue "queue_name_from_activemqstat"
Ex:
/opt/IMDataAggregator/scripts/purgeOneQueue "jms-lock.dc.<DC_Hostname>:########-####-####-####-#############"
6. Back on the Data Collector, start the services:
systemctl start dcmd
7. Tail karaf.log to follow the startup of the Data Collector service:
tail -f /opt/IMDataCollector/apache-karaf/data/log/karaf.log
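
When many queues are backed up, steps 4 and 5 can be tedious to do by hand. The following is a minimal sketch of a dry-run helper, not a supported tool. It assumes that each activemqstat line has the numeric count in the first column (as described in step 5) and the queue name in the last column. The position of the queue name is an assumption, so check it against your own output first. The function name list_purge_commands is hypothetical. It only prints the purgeOneQueue commands and does not run them.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): reads activemqstat-style lines on stdin
# and prints one purgeOneQueue command per queue whose first column is > 0.
# ASSUMPTION: column 1 is the numeric count and the LAST column is the queue
# name. Verify this against real activemqstat output before relying on it.
list_purge_commands() {
    awk '$1 ~ /^[0-9]+$/ && $1 > 0 {
        # Header or non-numeric lines are skipped by the pattern above.
        printf "/opt/IMDataAggregator/scripts/purgeOneQueue \"%s\"\n", $NF
    }'
}
```

To use it on the DA, pipe the filtered output from step 4 into the function, review the printed commands, and then run the ones you want:
/opt/IMDataAggregator/scripts/activemqstat | grep -i <Hostname of DC> | list_purge_commands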