How do we determine if the Data Collector (DC) is collecting polled metric data when the Data Aggregator (DA) it connects to is down?
We'd like to know how to confirm that the DC is still collecting data even when it cannot connect to the other nodes.
How do we track polled metric data burn down rates, monitoring their submission to the DA for database insertion?
All supported DX NetOps Performance Management releases
The Data Collector (DC) will continue to collect data when it can't connect to the Data Aggregator (DA), as long as the DC services remain running.
If both the Data Collector and the Data Aggregator are down, no data will be cached. In that scenario the Data Aggregator will need to be started before the Data Collector can start.
We can also check the PollSummary.log file on the DC (default path shown): /opt/IMDataCollector/apache-karaf-2.4.3/data/log/PollSummary.log. It logs each poll cycle per Metric Family/poll group.
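To confirm polling is still happening while the DA is down, a minimal sketch like the following (assuming the default path above; the script itself is ours, not a supported tool) can tail PollSummary.log and print each new poll-cycle entry as it is written:

import time

# Default PollSummary.log location on the DC (adjust for your install path).
LOG = "/opt/IMDataCollector/apache-karaf-2.4.3/data/log/PollSummary.log"

with open(LOG, "r") as f:
    f.seek(0, 2)                 # start at the end of the file; show only new entries
    while True:
        line = f.readline()
        if line:
            print(line.rstrip())  # a new poll cycle was logged, so the DC is still polling
        else:
            time.sleep(5)         # wait for the next poll cycle to be written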
If you add the following configuration, you can monitor the cache burndown while the DC is sending data to the DA. No restart is needed for org.ops4j.pax.logging.cfg file changes; they are read in by the dcmd service "on the fly".
In releases that use the log4j 1.x configuration format, edit:
/opt/IMDataCollector/apache-karaf-2.4.3/etc/org.ops4j.pax.logging.cfg
and add:
log4j.logger.com.ca.im.core.jms.health.JmsBrokerHealthAnalyser=DEBUG,sift
log4j.additivity.com.ca.im.core.jms.health.JmsBrokerHealthAnalyser=false
In releases that use the log4j2 configuration format, edit:
/opt/IMDataCollector/apache-karaf/etc/org.ops4j.pax.logging.cfg
and uncomment the entries under the "# JMS Health logging" header so they read:
log4j2.logger.JMSHealth.name = com.ca.im.core.jms.health
log4j2.logger.JMSHealth.level = DEBUG
log4j2.logger.JMSHealth.appenderRef.sift.ref = sift
This will create a log file named com.ca.im.common.core.jms.log under:
/opt/IMDataCollector/apache-karaf-*/data/log
To disable the logging, comment those lines back out. The file should look like this before saving the changes:
#log4j2.logger.JMSHealth.name = com.ca.im.core.jms.health
#log4j2.logger.JMSHealth.level = DEBUG
#log4j2.logger.JMSHealth.appenderRef.sift.ref = sift
Here is an example entry showing cached messages pending delivery while the DA is unreachable:
2021-04-10 18:32:21,791 | DEBUG | pool-14-thread-1 | JmsBrokerHealthAnalyser | s.health.JmsBrokerHealthAnalyser 149 | 179 - com.ca.im.common.core.jms - 20.2.9.RELEASE-542 | | JMS Health Statistics => Memory: 1.43MB/10.00MB, Disk: 12.86MB/2.47GB, Pending: 14381 msgs, Enqueue: 0 msg/sec, Dequeue: 0 msg/sec, Delay: -100 secs, Dropped: 0 msgs
Based on the statistics received from the broker, the Data Collector drops messages from the broker to control disk usage.
The Data Collector establishes the disk limit as the minimum of several values. In the example above, the disk limit calculation resulted in 2.47GB (in this case, 50% of the Data Collector JVM max heap won).
When the Data Collector detects that disk usage is higher than the disk limit, it begins to drop cached messages.
Because the Data Collector and the broker could reside on different filesystems, the Data Collector also starts dropping cached messages when broker filesystem usage is greater than 85%. Broker filesystem usage information arrives as part of the regular statistics.
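For illustration only, the drop decision described above can be sketched as follows; the function names and the second candidate in the minimum are assumptions made for the sketch, not the product's actual code:

# Illustrative sketch of the behaviour described above, not product code.
def disk_limit_bytes(jvm_max_heap_bytes, other_candidate_bytes):
    # The limit is the minimum of several candidates; in the example above,
    # 50% of the DC JVM max heap (2.47GB) was the smallest value and "won".
    return min(0.5 * jvm_max_heap_bytes, other_candidate_bytes)

def should_drop(broker_disk_usage_bytes, limit_bytes, broker_fs_usage_pct):
    # Drop cached messages once broker disk usage exceeds the limit, or once the
    # broker's filesystem (which may differ from the DC's) is more than 85% full.
    return broker_disk_usage_bytes > limit_bytes or broker_fs_usage_pct > 85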
Once the Data Aggregator is back online, the cached messages are delivered and Pending drops to 0, as in the example below. A quick restart of the broker releases the disk space.
2021-04-10 18:39:51,814 | DEBUG | pool-14-thread-1 | JmsBrokerHealthAnalyser | s.health.JmsBrokerHealthAnalyser 149 | 179 - com.ca.im.common.core.jms - 20.2.9.RELEASE-542 | | JMS Health Statistics => Memory: 0/10.00MB, Disk: 12.96MB/2.47GB, Pending: 0 msgs, Enqueue: 0 msg/sec, Dequeue: 0 msg/sec, Delay: 0 secs, Dropped: 0 msgs
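To track the burndown rate asked about in the question, a minimal sketch (assuming the default log location and the "JMS Health Statistics" line format shown above) can read the Pending count from consecutive entries and report how quickly the backlog drains:

import glob
import re
from datetime import datetime

# Resolve the apache-karaf-* directory under the default install location.
log_files = glob.glob(
    "/opt/IMDataCollector/apache-karaf-*/data/log/com.ca.im.common.core.jms.log")

# Matches the timestamp and Pending count in the JMS Health Statistics lines shown above.
PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*Pending: (\d+) msgs")

samples = []
with open(log_files[0]) as f:
    for line in f:
        m = PATTERN.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(m.group(2))))

# Burndown rate = cached messages drained per second between consecutive samples.
for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
    secs = (t1 - t0).total_seconds() or 1
    print(f"{t1}  pending={p1}  burndown={(p0 - p1) / secs:.1f} msg/sec")

A steadily falling Pending value (positive burndown) confirms the cache is draining to the DA; a flat or rising value while Dequeue stays at 0 msg/sec indicates the DA is still unreachable.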