How to confirm a Data Collector is collecting data

Article ID: 268971

Products

Network Observability CA Performance Management

Issue/Introduction

How to determine whether the Data Collector (DC) is collecting polled metric data when the Data Aggregator (DA) it connects to is down.

We'd like to know how to confirm that the DC is still collecting data even when it cannot connect to the other nodes.

How to track the burn-down rate of cached polled metric data and monitor its submission to the DA for database insertion.

Environment

All supported DX NetOps Performance Management releases

Resolution

The Data Collector (DC) continues to collect data when it cannot connect to the Data Aggregator (DA), as long as the DC services remain running.

If both the Data Collector and the Data Aggregator are down, no data is cached. In that scenario the Data Aggregator must be started before the Data Collector can start.

We can also check the PollSummary.log file on the DC, which logs each poll cycle per Metric Family/poll group. The default path is:

/opt/IMDataCollector/apache-karaf-2.4.3/data/log/PollSummary.log
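One quick way to confirm that PollSummary.log is still being written is to check how recently it was modified. The following is a minimal Python sketch, not a product tool. The function names and the 600-second threshold are assumptions made for illustration; pick a threshold that matches your longest poll cycle. It does not depend on the format of the log entries.

```python
import os
import time

# Default path from this article; adjust for your installation.
POLL_SUMMARY = "/opt/IMDataCollector/apache-karaf-2.4.3/data/log/PollSummary.log"


def seconds_since_update(path, now=None):
    """Return the number of seconds since the file was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_fresh(path, max_age_seconds=600, now=None):
    """True if the file was written within max_age_seconds (assumed threshold)."""
    return seconds_since_update(path, now) <= max_age_seconds


if __name__ == "__main__":
    if os.path.exists(POLL_SUMMARY):
        age = seconds_since_update(POLL_SUMMARY)
        print(f"Last written {age:.0f}s ago; fresh={is_fresh(POLL_SUMMARY)}")
    else:
        print(f"{POLL_SUMMARY} not found; check the install path")
```

A file that stays fresh while the DA is down suggests the DC is still polling. A stale file is a cue to check the DC services.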

Adding the following configuration lets you monitor the cache burn-down while the DC submits cached data to the DA. No restart is needed after changing the org.ops4j.pax.logging.cfg file; the dcmd service reads the changes on the fly.

  1. In DX NetOps Performance Management 21.2.1 and earlier:

    In /opt/IMDataCollector/apache-karaf-2.4.3/etc/org.ops4j.pax.logging.cfg, add:

    log4j.logger.com.ca.im.core.jms.health.JmsBrokerHealthAnalyser=DEBUG,sift
    log4j.additivity.com.ca.im.core.jms.health.JmsBrokerHealthAnalyser=false

  2. In 21.2.2 and later:

    In /opt/IMDataCollector/apache-karaf/etc/org.ops4j.pax.logging.cfg uncomment:

    #
    # JMS Health logging
    #
    log4j2.logger.JMSHealth.name = com.ca.im.core.jms.health
    log4j2.logger.JMSHealth.level = DEBUG
    log4j2.logger.JMSHealth.appenderRef.sift.ref = sift

This will create a log file named:

com.ca.im.common.core.jms.log

Under:

/opt/IMDataCollector/apache-karaf-*/data/log

To disable the logging, comment out the lines again (on 21.2.1 and earlier, comment out or remove the two added lines). On 21.2.2 and later, the entries should look like this before saving the changes:

#log4j2.logger.JMSHealth.name = com.ca.im.core.jms.health
#log4j2.logger.JMSHealth.level = DEBUG
#log4j2.logger.JMSHealth.appenderRef.sift.ref = sift

This is an example where we can see:

  • Memory limit in the broker is 10MB
  • Memory usage in the broker is 1.43MB
  • Disk usage for non-persistent messages is 12.86MB
  • Disk limit for non-persistent messages is 2.47GB
  • Cached messages (pending to deliver) are 14381

2021-04-10 18:32:21,791 | DEBUG | pool-14-thread-1 | JmsBrokerHealthAnalyser          | s.health.JmsBrokerHealthAnalyser  149 | 179 - com.ca.im.common.core.jms - 20.2.9.RELEASE-542 |  | JMS Health Statistics => Memory: 1.43MB/10.00MB, Disk: 12.86MB/2.47GB, Pending: 14381 msgs, Enqueue: 0 msg/sec, Dequeue: 0 msg/sec, Delay: -100 secs, Dropped: 0 msgs
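To pull these values out of the log programmatically, the statistics portion of the line can be matched with a regular expression. The sketch below is an illustration only. The pattern is inferred from the sample line in this article, not from a documented log format, and the names STATS_RE and parse_stats are hypothetical.

```python
import re

# Pattern inferred from the sample line in this article (an assumption,
# not a documented format).
STATS_RE = re.compile(
    r"JMS Health Statistics => "
    r"Memory: (?P<mem_used>[^/\s]+)/(?P<mem_limit>[^,\s]+), "
    r"Disk: (?P<disk_used>[^/\s]+)/(?P<disk_limit>[^,\s]+), "
    r"Pending: (?P<pending>\d+) msgs, "
    r"Enqueue: (?P<enqueue>-?[\d.]+) msg/sec, "
    r"Dequeue: (?P<dequeue>-?[\d.]+) msg/sec, "
    r"Delay: (?P<delay>-?[\d.]+) secs, "
    r"Dropped: (?P<dropped>\d+) msgs"
)


def parse_stats(line):
    """Return the statistics as a dict, or None if the line does not match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    stats = match.groupdict()
    for key in ("pending", "dropped"):
        stats[key] = int(stats[key])
    for key in ("enqueue", "dequeue", "delay"):
        stats[key] = float(stats[key])
    return stats


# Statistics portion of the sample line above.
SAMPLE = (
    "JMS Health Statistics => Memory: 1.43MB/10.00MB, Disk: 12.86MB/2.47GB, "
    "Pending: 14381 msgs, Enqueue: 0 msg/sec, Dequeue: 0 msg/sec, "
    "Delay: -100 secs, Dropped: 0 msgs"
)

if __name__ == "__main__":
    print(parse_stats(SAMPLE))
```

Sizes are kept as strings (for example "12.86MB") because the article does not state whether the units are decimal or binary.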

Based on the statistics received from the broker, the Data Collector drops messages from the broker to control disk usage.

The Data Collector sets the disk limit to the smaller of:

  • 50% of the Data Collector JVM max heap
  • 90% of the free space in the file system corresponding to the Data Collector {java.home}

In the example above, the disk limit calculation resulted in 2.47GB (in this case, 50% of the Data Collector JVM max heap was the smaller value).
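The rule above can be written as a one-line calculation. This is a sketch of the arithmetic as described in this article, not the product's actual code. The function name is hypothetical, the 4.94 GB heap is inferred from the 2.47GB limit in the example, and the 100 GB of free space is a made-up figure.

```python
def disk_limit_gb(jvm_max_heap_gb, free_space_gb):
    """Smaller of 50% of the JVM max heap and 90% of the free space."""
    return min(0.5 * jvm_max_heap_gb, 0.9 * free_space_gb)


# Inferred heap size (2 x 2.47 GB) and a hypothetical free-space figure.
EXAMPLE_LIMIT = disk_limit_gb(jvm_max_heap_gb=4.94, free_space_gb=100.0)

if __name__ == "__main__":
    print(f"Disk limit: {EXAMPLE_LIMIT:.2f}GB")
```

With these inputs the heap term (2.47 GB) is smaller than the free-space term (90 GB), matching the example. On a nearly full filesystem the free-space term would become the limit instead.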

When the Data Collector detects that disk usage is higher than the disk limit, it begins to drop cached messages.

Because the Data Collector and the broker could reside on different filesystems, the Data Collector also starts dropping cached messages if the broker filesystem usage is greater than 85%. Broker filesystem usage information arrives as part of the regular statistics.
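The two drop conditions can be summarized as a single check. The sketch below only restates the logic described in this article. The function and parameter names are hypothetical, and the real implementation may apply further conditions.

```python
BROKER_FS_THRESHOLD_PCT = 85.0  # threshold stated in this article


def should_drop_cached_messages(disk_usage_mb, disk_limit_mb, broker_fs_used_pct):
    """True if either drop condition described in the article is met."""
    over_disk_limit = disk_usage_mb > disk_limit_mb
    broker_fs_full = broker_fs_used_pct > BROKER_FS_THRESHOLD_PCT
    return over_disk_limit or broker_fs_full


if __name__ == "__main__":
    # 12.86MB used against a 2.47GB limit (taken as 2470MB here),
    # with a hypothetical 40% broker filesystem usage.
    print(should_drop_cached_messages(12.86, 2470.0, 40.0))
```

In the example log entry, disk usage is far below the limit and nothing has been dropped (Dropped: 0 msgs), which is consistent with neither condition being met.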

Once the Data Aggregator is back online, cached messages are delivered (Pending=0). A quick restart of the broker releases the disk space.

2021-04-10 18:39:51,814 | DEBUG | pool-14-thread-1 | JmsBrokerHealthAnalyser          | s.health.JmsBrokerHealthAnalyser  149 | 179 - com.ca.im.common.core.jms - 20.2.9.RELEASE-542 |  | JMS Health Statistics => Memory: 0/10.00MB, Disk: 12.96MB/2.47GB, Pending: 0 msgs, Enqueue: 0 msg/sec, Dequeue: 0 msg/sec, Delay: 0 secs, Dropped: 0 msgs
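To estimate the burn-down rate, the Pending value can be compared across log entries. The following sketch assumes the timestamp and Pending formats shown in the two sample lines of this article. The function names are hypothetical, and the sample lines are abbreviated to the fields the sketch uses.

```python
import re
from datetime import datetime

# Timestamp and Pending formats inferred from the sample lines in this article.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*"
    r"JMS Health Statistics => .*Pending: (?P<pending>\d+) msgs"
)


def pending_samples(lines):
    """Return (timestamp, pending) pairs for every matching log line."""
    samples = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            samples.append((ts, int(match.group("pending"))))
    return samples


def average_drain_rate(samples):
    """Average messages per second drained between first and last sample."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    return (p0 - p1) / elapsed if elapsed > 0 else 0.0


# Abbreviated copies of the two sample lines from this article.
LINES = [
    "2021-04-10 18:32:21,791 | DEBUG | JMS Health Statistics => "
    "Memory: 1.43MB/10.00MB, Disk: 12.86MB/2.47GB, Pending: 14381 msgs",
    "2021-04-10 18:39:51,814 | DEBUG | JMS Health Statistics => "
    "Memory: 0/10.00MB, Disk: 12.96MB/2.47GB, Pending: 0 msgs",
]

if __name__ == "__main__":
    rate = average_drain_rate(pending_samples(LINES))
    print(f"Average drain rate: {rate:.1f} msg/sec")
```

For the two samples above this gives roughly 32 msg/sec. Treat that as a lower bound, because the queue may have reached zero before the second entry was logged. Feeding in every entry from the log gives a finer-grained view.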

Additional Information