Polls dropped due to time shift on data collector.
search cancel

Polls dropped due to time shift on data collector.

book

Article ID: 140886

calendar_today

Updated On:

Products

CA Performance Management - Usage and Administration DX NetOps

Issue/Introduction

We stopped seeing data on devices for specific metric families, and it appears no polls are being sent for those metrics for the devices. When we enable Detailed Poll Logging we see nothing for the problem device(s) but the Detailed SNMP Logging shows request/response pairs.

We also see one or more of the following:

  1. In the (default path shown) /opt/IMDataCollector/apache-karaf-*/data/log/karaf.log file we see messages such as:

    YYYY-MM-DD HH:MM:SS,### | ERROR | l 60000-thread-1 | PollerScheduledExecutor          | r.common.PollerScheduledExecutor  290 | 191 - com.ca.im.data-collection-manager.core.common - #.#.#.RELEASE-# |  | Executor Scheduler # for poll interval ##### for poll Cycle :  EPOCHTIME (WEEKDAY MONTH DD HH:MM:SS TIMEZONE YEAR)  dropped poll requests=#

  2. Searching the same log for "KahaDBFileMonitor" references, we should see a log entry occur every two minutes with a change in timing at the same time the error above begins.
    • Scan the karaf.log (or karaf.log.1, karaf.log.2, etc by replacing karaf.log with the appropriate filename) using:

      • prevdate= && awk -F, '/Number of Kaha DB files/ {print $1}' karaf.log | while IFS= read -r tstamp; do curdate=$(date --date="$tstamp" "+%s"); [ -n "$prevdate" ] && delta=$((curdate - prevdate)) && [ "$delta" -ne 120 ] && printf "%s to %s: %d Seconds\n" "$prevtstamp" "$tstamp" "$delta"; prevdate=$curdate; prevtstamp=$tstamp; done

    • This will print the timestamps where the KahaDBFileMonitor are not 120 seconds apart.

      • NOTE: If the log includes a Data Collector restart you may see a delta other than 120 seconds which can be verified by reviewing the log.

  3. Looking in the OS system logs may show a message around the same time regarding time or clock changes.
    • Something similar to the following may be observed in the /var/log/messages if enabled.

      Nov 15 03:01:01 DCHostName systemd: Starting Session 152 of user root.
      Nov 15 03:11:51 DCHostName systemd: Time has been changed
      Nov 15 03:11:53 DCHostName chronyd[890]: Forward time jump detected!
      Nov 15 03:11:53 DCHostName chronyd[890]: Can't synchronise: no selectable sources
      Nov 15 03:15:06 DCHostName chronyd[890]: Selected source <DC_IP>
      Nov 15 03:25:17 DCHostName systemd: Removed slice user-0.slice.

Environment

All supported DX NetOps Performance Management releases

Cause

The Data Collector (DC) will not send a poll request if it is determined to be "late".

This can happen if the DC system time moves ahead by at least 90 seconds (if DC is polling items at 1 minute) or by at least 450 seconds (if DC is polling items at 5 minutes).

Resolution

Ensure the Data Aggregator and Data Collector remain synchronized for time.

If this condition is seen a Data Collector restart will temporarily resolve the problem. Once a new time change is detected the problem will return.

If the problem takes place frequently, determine and resolve the environmental issue causing it to fully resolve the problem.

Additional Information

In releases 3.7.4+ there is a system event generated when this condition is detected. From the 3.7.4 release notes:

  • Symptom: The Data Collector will not send a poll request if it is determined to by "late".  This can happen if the DC system time moves ahead by at least 90 seconds (if DC is polling items at 1 minute) or by at least 450 seconds (if DC is polling items at 5 minutes).
  • Resolution: The Data Aggregator will generate events indicating that a Data Collector is having this issue, and also generate events on devices for which poll requests are dropped.
  • (3.7.4, DE426204) 

In 3.7.4+ releases when this condition is detected you would see Events generated on the Data Aggregator device similar to:

  • Item Name: Data Aggregator
  • Item Type Name: Device
  • Item SubTypeName: IM Data Aggregator
  • Event Type: Administration Event
  • Event Sub Type: Data Collector status
  • Description: Data Collector DCNAME:DCUUID dropped 1 of 2 scheduled poll requests for Metric Family METRICFAMILY, Vendor Cert VENDORCERTIFICSTION for Poll Rate POLLRATE. The Data Collector may be overloaded or the system time has changed.

These are automated system events. No action is needed to raise them when the problem conditions are detected.