Polls dropped due to time shift on data collector.

Article ID: 140886

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration, CA Performance Management - Data Polling

Issue/Introduction

Data stops appearing on devices for specific metric families, and it appears that no polls are being sent for those metric families on those devices.

The Data Collector log /opt/IMDataCollector/apache-karaf-2.4.3/data/log/karaf.log contains messages such as:

YYYY-MM-DD HH:MM:SS,### | ERROR | l 60000-thread-1 | PollerScheduledExecutor          | r.common.PollerScheduledExecutor  290 | 191 - com.ca.im.data-collection-manager.core.common - #.#.#.RELEASE-# |  | Executor Scheduler # for poll interval ##### for poll Cycle :  EPOCHTIME (WEEKDAY MONTH DD HH:MM:SS TIMEZONE YEAR)  dropped poll requests=#
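To find these messages in bulk, a small parser can help. The following is a minimal sketch, assuming that the placeholders in the template above (timestamp, poll interval, dropped count) are plain digits in a real log line. The function name `find_dropped_polls` and the regular expression are illustrative, not part of the product.

```python
import re

# Assumed shape of the message, derived from the template shown in the article.
# Real log lines may differ in spacing or field content.
DROPPED_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\s*\|\s*ERROR\s*\|"
    r".*poll interval\s+(?P<interval>\d+)"
    r".*dropped poll requests=(?P<dropped>\d+)"
)


def find_dropped_polls(lines):
    """Return (timestamp, poll_interval, dropped_count) for each matching line."""
    hits = []
    for line in lines:
        match = DROPPED_RE.search(line)
        if match:
            hits.append(
                (
                    match.group("ts"),
                    int(match.group("interval")),
                    int(match.group("dropped")),
                )
            )
    return hits
```

Passing the open karaf.log file object to `find_dropped_polls` returns one tuple per error line, which makes it easier to see when the drops started and which poll intervals are affected. The poll interval is returned as logged; the article does not state its unit.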

Search the same log for "KahaDBFileMonitor". This entry should be logged every two minutes, but around the time the errors above started, the interval between entries is different.
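The interval check can be scripted rather than done by eye. This is a minimal sketch, assuming each relevant log line begins with a `YYYY-MM-DD HH:MM:SS` timestamp as in the error template above. The two-minute expectation comes from this article; the 10-second tolerance and the function name `find_irregular_gaps` are assumptions chosen for illustration.

```python
from datetime import datetime

EXPECTED_GAP_SECONDS = 120  # the entry should occur every two minutes
TOLERANCE_SECONDS = 10      # assumed slack for normal jitter; not from the article


def find_irregular_gaps(lines, marker="KahaDBFileMonitor"):
    """Return (previous_ts, current_ts, gap_seconds) for gaps that deviate from the expected interval."""
    irregular = []
    previous = None
    for line in lines:
        if marker not in line:
            continue
        try:
            # The first 19 characters are assumed to hold the timestamp.
            current = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None:
            gap = (current - previous).total_seconds()
            if abs(gap - EXPECTED_GAP_SECONDS) > TOLERANCE_SECONDS:
                irregular.append((previous, current, gap))
        previous = current
    return irregular
```

A gap much longer than 120 seconds between two consecutive entries suggests the clock moved forward in that window; a much shorter or negative gap suggests it moved backward.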

The OS system logs may also show a message about a time or clock change around the same time.

Cause

The Data Collector will not send a poll request if the request is determined to be "late". This can happen if the DC system time moves ahead by at least 90 seconds (when the DC is polling items at a 1-minute rate) or by at least 450 seconds (when the DC is polling items at a 5-minute rate).
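Both thresholds above equal 1.5 times the poll interval (90 / 60 and 450 / 300). The sketch below encodes that arithmetic. The 1.5 factor is inferred from these two data points only, so applying it to other poll rates is an assumption, and the function names are hypothetical.

```python
# Inferred from the two documented cases: 90 s at a 60 s rate, 450 s at a 300 s rate.
# Whether the same factor applies to other poll rates is not confirmed by the article.
LATE_FACTOR = 1.5


def late_threshold_seconds(poll_interval_seconds):
    """Forward clock jump (in seconds) at which polls are treated as late."""
    return poll_interval_seconds * LATE_FACTOR


def polls_would_be_dropped(poll_interval_seconds, clock_jump_seconds):
    """True if a forward clock jump meets or exceeds the late threshold."""
    return clock_jump_seconds >= late_threshold_seconds(poll_interval_seconds)
```

For example, a 2-minute forward jump exceeds the 90-second threshold for 1-minute polling but stays under the 450-second threshold for 5-minute polling, which is consistent with only some metric families losing data.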

Environment

Release : 19.3

Component : IM Polling

Resolution

In CAPM 3.7.4 and later, an event is generated when this condition occurs. From the release notes:


Symptom:

 The Data Collector will not send a poll request if it is determined to be "late". This can happen if the DC system time moves ahead by at least 90 seconds (if DC is polling items at 1 minute) or by at least 450 seconds (if DC is polling items at 5 minutes).


Resolution:

 The Data Aggregator will generate events indicating that a Data Collector is having this issue, and also generate events on devices for which poll requests are dropped. 

(3.7.4, DE426204) 


https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/it-operations-management/performance-management/3-7/release-notes/fixed-issues.html#concept.dita_dffccc4e37aafcca90f52a0d259b75e6551f37ff_375Fixes

====

In 3.7.4 and later, when this condition occurs, events like the following are generated on the Data Aggregator device:


Item Name: Data Aggregator

Item Type Name: Device

Item SubTypeName: IM Data Aggregator

Event Type: Administration Event

Event Sub Type: Data Collector status


Description:

Data Collector DCNAME:DCUUID dropped 1 of 2 scheduled poll requests for Metric Family METRICFAMILY, Vendor Cert VENDORCERTIFICATION for Poll Rate POLLRATE. The Data Collector may be overloaded or the system time has changed.


You do not need to configure these events. They are generated automatically when this issue occurs, so that you can take corrective action.