
Incorrect Threshold Violation event time (future time)


Article ID: 240117



Products

CA Performance Management - Usage and Administration DX NetOps

Issue/Introduction

As shown in the attached image, the alarms coming from Performance Management (PM) carry the wrong time.

All machines have the same date/time and timezone settings.
Spectrum 21.2.1 on the Windows platform
PM 21.2.1

In Spectrum, the alarm was generated on 14/04/2022 at 08:58:04 AM GMT-03:00 (BRT).
In the alarm details, the Threshold Violation event occurred on 14/04/2022 at 12:48:00 PM BRT. This is a future time: the current local time was around 09:00 AM BRT.

 

Enable Detailed Poll Logging on the DA machine (see https://knowledge.broadcom.com/external/article?articleId=33163).

Apr 20 15:34:48.784: Sending response:SnmpPollResponse [itemID=22761, deviceItemId=7735, internetAddress=xxx.yyy.zzz.58, pollGroupId=20401, cycleTimestamp=1650494700000
Apr 20 15:34:48 is the actual local time when the polling log was collected. Converting cycleTimestamp=1650494700000 via https://www.epochconverter.com/ gives:

GMT: Wednesday, April 20, 2022 10:45:00 PM
Your time zone: Wednesday, April 20, 2022 7:45:00 PM GMT-03:00

The local time was 15:34:48, but the cycleTimestamp corresponds to 19:45:00: the polling timestamp is 4h10min ahead.
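The same conversion can be done on the DC itself with GNU `date`. The cycleTimestamp is in epoch milliseconds, so the last three digits are dropped to get seconds (timezone names below are examples; use the DC's own zone):

```shell
# cycleTimestamp from the poll log is in milliseconds since the epoch
ts_ms=1650494700000
ts_s=$((ts_ms / 1000))

# Convert to UTC and to a GMT-03:00 local timezone
TZ=UTC date -d "@$ts_s" '+%Y-%m-%d %H:%M:%S %Z'               # 2022-04-20 22:45:00 UTC
TZ=America/Sao_Paulo date -d "@$ts_s" '+%Y-%m-%d %H:%M:%S'    # 2022-04-20 19:45:00
```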

Cause

When the DC (Data Collector) was started, the system clock was ahead of the real time, and NTP later stepped it back. (Alternatively, the clock may have been pushed forward and then corrected while the DC was running.)

Because the clock was in the future at DC startup, the DC process captured that bad value as the initial polling timestamp. Each subsequent end-of-cycle timestamp is computed by incrementing the previous one by the polling interval; the current time is never re-read, so the offset persists until the DC is restarted.

The underlying problem is that a VM's clock is driven by virtualized CPU timing, and depending on CPU cycles this can lead to drift. This is why VMware Tools time synchronization exists. See:

https://kb.vmware.com/s/article/1006427

https://kb.vmware.com/s/article/1318
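The propagation of the bad startup timestamp can be sketched as follows (illustrative shell arithmetic only, not actual DC code; the interval and offset values are assumptions chosen to match this case):

```shell
interval=300                 # assumed polling interval in seconds (5 minutes)
real_start=1650494700        # epoch seconds at DC startup
clock_error=15012            # startup clock ~4h10m12s in the future

# The bad value is captured once, at startup:
cycle_ts=$((real_start + clock_error))

# Each cycle increments the previous timestamp; the clock is never re-read,
# so every Threshold Violation event time carries the same offset.
for cycle in 1 2 3; do
  cycle_ts=$((cycle_ts + interval))
  echo "cycle $cycle timestamp: $cycle_ts (still $clock_error s ahead)"
done
```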

 

In the /var/log/messages file:

Apr 22 17:05:09 DC_hostname systemd: Time has been changed
Apr 22 17:05:09 DC_hostname kernel: sd 2:0:0:0: [storvsc] Sense Key : Unit Attention [current] 
Apr 22 17:05:09 DC_hostname kernel: sd 2:0:0:0: [storvsc] Add. Sense: Changed operating definition
Apr 22 17:05:09 DC_hostname kernel: sd 2:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
Apr 22 17:06:14 DC_hostname kernel: sd 2:0:0:0: [storvsc] Sense Key : Unit Attention [current] 
Apr 22 17:06:14 DC_hostname kernel: sd 2:0:0:0: [storvsc] Add. Sense: Changed operating definition
Apr 22 17:06:14 DC_hostname kernel: sd 2:0:0:0: Warning! Received an indication that the operating parameters on this target have changed. The Linux SCSI layer does not automa
Apr 22 17:14:24 DC_hostname chronyd[684]: Backward time jump detected!
Apr 22 17:14:24 DC_hostname chronyd[684]: Can't synchronise: no selectable sources
Apr 22 17:17:37 DC_hostname chronyd[684]: Selected source xx.yy.ww.zz
Apr 22 17:17:37 DC_hostname chronyd[684]: System clock wrong by 191.295657 seconds, adjustment started
Apr 22 17:17:38 DC_hostname chronyd[684]: Selected source yy.zz.ww.xx

 

Jun  2 02:54:09 DC_hostname systemd: Time has been changed
Jun  2 02:54:14 DC_hostname chronyd[1025]: Forward time jump detected!
Jun  2 02:54:14 DC_hostname chronyd[1025]: Can't synchronise: no selectable sources
Jun  2 02:57:28 DC_hostname chronyd[1025]: Selected source yy.zz.ww.xx
Jun  2 02:57:28 DC_hostname chronyd[1025]: System clock wrong by -12.409403 seconds, adjustment started
Jun  2 02:57:41 DC_hostname systemd: Time has been changed
Jun  2 02:58:33 DC_hostname chronyd[1025]: Can't synchronise: no majority
Jun  2 02:58:33 DC_hostname chronyd[1025]: Selected source xx.yy.ww.zz
Jun  2 02:58:33 DC_hostname chronyd[1025]: System clock wrong by -1.087112 seconds, adjustment started

Environment

Release : 21.2

Component : NetOps Data Collector

Resolution

After fixing the time sync issue on the VM, cycle the DC service. On the DC machine:

1) Stop the dcmd service

$ systemctl stop dcmd

2) Check if it successfully stopped

$ systemctl status dcmd

3) Ensure the Icmpd process is gone prior to restarting the dcmd service

$ ps -ef | grep -i icmpd

4) Start the dcmd service

$ systemctl start dcmd
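Step 3 can be scripted as a short wait loop between the stop and the start (a sketch; the process name `icmpd` is assumed to match `pgrep -x`, and the 60-second timeout is an arbitrary choice):

```shell
# After "systemctl stop dcmd", wait up to 60 seconds for icmpd to exit
# (pgrep returns non-zero once no icmpd process remains).
for i in $(seq 1 60); do
  pgrep -x icmpd >/dev/null || break
  sleep 1
done

if pgrep -x icmpd >/dev/null; then
  echo "icmpd still running; do not restart dcmd yet"
else
  echo "icmpd gone; safe to start dcmd"
fi
```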

Attachments