Reboot events repeating


Article ID: 127812



Products

CA Infrastructure Management, CA Performance Management - Usage and Administration

Issue/Introduction

I noticed that the same devices keep displaying Rebooted events every poll cycle.

These devices have not been rebooted.
 
Event Type = reconfiguration
Event SubType = rebooted
Description = A device reboot was detected during this poll period.  The device last restarted x:xx:xx.xx ago.
 

Cause

The Availability Metric Family polls sysUpTime and expects the value to be in TimeTicks (hundredths of a second).
If the difference between previousSysUpTime and currentSysUpTime falls short of timeTicksSincePreviousPoll by more than the allowed tolerance, we generate a reboot event.
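Because TimeTicks count hundredths of a second, a raw sysUpTime value converts to wall-clock uptime by dividing by 100. The minimal sketch below formats a TimeTicks value as hours:minutes:seconds.hundredths, similar in shape to the x:xx:xx.xx in the event description. The function name is hypothetical, and the exact format PM uses in that description is an assumption.

```python
def format_timeticks(ticks: int) -> str:
    """Format an SNMP TimeTicks value (hundredths of a second) as H:MM:SS.cc.

    Hypothetical helper; PM's own formatting may differ.
    """
    seconds, hundredths = divmod(ticks, 100)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{hundredths:02d}"


if __name__ == "__main__":
    # currentSysUpTime from the log message in this article
    print(format_timeticks(42538058))  # 118:09:40.58
    # timeTicksSincePreviousPoll: one 5-minute poll cycle
    print(format_timeticks(30000))  # 0:05:00.00
```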

To confirm, enable detailed poll logging for the problem device, wait at least one poll cycle (or until the event occurs), and then collect all logs for the device's IP address.
Check the logs for a message like this:

date: The device rebooted, but we are not dropping the response because this is the Availability poll response: previousSysUpTime=42537758, currentSysUpTime=42538058, timeTicksSincePreviousPoll=30000, response=SnmpPollResponse [itemID=454340, deviceItemId=454340, internetAddress=x.x.x.x, pollGroupId=4090, cycleTimestamp=1550763000000
  readTimestamp=1550763229444, duration=300000, pollRate=-1, error=SUCCESS, errorIndex=-1, rowData=[
            SnmpResponseVariable [oid=1.3.6.1.2.1.1.3.0, type=TIME_TICKS, value=42538058, isDelta=true, isList=false, error=SUCCESS, isDynamicIndex=false, indexList=null]
..
previousSysUpTime = 42537758
currentSysUpTime = 42538058
timeTicksSincePreviousPoll = 30000
actual difference = 42538058 - 42537758 = 300
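When checking many log lines, the three counters can be pulled out with a regular expression and the actual difference computed directly. This is a minimal sketch assuming the log lines contain the field=value pairs exactly as in the message above; the helper names are hypothetical and not part of PM.

```python
import re

# Matches the three sysUpTime-related counters in a detailed poll log line.
_FIELD = re.compile(
    r"(previousSysUpTime|currentSysUpTime|timeTicksSincePreviousPoll)=(\d+)"
)


def parse_uptime_fields(log_line: str) -> dict[str, int]:
    """Return the sysUpTime counters found in one poll log line."""
    return {name: int(value) for name, value in _FIELD.findall(log_line)}


def actual_difference(fields: dict[str, int]) -> int:
    """TimeTicks the device's sysUpTime advanced between the two polls."""
    return fields["currentSysUpTime"] - fields["previousSysUpTime"]


if __name__ == "__main__":
    line = (
        "The device rebooted, but we are not dropping the response because this "
        "is the Availability poll response: previousSysUpTime=42537758, "
        "currentSysUpTime=42538058, timeTicksSincePreviousPoll=30000"
    )
    fields = parse_uptime_fields(line)
    print(fields)
    print(actual_difference(fields))  # 300
```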

 

PM allows an offset of only 1000 timeticks (10 seconds) between timeTicksSincePreviousPoll and the actual difference. In this example the Data Collector (DC) correctly advances its expected value by 5 minutes' worth of timeticks (30000), but the device's sysUpTime advances by only 300 timeticks over the same real-time interval. The 1000-timetick tolerance cannot be adjusted.

Note: a reboot is detected when ( prevSysUpTime + timeTicksSincePrevious - REBOOT_TOLERANCE ) > currentSysUpTime, where REBOOT_TOLERANCE is 1000 timeticks.

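In other words: take the previous sysUpTime read, add the number of timeticks the DC has advanced since that poll, and subtract 1000 timeticks. If the result is greater than the currentSysUpTime read in the new poll, the device is treated as rebooted. With the values above, 42537758 + 30000 - 1000 = 42566758, which is greater than 42538058, so a reboot event is raised.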
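The same condition can be written as a small predicate. This is a sketch that restates the Note's formula, not PM's actual source code; the function name is hypothetical, and the tolerance is a constant because the article states it cannot be adjusted.

```python
# Fixed tolerance from the article: 1000 timeticks = 10 seconds.
REBOOT_TOLERANCE = 1000


def looks_like_reboot(prev_sys_up_time: int, current_sys_up_time: int,
                      ticks_since_previous_poll: int) -> bool:
    """Flag a reboot when the device's uptime fell short of the expected
    value by more than REBOOT_TOLERANCE (restates the Note's formula)."""
    expected_minimum = prev_sys_up_time + ticks_since_previous_poll - REBOOT_TOLERANCE
    return expected_minimum > current_sys_up_time


if __name__ == "__main__":
    # Values from the log message: 42566758 > 42538058, so a reboot is flagged
    print(looks_like_reboot(42537758, 42538058, 30000))  # True
    # A device that advanced a full 5 minutes (30000 ticks) is not flagged
    print(looks_like_reboot(42537758, 42567758, 30000))  # False
```

Note that the check only fires when the device's uptime lags behind; a device whose counter runs fast never satisfies the condition.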

The root cause, therefore, is that sysUpTime on the device does not advance fast enough relative to real time.
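To see how far off the device's clock is, compare the ticks it actually added with the real time that elapsed. This sketch assumes, as the article states, that the DC's timeTicksSincePreviousPoll reflects real elapsed time; the function name is hypothetical.

```python
# SNMP TimeTicks are hundredths of a second, so a correct agent adds 100 per second.
TICKS_PER_SECOND = 100


def device_ticks_per_second(actual_difference: int, ticks_since_previous_poll: int) -> float:
    """Estimate how many sysUpTime ticks the device adds per real second."""
    elapsed_seconds = ticks_since_previous_poll / TICKS_PER_SECOND
    return actual_difference / elapsed_seconds


if __name__ == "__main__":
    # Values from the log message: 300 ticks over a 300-second poll cycle
    print(device_ticks_per_second(300, 30000))  # 1.0
```

In this example the device adds roughly 1 tick per second instead of 100, so its counter runs about 100 times slower than TimeTicks require.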

 

Environment

CAPC 3.x on Linux

Resolution

The device is reporting an incorrect sysUpTime value.
There are two options:
  1. Contact the vendor to find out why the device's sysUpTime does not advance in hundredths of a second (TimeTicks), as the SNMPv2-MIB definition of sysUpTime requires.
  2. Create a group containing all devices except the problem ones. Then remove the Availability Monitoring Profile (MP) from the All Manageable Devices collection and apply it to your custom group.
 

Additional Information

Detailed poll logging by IP: https://comm.support.ca.com/kb/detailed-poll-logging-by-ip/kb000074009