We rebooted a device that is monitored in CA Spectrum, but we did not receive an alarm in Spectrum for the time that the device was down. What could be the reason for this?
NOTE: Starting with DX NetOps Spectrum 21.2.4, the default MySQL root password is "MySqlR00t". For DX NetOps Spectrum versions prior to 21.2.4, the default root password is "root". In the MySQL login command below, replace <PASSWD> with the root password for your DX NetOps Spectrum version.
There are several possible reasons why no alarm was raised when the device went down or was rebooted, including:
1. SpectroSERVER performance problems -- SNMP communications are backlogged due to SpectroSERVER performance issues. If you suspect a performance issue, please open a case with support so that we can analyze it with the Perfcollector9 script.
2. The device was not being polled -- for example, it is a non-polling device (such as a proxy) or it was in maintenance mode at the time.
3. The device had its alarms suppressed because it caused a Trap Storm. To determine whether this is the case, query for the Trap Storm event (0x10253, which is 66131 in decimal):
A. Log into the SpectroSERVER system as the user that owns the Spectrum installation
B. If on Windows, start a bash shell by running "bash -login"
C. cd to the $SPECROOT/mysql/bin directory and enter the following command to log into mysql:
./mysql --defaults-file=../my-spectrum.cnf -uroot -p<PASSWD> ddmdb
D. Run the following MySQL query, replacing the two timestamps with a date and time shortly before and shortly after the device went down.
SELECT hex(model_h), count(*) as c from ddmdb.event where utime > UNIX_TIMESTAMP('2018-08-31 00:00:01') and utime < UNIX_TIMESTAMP('2018-08-31 23:59:59') and type=66131 group by hex(model_h) order by c desc;
If any devices are listed, check whether one of the model handles matches the device that did not alarm.
4. Polling time is longer than the reboot time - For example, if polling is set to 300 seconds (the default) and the device reboots in 180 seconds, Spectrum might never see the device down. In this case, however, a reboot alarm should still be created, because at the next poll SysUpTime (0x10245) would be lower than at the previous poll and snmpEngineBoots (0x230c52) would have changed.
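For the Trap Storm check in item 3, the query in step D can be generated for any time window instead of being edited by hand. The following is only a sketch: the function names build_trap_storm_query and same_model_handle are hypothetical and not part of Spectrum. It only builds the query text for pasting into the mysql prompt. It assumes that hex(model_h) returns the handle without a "0x" prefix.

```python
from datetime import datetime

# Trap Storm event code from the article; 0x10253 is 66131 in decimal,
# which is the value used in the "type=" condition of the query.
TRAP_STORM_EVENT = 0x10253


def build_trap_storm_query(start: datetime, end: datetime) -> str:
    """Return the article's Trap Storm query for the window (start, end)."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        "SELECT hex(model_h), count(*) AS c FROM ddmdb.event "
        f"WHERE utime > UNIX_TIMESTAMP('{start.strftime(fmt)}') "
        f"AND utime < UNIX_TIMESTAMP('{end.strftime(fmt)}') "
        f"AND type={TRAP_STORM_EVENT} "
        "GROUP BY hex(model_h) ORDER BY c DESC;"
    )


def same_model_handle(hex_from_query: str, handle: str) -> bool:
    """Compare a hex(model_h) result with a handle written as 0x... (assumed format)."""
    return int(hex_from_query, 16) == int(handle, 16)


if __name__ == "__main__":
    # Same window as the example query in step D.
    print(build_trap_storm_query(datetime(2018, 8, 31, 0, 0, 1),
                                 datetime(2018, 8, 31, 23, 59, 59)))
```

Comparing the handles numerically avoids mismatches caused by letter case or a missing "0x" prefix.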
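The timing argument in item 4 can be illustrated with a short sketch. This is not how Spectrum implements polling or reboot detection; the names PollSample, outage_seen_by_poll and rebooted_between are hypothetical. It assumes that polls occur at a fixed interval, that sysUpTime is counted in hundredths of a second, and that snmpEngineBoots changes each time the SNMP engine re-initializes.

```python
from dataclasses import dataclass


@dataclass
class PollSample:
    sys_uptime_ticks: int  # sysUpTime in hundredths of a second (assumed unit)
    engine_boots: int      # snmpEngineBoots as read at this poll


def outage_seen_by_poll(poll_interval_s: int, first_poll_s: int,
                        down_start_s: int, down_end_s: int) -> bool:
    """True if at least one poll lands while the device is down."""
    # Number of intervals to the first poll at or after down_start (ceiling division).
    k = -(-(down_start_s - first_poll_s) // poll_interval_s)
    next_poll_s = first_poll_s + max(k, 0) * poll_interval_s
    return next_poll_s < down_end_s


def rebooted_between(prev: PollSample, curr: PollSample) -> bool:
    """True if the counters suggest a restart between two successful polls."""
    # Caveat: a 32-bit sysUpTime also wraps after roughly 497 days, which
    # this simple comparison would report as a reboot.
    return (curr.sys_uptime_ticks < prev.sys_uptime_ticks
            or curr.engine_boots != prev.engine_boots)


if __name__ == "__main__":
    # 300-second polling, device down from t=10 s to t=190 s: no poll sees it down.
    print(outage_seen_by_poll(300, 0, 10, 190))
    # The next poll still reveals the reboot through the counters.
    print(rebooted_between(PollSample(8_640_000, 4), PollSample(6_000, 5)))
```

In the 300-second example, the 180-second outage falls entirely between two polls, so only the counter comparison at the next poll can reveal that the device restarted.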