Spectrum SRM alarminfo table not updating for one landscape
search cancel

Spectrum SRM alarminfo table not updating for one landscape

book

Article ID: 397670

calendar_today

Updated On:

Products

Network Observability Spectrum

Issue/Introduction

The alarminfo table in our SRM implementation has not updated for one of our 14 active/enabled landscapes in over a year. All other 13 landscapes are updating as expected. The following mysql output shows this, the last set_time in the alarminfo table for 0x3100000 is 2024-01-17 22:13:03.
For info 0x3100000 is an active landscape with 2,549 devices monitored.

mysql> select max(set_time) max_time, hex(landscape_h) from alarminfo where set_time > '2023-01-01' group by landscape_h order by max_time;
+---------------------+------------------+
| max_time            | hex(landscape_h) |
+---------------------+------------------+
| 2024-01-17 22:13:03 | 3100000          |
| 2025-02-28 22:00:47 | 6000000          |
| 2025-03-05 20:59:13 | 1300000          |
| 2025-03-05 21:25:34 | 900000           |
| 2025-03-06 09:50:04 | 3200000          |
| 2025-03-06 10:15:14 | 400000           |
| 2025-03-06 10:25:08 | 3800000          |
| 2025-03-06 10:34:02 | 3600000          |
| 2025-03-06 10:35:47 | 3300000          |
| 2025-03-06 10:45:56 | 3400000          |
| 2025-03-06 10:55:32 | 5000000          |
| 2025-03-06 11:09:07 | 2200000          |
| 2025-03-06 11:12:51 | D00000           |
| 2025-03-06 11:14:32 | 3500000          |
+---------------------+------------------+
14 rows in set (1 min 56.26 sec)


The event table contains up to date information for all 14 landscapes.

mysql> select max(time) max_time, hex(landscape_h) from event where time > '2023-01-01' group by landscape_h order by max_time;
+---------------------+------------------+
| max_time            | hex(landscape_h) |
+---------------------+------------------+
| 2025-03-01 08:06:44 | 6000000          |
| 2025-03-06 10:40:47 | 3800000          |
| 2025-03-06 10:45:11 | 1300000          |
| 2025-03-06 10:50:05 | 3400000          |
| 2025-03-06 11:08:23 | 3300000          |
| 2025-03-06 11:11:23 | 5000000          |
| 2025-03-06 11:13:13 | D00000           |
| 2025-03-06 11:14:17 | 3100000          |
| 2025-03-06 11:15:32 | 3500000          |
| 2025-03-06 11:20:32 | 2200000          |
| 2025-03-06 11:25:32 | 3200000          |
| 2025-03-06 11:28:21 | 400000           |
| 2025-03-06 11:30:01 | 900000           |
| 2025-03-06 11:37:53 | 3600000          |
+---------------------+------------------+
14 rows in set (0.67 sec)

 

Something that may be relevant is that I have noticed what appears to be a backlog of alarmbucket entries in the bucketactivitylog table for the affected landscape, that correspond to the period of no entries in the alarminfo table.

mysql> select min(creation_time), max(creation_time),count(*) from bucketactivitylog where landscape_h = 0x3100000 and bucket_table_name like "alarmbucket%" and destroy_time is null;
+---------------------+---------------------+----------+
| min(creation_time)  | max(creation_time)  | count(*) |
+---------------------+---------------------+----------+
| 2024-01-17 22:17:10 | 2025-03-06 11:40:46 |    10139 |
+---------------------+---------------------+----------+
1 row in set (0.65 sec)


I have tried disabling and reenabling the landscape on the SRM Spectrum Status Admin page as well as restarting tomcat and rebooting the SRM OneClick server, but the alarminfo table is still not updating. The SpectroServer for that landscape has been restarted many times in the last year.

Environment

any supported Spectrum release

Cause

catalina.out

2025-05-06 12:50:19,193 [SRM/AlarmHandler51380224/bucketReader] WARN  com.aprisma.errorlog -  (SRM_Alarms) - (ERROR) - Unknown exception encountered while processing name events:  processing halted for all servers

Caused by: org.springframework.jdbc.UncategorizedSQLException: PreparedStatementCallback; uncategorized SQLException for SQL [SELECT event_ID, time, event_seq, model_key, vardata, type, model_h, domain, event_key FROM alarmbucket51380224_1705529830284 WHERE event_seq BETWEEN ? AND ?  ORDER BY event_seq]; SQL state [HY000]; error code [130]; Incorrect file format 'alarmbucket51380224_1705529830284'; nested exception is java.sql.SQLException: Incorrect file format 'alarmbucket51380224_1705529830284'
    at org.springframework.jdbc.core.JdbcTemplate.translateException(JdbcTemplate.java:1578)
    at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:669)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:715)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:746)
    at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:752)
    at com.aprisma.spectrum.app.repmgr.dc.db.dao.jdbc.JdbcEventBucketDAO.findEvents(JdbcEventBucketDAO.java:101)
    at com.aprisma.spectrum.app.repmgr.dc.event.handler.AbstractBucketReader.readBucket(AbstractBucketReader.java:120)
    at com.aprisma.spectrum.app.repmgr.dc.event.handler.BucketReaderScheduler$ScheduledReader.run(BucketReaderScheduler.java:118)
    at com.aprisma.util.thread.JobQueue.runJobThread(JobQueue.java:221)
    at com.aprisma.util.thread.JobQueue$JobRunnable.run(JobQueue.java:24)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.sql.SQLException: Incorrect file format 'alarmbucket51380224_1705529830284'

Resolution

check the following:

  1. File Doesn't Exist: The most likely scenario is that the file does not exist in the expected location.
    • Check File Path: Verify that the file alarmbucket51380224_1705529830284 exists in the location the database is looking for. 
    • Database Permissions: Ensure your application has the necessary read permissions for the file.
  2. Incorrect File Format: If the file does exist, it may be corrupted?
    • File Corruption: Examine the file for corruption. If the file is corrupted, it won't be able to be read by the database.
    • Try to read the bucket file through a simple query: select * from alarmbucket51380224_1705529830284; 

       3. Examine Schema: If the table exists, compare its schema to what your application code expects. Are the column names and data types the same? Any mismatches here could cause this error. You can check this with a simple query: DESC alarmbucket51380224_1705529830284;

       4. Examine MySQL Logs and Errors:

    • MySQL Error Log: Check your MySQL server's error log. This might contain more specific details about the error. Look for errors related to the table name or file format.

 

      Try to compare the table file permissions and its contents with other bucket files of other landscapes.

Additional Information

THIS SPECIFIC CASE CAUSE:

 

Those tables do look to be corrupt as suggested. 

Here's what I see when looking at this ...

The files exist and are of zero length. They are in the right directory an permissions look fine to me. They are the same as other similar files at that time.

 

[specadmin@<HOST> reporting]$ pwd
/opt/spectrum/mysql/data/reporting
[specadmin@<HOST> reporting]$ ls -l alarmbucket51380224_1705529830284.MY*
-rw-r----- 1 specadmin specadmin 0 Jan 17  2024 alarmbucket51380224_1705529830284.MYD
-rw-r----- 1 specadmin specadmin 0 Jan 17  2024 alarmbucket51380224_1705529830284.MYI
[specadmin@<HOST> reporting]$ ls -l genericbucket51380224_1705529836875.MY*
-rw-r----- 1 specadmin specadmin 0 Jan 17  2024 genericbucket51380224_1705529836875.MYD
-rw-r----- 1 specadmin specadmin 0 Jan 17  2024 genericbucket51380224_1705529836875.MYI
[specadmin@<HOST> reporting]$


Looking at the tables in mysql they look fine, but there is an error on the select statement.

 

mysql> describe alarmbucket51380224_1705529830284;
+-----------+-----------------+------+-----+---------+----------------+
| Field     | Type            | Null | Key | Default | Extra          |
+-----------+-----------------+------+-----+---------+----------------+
| event_seq | bigint unsigned | NO   | PRI | NULL    | auto_increment |
| event_key | bigint unsigned | NO   |     | NULL    |                |
| event_ID  | binary(26)      | NO   | UNI | NULL    |                |
| domain    | int unsigned    | NO   | MUL | NULL    |                |
| model_key | int unsigned    | NO   |     | NULL    |                |
| model_h   | int unsigned    | NO   |     | NULL    |                |
| time      | datetime        | NO   | MUL | NULL    |                |
| type      | int unsigned    | NO   | MUL | NULL    |                |
| vardata   | blob            | YES  |     | NULL    |                |
+-----------+-----------------+------+-----+---------+----------------+
9 rows in set (0.00 sec)

mysql> select count(*) from alarmbucket51380224_1705529830284;
ERROR 130 (HY000): Incorrect file format 'alarmbucket51380224_1705529830284'
mysql> describe genericbucket51380224_1705529836875;
+-----------+-----------------+------+-----+---------+----------------+
| Field     | Type            | Null | Key | Default | Extra          |
+-----------+-----------------+------+-----+---------+----------------+
| event_seq | bigint unsigned | NO   | PRI | NULL    | auto_increment |
| event_key | bigint unsigned | NO   |     | NULL    |                |
| event_ID  | binary(26)      | NO   | UNI | NULL    |                |
| domain    | int unsigned    | NO   | MUL | NULL    |                |
| model_key | int unsigned    | NO   |     | NULL    |                |
| model_h   | int unsigned    | NO   |     | NULL    |                |
| time      | datetime        | NO   | MUL | NULL    |                |
| type      | int unsigned    | NO   | MUL | NULL    |                |
| vardata   | blob            | YES  |     | NULL    |                |
+-----------+-----------------+------+-----+---------+----------------+
9 rows in set (0.00 sec)

mysql> select count(*) from genericbucket51380224_1705529836875;
ERROR 130 (HY000): Incorrect file format 'genericbucket51380224_1705529836875'
mysql>

 

RESOLUTION:

To be safe take a backup of db or at least the bucketactivitylog table.

ref:
Useful commands for MySQL dumps

and then run a delete query on the bucketactivitylog table for those corrupted file names:

Delete from bucketactivitylog where <column-name> = <corrupted file name>;

recycle tomcat and validate with SRM Alarm Handler Debug enabled.