Since the beginning of this week, we have regularly received the following alarm at 4:50 AM: REPORT MANAGER for the SRM: GENERIC EVENT PROCESSING FAILURE
Each occurrence pages our on-call engineer and wakes them unnecessarily.
As a temporary workaround, we have restarted the Tomcat process each time.
Release: 21.2
The Tomcat error is:
Apr 22, 2022 02:40:57.911 (SRM/ModelCreateDestroyHandler/bucketReader) (SRM_Model_Create_Destroy_Events) - (ERROR) - Unknown exception encountered while processing name events: processing halted for all servers
Caused by: org.springframework.dao.CannotAcquireLockException: StatementCallback; SQL [UPDATE modeloutage SET last_updated_time = CURRENT_TIMESTAMP where modeloutage.model_key = 2955549]; Lock wait timeout exceeded; try restarting transaction; nested exception is java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
The modeloutage table is about 62 GB in size, and the purge does not appear to be working as configured in Reporting Preferences: retention is set to 180 days, but more than 400 days of data are currently stored.
After running an optimize of the reporting database, the modeloutage table size was reduced and no more errors were seen.
$ mysqlcheck -uroot -pMySqlR00t reporting --optimize
The table size was reduced by 66%.
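To confirm the effect of the optimize, the size of the modeloutage table can be compared before and after the run. The following is a minimal sketch, not part of the original resolution: the helper names size_query and check_size are hypothetical, the schema and table names are taken from the text above, and it assumes the mysql command-line client is on the PATH and that root may read information_schema. The figures reported by information_schema are estimates for InnoDB tables, so treat them as approximate.

```shell
#!/bin/sh
# Hypothetical helper: build a query that reports the approximate size of one
# table (data plus indexes) and its reclaimable free space, in GB.
size_query() {
    printf "SELECT table_name, ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb, ROUND(data_free / 1024 / 1024 / 1024, 2) AS free_gb FROM information_schema.tables WHERE table_schema = '%s' AND table_name = '%s';\n" "$1" "$2"
}

# Hypothetical helper: run the query for the reporting.modeloutage table.
# -p without a value makes the client prompt for the password, so it does not
# end up in the shell history.
check_size() {
    mysql -uroot -p -e "$(size_query reporting modeloutage)"
}
```

Calling check_size once before and once after the mysqlcheck run shows how much space the optimize reclaimed.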