Intermittently, we are receiving alarms for devices that are in active maintenance mode.
Alarms from devices correctly added to an active maintenance schedule are still coming through.
The alarm's dev_id is listed in the schedule for suppression, but some alarms are still received unexpectedly.
A possible cause is that the maintenance_mode probe was not reachable when the issue occurred, so the NAS could not register with maintenance_mode.
When registration fails, the NAS discards the maintenance schedule.
Because the NAS collects maintenance schedules from the maintenance_mode probe at run time, alarms leaked through during that period.
The maintenance_mode probe may have been unreachable because of network or database (DB) issues.
A DB issue may be indicated by registration failures in the maintenance_mode log:
maintenance_mode registration failure:
Example:
The log shows WARN entries with a SQLServerException, starting at Apr 12 00:30:16:588.
Exception started at:
Apr 12 00:30:16:588 WARN [attach_socket, com.nimsoft.monitor.probe.MaintenanceModeProbe] Failure registering to maintenance_mode. org.springframework.dao.DataAccessResourceFailureException: StatementCallback
Exception continued until:
Apr 12 00:52:03:871 WARN [attach_socket, com.nimsoft.monitor.probe.MaintenanceModeProbe] Failure registering to maintenance_mode. org.springframework.dao.DataAccessResourceFailureException
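To estimate how long registration was failing, the first and last occurrence of the failure message can be pulled from the log. The following is a minimal Python sketch, not a vendor tool. It assumes every relevant log line starts with a timestamp in the layout shown in the sample lines above (month, day, HH:MM:SS:mmm, no year) and contains the text "Failure registering to maintenance_mode". The function names and the placeholder year are illustrative only.

```python
import re
from datetime import datetime

# Timestamp layout taken from the sample lines, e.g. "Apr 12 00:30:16:588".
TS_RE = re.compile(r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}):(\d{3})")
MARKER = "Failure registering to maintenance_mode"


def parse_ts(line, year=2000):
    """Return the timestamp at the start of the line, or None if there is none.

    The log lines carry no year, so a placeholder year is used (assumption).
    """
    m = TS_RE.match(line)
    if not m:
        return None
    base = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def failure_window(lines):
    """Return (first, last, count) for the registration failures, or None."""
    stamps = [parse_ts(line) for line in lines if MARKER in line]
    stamps = [ts for ts in stamps if ts is not None]
    if not stamps:
        return None
    return min(stamps), max(stamps), len(stamps)
```

Passing the lines of the log file to failure_window() returns the first and last failure timestamps and the number of failures. For the two sample lines above, the window is roughly 21 minutes 47 seconds, which can then be compared with the time of the leaked alarms.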
A new key was introduced that may help work around this issue in a similar scenario.
The new parameter "maint_sched_discard" lets you decide whether the NAS discards the maintenance schedule.
You can specify the value as yes or no. A value of no implies that the maintenance schedule will be retained.
The key is located in the setup section of the nas probe and is set via Raw Configure.
The default is:
maint_sched_discard = yes
To retain the schedule, set it to no:
maint_sched_discard = no
With this setting, the maintenance schedules are not discarded if maintenance_mode is unreachable.
Make the following adjustments to the nas and ems probes:
Run raw configure on the nas probe and set the following under 'setup':
maint_max_resp_time = 50
registrationIntervalLookAheadMinutes = 60
Run raw configure on the ems probe and set the following under 'setup':
maintenance_mode_cmd_timeout = 300000
The maint_sched_discard parameter requires nas version 9.32 or later; nas 9.32HF1 is recommended.
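After editing the probes, the settings above can be double-checked against a copy of each probe's configuration. The following Python sketch compares the setup section with the values recommended in this article. It assumes the configuration text uses sections written as `<setup>` ... `</setup>` with `key = value` lines inside; that layout, and the function names, are assumptions made for illustration and should be verified against your own files.

```python
import re

# Values recommended in this article, keyed by probe name.
RECOMMENDED = {
    "nas": {
        "maint_sched_discard": "no",
        "maint_max_resp_time": "50",
        "registrationIntervalLookAheadMinutes": "60",
    },
    "ems": {"maintenance_mode_cmd_timeout": "300000"},
}

# Assumed layout: "<name>" opens a section, "</name>" closes it.
SECTION_RE = re.compile(r"^<(/?)([^<>\s]+)>")
KEY_RE = re.compile(r"^([^=\s]+)\s*=\s*(.*)$")


def read_setup(cfg_text):
    """Collect key/value pairs found directly inside the top-level setup section."""
    values, stack = {}, []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        sec = SECTION_RE.match(line)
        if sec:
            if sec.group(1):          # closing tag
                if stack:
                    stack.pop()
            else:                     # opening tag
                stack.append(sec.group(2))
            continue
        kv = KEY_RE.match(line)
        if kv and stack == ["setup"]:
            values[kv.group(1)] = kv.group(2).strip()
    return values


def check(probe, cfg_text):
    """Return {key: (current, recommended)} for every setting that differs."""
    current = read_setup(cfg_text)
    return {
        key: (current.get(key), want)
        for key, want in RECOMMENDED[probe].items()
        if current.get(key) != want
    }
```

Calling check("nas", text) or check("ems", text) with the contents of the respective configuration returns an empty dictionary when every recommended value is already in place. A key reported with a current value of None is missing from the setup section.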