We have configured SQL Server monitoring and enabled the agent_job_failure checkpoint, but it is not generating the expected alarms.
I tried both a minimum and a maximum threshold, but the alarm count never goes down.
It is generating a flood of alarms and impacting UIM performance.
I have compared the alarm results with the SQL query results following the link:
Environment Details:
It appears that a configuration package was applied to the probe. This package had a default setting of unit = days, which caused the issue.
Once we reset the value back to unit =
we were able to test and have this work as expected.
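
To illustrate why the threshold unit matters so much here, the following is a minimal Python sketch, not the probe's actual logic. It assumes the checkpoint raises an alarm for every failed job whose elapsed_time satisfies the configured condition (<=) against the threshold after conversion to minutes. The unit names other than days, the helper names, and the sample job ages are all hypothetical.

```python
# Hypothetical conversion table; only "days" appears in the configuration
# package discussed above, the other unit names are assumptions.
MINUTES_PER_UNIT = {"min": 1, "hours": 60, "days": 1440}


def threshold_minutes(value, unit):
    """Convert a threshold value with a unit into minutes."""
    return value * MINUTES_PER_UNIT[unit]


def count_matching_jobs(elapsed_minutes, value, unit):
    """Count failed jobs whose elapsed time is <= the threshold window."""
    limit = threshold_minutes(value, unit)
    return sum(1 for elapsed in elapsed_minutes if elapsed <= limit)


# Made-up ages (in minutes) of failed jobs, for illustration only.
failed_job_ages = [3, 45, 600, 4000, 7000]

alarms_with_days = count_matching_jobs(failed_job_ages, 5, "days")
alarms_with_minutes = count_matching_jobs(failed_job_ages, 5, "min")
print(alarms_with_days, alarms_with_minutes)
```

Under these assumptions, a value of 5 with unit = days covers a 7200-minute window, so every failure from the last five days keeps matching on each 5-minute interval, which would be consistent with the alarm flood described above.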
<agent_job_failure>
active = no
send_alarm = yes
description = Monitors failed agents jobs in defined interval (in minutes).
qos = no
qos_list = no
clear_msg = failed_jobs_1
clear_sev = clear
scheduling = rules
column = elapsed_time
key = $job_id.$category_name.$rundate
exclude_defs = yes
include_defs = yes
use_exclude = no
use_include = no
condition = <=
samples = 1
clear_alarms = 1
msg_variables = $check.x;$profile.x;$instance.x;$job_id.x;$job_name.x;$category_name.x;$rundate.x;$elapsed_time.n
interval = 5 min
type = 2
sql_timeout =
<thresholds>
<default>
<0>
tagid = 0
value = 5
unit = days
sev = critical
etc
etc
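
A quick way to check whether a configuration package has changed the threshold unit is to scan the probe configuration text for unit keys under the checkpoint's thresholds section. The Python sketch below is a hedged illustration only: the function name is hypothetical, and it assumes sections open with <name> and close with </name>, with key = value lines in between. The sample text is a shortened version of the fragment above with closing tags added.

```python
import re


def find_threshold_units(cfg_text, checkpoint):
    """Return (section_path, unit) pairs found under <checkpoint>/<thresholds>."""
    stack = []   # names of the currently open sections
    found = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if re.fullmatch(r"</(.+)>", line):       # closing tag (assumed syntax)
            if stack:
                stack.pop()
            continue
        opened = re.fullmatch(r"<(.+)>", line)   # opening tag
        if opened:
            stack.append(opened.group(1))
            continue
        key, sep, value = line.partition("=")    # key = value line
        if (sep and key.strip() == "unit"
                and checkpoint in stack and "thresholds" in stack):
            found.append(("/".join(stack), value.strip()))
    return found


SAMPLE = """
<agent_job_failure>
   active = no
   condition = <=
   <thresholds>
      <default>
         <0>
            value = 5
            unit = days
            sev = critical
         </0>
      </default>
   </thresholds>
</agent_job_failure>
"""

units = find_threshold_units(SAMPLE, "agent_job_failure")
print(units)
```

Any entry reporting days for this checkpoint would be worth reviewing against the checkpoint description, which refers to an interval in minutes.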