SQLServer Probe - AgentJob Failure alarming on all failed jobs

Article ID: 138076

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

We configured SQL Server monitoring and enabled the agent_job_failure checkpoint, but it does not generate the expected alarms.

We tried adjusting both the minimum and maximum thresholds, but the number of alarms never decreases.

The checkpoint generates a flood of alarms, which is impacting UIM performance.

We compared the alarm results with the SQL query results described in KB Article 34961.

Environment

Environment Details:

UIM Version 9.0.2
SQLServer - 5.42
Robot 7.96

Cause

The client had deployed an incorrect configuration package to the problem robots.

Resolution

It appears that a configuration package was applied to the probe. That package had a default setting of unit = days for the agent_job_failure threshold, which caused the issue.

Once we reset the value back to unit =

we were able to test and have this work as expected.

 <agent_job_failure>
      active = no
      send_alarm = yes
      description = Monitors failed agents jobs in defined interval (in minutes).
      qos = no
      qos_list = no
      clear_msg = failed_jobs_1
      clear_sev = clear
      scheduling = rules
      column = elapsed_time
      key = $job_id.$category_name.$rundate
      exclude_defs = yes
      include_defs = yes
      use_exclude = no
      use_include = no
      condition = <=
      samples = 1
      clear_alarms = 1
      msg_variables = $check.x;$profile.x;$instance.x;$job_id.x;$job_name.x;$category_name.x;$rundate.x;$elapsed_time.n
      interval = 5 min
      type = 2
      sql_timeout =  
      <thresholds>
         <default>
            <0>
               tagid = 0
               value = 5
               unit = days
               sev = critical
etc
etc
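
To check whether a robot picked up the problematic setting, the threshold unit can be read out of the probe configuration text. The following is a minimal Python sketch, not an official UIM tool. It assumes the nested `<section>` / `key = value` layout shown in the fragment above, and the names `parse_cfg`, `threshold_units` and `SAMPLE` are hypothetical. The sample text reuses only values from this article, with closing tags added on the assumption that the full file closes each section.

```python
import re

# Assumed layout: "<name>" opens a section, "</name>" closes it,
# and every other non-empty line is "key = value".
SECTION_OPEN = re.compile(r"^<([^/][^>]*)>$")
SECTION_CLOSE = re.compile(r"^</([^>]+)>$")


def parse_cfg(text):
    """Parse nested section text into nested dicts (simplified sketch)."""
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if SECTION_CLOSE.match(line):
            if len(stack) > 1:
                stack.pop()
            continue
        opened = SECTION_OPEN.match(line)
        if opened:
            child = {}
            stack[-1][opened.group(1)] = child
            stack.append(child)
            continue
        # Split on the first "=" only, so values such as "<=" survive.
        key, sep, value = line.partition("=")
        if sep:
            stack[-1][key.strip()] = value.strip()
    return root


def threshold_units(cfg, checkpoint="agent_job_failure"):
    """Return {threshold index: unit} for a checkpoint's default thresholds."""
    default = cfg.get(checkpoint, {}).get("thresholds", {}).get("default", {})
    return {
        idx: entry.get("unit", "")
        for idx, entry in default.items()
        if isinstance(entry, dict)
    }


# Illustrative fragment built from the values quoted in this article.
SAMPLE = """
<agent_job_failure>
   condition = <=
   interval = 5 min
   <thresholds>
      <default>
         <0>
            tagid = 0
            value = 5
            unit = days
            sev = critical
         </0>
      </default>
   </thresholds>
</agent_job_failure>
"""

cfg = parse_cfg(SAMPLE)
units = threshold_units(cfg)
# Flag thresholds that still carry the unit this article identifies as the cause.
suspect = [idx for idx, unit in units.items() if unit == "days"]
print("threshold units:", units)
print("thresholds with unit = days:", suspect)
```

To use this against a real robot, replace `SAMPLE` with the contents of the probe's configuration file. The file name and location depend on the installation and are not covered here.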
