SQLServer Probe - AgentJob Failure alarming on all failed jobs


Article ID: 138076


Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

We configured SQL Server monitoring and enabled the agent_job_failure checkpoint, but it is not generating the expected alarms.

I tried adjusting the minimum and maximum thresholds, but the alarm count never decreased.

The probe is generating a flood of alarms, which is impacting UIM performance.

I compared the alarm results with the SQL query results, following this article:

https://ca-broadcom.wolkenservicedesk.com/external/article?articleId=34961

Environment

Environment Details:

UIM Version 9.0.2

SQLServer - 5.42

Robot 7.96

Cause

The client had deployed an incorrect configuration package to the problem robots.


Resolution

It appears that a configuration package was applied to the probe. This package had a default setting of unit = days, which caused the issue.

Once we reset the value back to unit = we were able to test and confirm that the checkpoint works as expected.
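To check a probe configuration for this setting before a package is deployed, a small script along the following lines may help. This is a minimal sketch, not part of the product: the function name `find_threshold_units` is hypothetical, and it assumes the configuration is plain text with `<section>` ... `</section>` blocks and `key = value` lines, as in the fragment shown in this article. The closing tags in the sample are an assumption, since the fragment here does not show them.

```python
import re


def find_threshold_units(cfg_text, checkpoint="agent_job_failure"):
    """Return (section_path, unit_value) pairs found under the checkpoint's thresholds.

    Hypothetical helper: assumes <name> opens a section and </name> closes it.
    """
    stack = []    # currently open sections, outermost first
    results = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if re.fullmatch(r"</(.+)>", line):
            # Closing tag: leave the innermost open section.
            if stack:
                stack.pop()
            continue
        opening = re.fullmatch(r"<([^/].*)>", line)
        if opening:
            stack.append(opening.group(1))
            continue
        if "=" in line:
            # Split on the first "=" only, so values such as "<=" stay intact.
            key, _, value = line.partition("=")
            if (key.strip() == "unit"
                    and checkpoint in stack
                    and "thresholds" in stack):
                results.append(("/".join(stack), value.strip()))
    return results


# Sample text modeled on the fragment in this article (closing tags assumed).
sample = """
<agent_job_failure>
   active = no
   condition = <=
   <thresholds>
      <default>
         <0>
            value = 5
            unit = days
            sev = critical
         </0>
      </default>
   </thresholds>
</agent_job_failure>
"""

for path, unit in find_threshold_units(sample):
    print(path, "->", repr(unit))
```

Any threshold that reports `'days'` for this checkpoint would be a candidate for review before the package is pushed to other robots.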

 <agent_job_failure>

      active = no

      send_alarm = yes

      description = Monitors failed agents jobs in defined interval (in minutes).

      qos = no

      qos_list = no

      clear_msg = failed_jobs_1

      clear_sev = clear

      scheduling = rules

      column = elapsed_time

      key = $job_id.$category_name.$rundate

      exclude_defs = yes

      include_defs = yes

      use_exclude = no

      use_include = no

      condition = <=

      samples = 1

      clear_alarms = 1

      msg_variables = $check.x;$profile.x;$instance.x;$job_id.x;$job_name.x;$category_name.x;$rundate.x;$elapsed_time.n

      interval = 5 min

      type = 2

      sql_timeout =  

      <thresholds>

         <default>

            <0>

               tagid = 0

               value = 5

               unit = days

               sev = critical