sqlserver probe agent_job_failure not creating an alert in USM/Operator Console under any severity setting

Article ID: 103959

Products

- NIMSOFT PROBES
- DX Infrastructure Management
- CA Unified Infrastructure Management for z Systems
- CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
- CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

The sqlserver probe does not generate an alarm for the agent_job_failure checkpoint in USM or Operator Console, regardless of the severity setting.

Cause

- configuration

Environment

- UIM 8.51 or higher
- sqlserver (any version)

Resolution

The customer decided to use an alternative method to alarm on SQL Server Agent job failures.

Because SQL Server Agent job failures write event ID 208 to the Windows Application event log, the customer chose to set up the ntevl probe on the applicable devices, with specific criteria/messages for each job they needed to alert on. This is a straightforward alternative: ntevl probe profiles can alarm on individual job failures, each with its own alarm message.

Event ID: 208
Source: SQLSERVERAGENT

Description (example):

SQL Server Scheduled Job 'DB Backup Job for DB Maintenance Plan '<name of job>' (0xF673402DE6B2F14A9C671B08202C92E2) - Status: Failed - Invoked on: 7/10/2001 12:15:00 AM - Message: The job failed. Unable to determine if the owner (<username>) of job DB Backup Job for DB <name of job>' has server access (reason: Could not obtain information about Windows NT group/user '<name of user>'. [SQLSTATE 42000] (Error 8198)).
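
Before entering match criteria in an ntevl profile, it can help to test them offline against a description like the example above. The following is a minimal sketch in Python, not ntevl's own matching logic: the function name `job_failure_matches`, the regular expression, and the sample job name are all hypothetical, and the pattern syntax that ntevl accepts may differ.

```python
import re

# Hypothetical pattern modeled on the example description above:
# captures the job name between the first quote and the hex job ID,
# and requires the text "Status: Failed".
EVENT_208 = re.compile(
    r"SQL Server Scheduled Job '(?P<job>.+?)' \(0x[0-9A-Fa-f]+\) - Status: Failed"
)

def job_failure_matches(description, job_name):
    """Return True if the description reports a failure of job_name."""
    m = EVENT_208.search(description)
    return bool(m) and job_name in m.group("job")

# Sample description with a made-up job name (assumption, not from a real log).
sample = (
    "SQL Server Scheduled Job 'Nightly Backup' "
    "(0xF673402DE6B2F14A9C671B08202C92E2) - Status: Failed - "
    "Invoked on: 7/10/2001 12:15:00 AM - Message: The job failed."
)
print(job_failure_matches(sample, "Nightly Backup"))
print(job_failure_matches(sample, "Index Rebuild"))
```

Using one pattern per job keeps each profile's criteria narrow, so a failure of one job is not reported under another job's alarm message.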

Additional Information

In some cases, the ntevl probe may display the expected Windows event (e.g., a job failure) from the Application log under its Status tab, yet no alarm is generated in UIM.

Check the nas tables for any evidence that such an alarm was generated:

   SELECT * FROM NAS_ALARMS WHERE message LIKE '%<job_name>%';

   SELECT * FROM NAS_TRANSACTION_SUMMARY WHERE message LIKE '%<job_name>%';

   SELECT * FROM NAS_TRANSACTION_LOG WHERE message LIKE '%<job_name>%';
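
The three lookups above differ only in the table name, so they can be generated for any job name. This is a minimal sketch, assuming the table and column names shown above. The helper name `nas_lookup_queries` and the sample job name are hypothetical, and the `?` placeholder style depends on the database driver in use (pyodbc, for example, uses `?`).

```python
# Table names taken from the queries above.
NAS_TABLES = ("NAS_ALARMS", "NAS_TRANSACTION_SUMMARY", "NAS_TRANSACTION_LOG")

def nas_lookup_queries(job_name):
    """Return (sql, params) pairs, one per nas table, for job_name."""
    # The job name is passed as a bound parameter rather than
    # concatenated into the SQL text.
    pattern = f"%{job_name}%"
    return [
        (f"SELECT * FROM {table} WHERE message LIKE ?", (pattern,))
        for table in NAS_TABLES
    ]

for sql, params in nas_lookup_queries("Nightly Backup"):
    print(sql, params)
```

Each pair can be passed to a DB-API cursor as `cursor.execute(sql, params)`. If all three lookups return no rows, the alarm most likely never reached nas.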

In this particular case, the job in question was excluded on the ntevl 'Exclude' tab, so no alarm was generated in UIM, nor in Spectrum via sync. Customers may want to check with their team to find out who excluded the job and why, and then decide whether to remove the exclusion.

How does the sqlserver "agent_job_failure" checkpoint work?

https://knowledge.broadcom.com/external/article?articleId=34961