sqlserver probe checkpoints - query timed out or failed to execute alarms

Article ID: 35010

Products

- DX Unified Infrastructure Management (Nimsoft / UIM)
- Unified Infrastructure Management for Mainframe
- CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

The "sqlserver" probe generates alarms if the query times out or a profile fails to execute in the scheduled time interval.

- Code=0x80004005 Source=Microsoft OLE DB Provider for ODBC Drivers Description=[Microsoft][ODBC SQL Server Driver]Timeout expired
- query timed out! alarms


Examples:

Profile xxxxxxx, instance xxxxxxx, checkpoint 'free_space' - query timed out!

Profile <Database Server>, failed to execute in scheduled time interval, delayed by <seconds> seconds (the alarm subkey is "XXDB.delay_alarm")


The second alarm means that the profile is taking longer to execute than the configured heartbeat interval.

Environment

- Any sqlserver probe
- UIM any version

Cause

- Timeout settings in the sqlserver probe configuration (sqlserver_monitor.cfg)

Resolution

The timeout-related fields of a sqlserver profile are explained below. Tuning them as described can help eliminate timeout alarms, query timeouts, and execution failures:

1. Heartbeat - Defines the interval at which all profile checkpoint schedules are tested; any checkpoint that is due is then executed.

This number should be a common divisor of all check interval values in use (for example, 60 seconds for check intervals of 2 and 5 minutes).
The higher the value, the lower the profile overhead.


2. Check Interval - Default value for check interval in the profile.

This value is used if nothing else is defined in the checkpoint, and it overrides the default checkpoint list setting.


3. Profile Timeout - Defines the maximum processing time for all checkpoints in the profile.

If this timeout is reached, processing for the current interval stops and the probe waits for the next heartbeat to evaluate the checkpoint schedules again. An alarm is issued.


4. SQL Timeout - Defines the maximum run time for a single checkpoint query. Every checkpoint query runs asynchronously.

If a query reaches the SQL timeout, processing of that checkpoint is terminated and the next checkpoint is started. An alarm is issued.


5. Delay Threshold - Timeout threshold for the profile delay alarm.
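As a rough illustration of how these fields relate, the following Python sketch picks a heartbeat that divides every check interval and models one profile run under a SQL timeout and a profile timeout. It is a simplified model based only on the descriptions above, not the probe's actual implementation: the function names, the sequential processing, the alarm text for the profile timeout, and the `database_size` checkpoint are assumptions made for the example.

```python
from functools import reduce
from math import gcd


def suggest_heartbeat(check_intervals_s):
    """Largest heartbeat (seconds) that divides every check interval."""
    return reduce(gcd, check_intervals_s)


def simulate_profile_run(query_durations_s, sql_timeout_s, profile_timeout_s):
    """Model one profile run; return (elapsed seconds, list of alarm strings)."""
    elapsed = 0
    alarms = []
    for name, duration in query_durations_s.items():
        # A query is cut off once it reaches the SQL timeout.
        spent = min(duration, sql_timeout_s)
        if elapsed + spent >= profile_timeout_s:
            # Profile timeout reached: stop and wait for the next heartbeat.
            alarms.append("profile timeout reached")  # hypothetical alarm text
            elapsed = profile_timeout_s
            break
        elapsed += spent
        if duration >= sql_timeout_s:
            alarms.append(f"checkpoint '{name}' - query timed out!")
    return elapsed, alarms


# Check intervals of 2, 5 and 10 minutes, expressed in seconds.
heartbeat = suggest_heartbeat([120, 300, 600])

# Hypothetical query durations in seconds for two checkpoints.
elapsed, alarms = simulate_profile_run(
    {"free_space": 45, "database_size": 5},
    sql_timeout_s=30,
    profile_timeout_s=600,
)
print(heartbeat, elapsed, alarms)
```

With these sample values the suggested heartbeat is 60 seconds, and only the `free_space` checkpoint produces a "query timed out!" alarm, because its 45-second query exceeds the 30-second SQL timeout.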

Example:

If you are getting alarms such as
"Profile <Database Server>, failed to execute in the scheduled time interval, delayed by <seconds> seconds", it means that the profile is taking longer to execute than the configured heartbeat interval.

Suppose the profile is configured as follows:

1. Heartbeat - 60 seconds
2. Check Interval - 2 minutes (checkpoint execution)
3. Profile Timeout - 10 minutes
4. Delay Threshold (delay_threshold) - 15 seconds


In this case the alarm is raised because the profile ends up being executed roughly every 2 minutes, that is, 1 minute later than the scheduled interval (the heartbeat). This is within the profile timeout limit, but it exceeds the "delay_threshold" limit.

- Heartbeat sets how often profile execution is started
- Check interval sets how often a checkpoint is executed
- Profile timeout is the time within which a profile run should complete; otherwise, a new run of the profile starts
- Delay threshold is the time within which the next scheduled run of the profile should start; otherwise, an alarm is generated
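The arithmetic behind the delay alarm in this example can be sketched in a few lines of Python. This is only an illustration of the reasoning above; the function name is hypothetical, and whether the probe compares with "greater than" or "greater than or equal to" is an assumption.

```python
def delay_alarm(heartbeat_s, actual_gap_s, delay_threshold_s):
    """Return (delay in seconds, whether the delay alarm would fire)."""
    # Delay is how much later than the heartbeat the next run actually started.
    delay = max(0, actual_gap_s - heartbeat_s)
    # Assumption: the alarm fires when the delay exceeds the threshold.
    return delay, delay > delay_threshold_s


# Values from the example: 60 s heartbeat, runs about every 2 minutes, 15 s threshold.
delay, fires = delay_alarm(heartbeat_s=60, actual_gap_s=120, delay_threshold_s=15)
print(f"delayed by {delay} seconds, alarm: {fires}")
```

Here the run starts 60 seconds late, which is well above the 15-second threshold, so the delay alarm is generated even though the 10-minute profile timeout is never reached.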


***The "query timed out!" alarm is raised when the SQL Timeout in the profile is shorter than the time the query takes to run. Increase this value so that it comfortably exceeds the query's normal run time.***

If the database tables contain a large amount of QOS data (many rows), a checkpoint may take longer to run. This increases the overall profile execution time and can result in timeout alarms.

You can configure the remaining timeouts based on the above explanation and the alarm you are getting.