Need to understand how the deadlocks/sec value is actually calculated in the sqlserver probe.
Per the documentation:
"Monitors the number of deadlocks per second in an interval."
Does this mean the value is influenced by the polling interval itself?
DX UIM - Any Version
sqlserver probe - 5.61
The sqlserver probe runs the following query for this checkpoint:
select rtrim(instance_name) object ,cntr_value count, datediff(ss, '1970-01-01 00:00:00', getdate()) time from master.sys.dm_os_performance_counters
WHERE LTRIM(RTRIM(object_name)) LIKE '%:Locks' AND LTRIM(RTRIM(counter_name)) = 'Number of Deadlocks/sec'
The counter name is slightly misleading: it does not return a deadlocks/sec value. Instead, it returns the total number of deadlocks since server startup, and the query adds a timestamp (the query time in epoch seconds).
We can see this by restarting the server and then creating a known number of deadlocks. For example, after restarting SQL Server and intentionally creating 9 deadlocks in a row, the output of the query looks like this:
On the first interval, the query is executed and the probe stores the returned value along with its timestamp. No alarm or QoS is sent on the first interval for this metric.
On the next interval, the same query is executed. The probe calculates the difference between the two deadlock counts and divides it by the difference between the two timestamps (in seconds). The result is the number of deadlocks per second within that interval.
So in fact the interval does influence the threshold/value given.
For example, in the above scenario, if all 9 deadlocks were created within a 5-minute (300-second) polling interval, the probe would return 9 / 300 = 0.03 deadlocks/sec for that interval.
The value for this metric is given as the number of deadlocks which occurred between polling intervals, divided by the polling interval in seconds, to generate the deadlocks per second.
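The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the probe's actual code: the names Sample and deadlocks_per_sec are hypothetical, the timestamps are made-up values, and returning no value when the counter goes backwards (for example after a SQL Server restart) is an assumption.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    """One result of the checkpoint query (hypothetical container)."""
    count: int  # cntr_value: cumulative deadlocks since SQL Server startup
    time: int   # query time in epoch seconds


def deadlocks_per_sec(prev: Sample, curr: Sample) -> Optional[float]:
    """Rate between two consecutive samples, or None if it cannot be computed."""
    elapsed = curr.time - prev.time
    delta = curr.count - prev.count
    # Assumption: skip the interval if no time has passed or the counter
    # went backwards (e.g. the server restarted and the counter reset).
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed


# The scenario from this article: 9 deadlocks within a 300-second interval.
first = Sample(count=0, time=1_700_000_000)   # illustrative timestamps
second = Sample(count=9, time=1_700_000_300)
print(f"{deadlocks_per_sec(first, second):.2f}")  # prints 0.03
```

The first sample on its own produces no value, which matches the behavior noted above: no alarm or QoS is sent on the first interval.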
The smallest value the probe appears capable of recognizing is 0.01, so with a 300-second interval (300 * 0.01 = 3), at least 3 deadlocks must occur within the interval to trigger the alert with a threshold value of 0.01.
If you need to detect a single deadlock in the interval, an interval of 60 seconds with a threshold of >= 0.01/s should suffice, since 1 / 60 is approximately 0.017 deadlocks/sec.
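To check other interval and threshold combinations, the same arithmetic can be expressed as a small helper. This is a rough sketch: min_deadlocks_to_alert is a hypothetical name, and it assumes the probe compares the computed rate to the threshold without rounding it up first.

```python
import math
from decimal import Decimal


def min_deadlocks_to_alert(interval_seconds: int, threshold: str = "0.01") -> int:
    """Smallest number of deadlocks in one interval whose rate reaches the threshold."""
    # Decimal avoids binary floating-point surprises in threshold * interval.
    needed = math.ceil(Decimal(threshold) * interval_seconds)
    return max(1, needed)


print(min_deadlocks_to_alert(300))  # 3
print(min_deadlocks_to_alert(60))   # 1
```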