How to troubleshoot Deadlock errors

book

Article ID: 180941

calendar_today

Updated On:

Products

Management Platform (Formerly known as Notification Server)

Issue/Introduction

 

Resolution

Question

We're getting a lot of deadlock errors in our logs; though we don't seem to be having any problems, how can we troubleshoot these?

Answer

There are really two answers to this.

The first is that Deadlocking is most often an indication of an overloaded server, and thus you may simply have to have more hardware or look for performance tuning to relieve the stress on SQL.

The second part of this though is to track more details about the errors than the Altiris logs do and actually can capture.

Introduction to Deadlocks

A deadlock is a condition that can occur on any multi-threaded system. It does not relate specifically to SQL Server, or to any RDBMS. This situation should not be confused with blocking as they are dissimilar. Where blocking occurs where one transaction is waiting on resources in use by another transaction, deadlocking refers to the situation where two or more threads are waiting for resources in a circular blocking chain. Both threads cannot release resources until their transactions are committed or rolled back. The transactions cannot be committed or rolled back because the resources they are waiting on are owned by the other thread. Except in some exceptional circumstances deadlocks are a consequence of blocking but deadlocking can be seen as blocking in an infinite loop. Additionally, blocking will resolve itself naturally, but a deadlock will exist until action is taken to remove it. The action taken in the normal course of affairs is that SQL Servers built in deadlock detection engine will choose one of the participants as a deadlock victim, generally the participant that has done the least work. SQL Server will then cancel the victim SPIDs current batch and roll back its transactions.

Symptoms of Deadlock

A deadlock on SQL Server is normally very easy to diagnose. The built-in deadlock protection will choose a victim, and that victim will receive a 1205 error. This will normally be show as an error of the following type:

Transaction (Process ID n) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

The SQL 1205 Error message will appear in the Notification Server Error Logs, with a message similar to the above.  These Deadlock messages consistently appearing in the logs indicate a need for deeper investigation.

Gathering Information

In order to successfully troubleshoot the deadlock, firstly more information about the deadlock needs to be obtained.  This is done by enabling one (or more) of the trace flags that are available for the installed version of SQL Server.  In SQL Server 2005 there are two options, Trace Flag 1222 and Trace Flag 1204, whilst SQL Server 2000 only has the ability to output Trace Flag 1204.  Both of these flags are turned on by running the TRACEON DBCC command from a query window, such as:

DBCC TRACEON(1222,-1)

Note: The -1 is important, as this indicates that the Flag will be turned on globally.

The information gathered from these trace flags will be found in the SQL Server Error Log. By default the SQL Server Error logs are found in C:\Program Files\Microsoft SQL Server\MSSQL\LOG. The best way to read the logs is not via Enterprise Manager, as Enterprise Manager lists the events in reverse order. Instead, use a text editor to view the files. It may be outside of the scope of the audience of this document to troubleshoot the actual deadlock conditions, in which case simply switching the Trace Flags on, waiting for the deadlock to occur and sending the SQL Server Error logs to Altiris support will help to diagnose the issue.

Interpreting Trace Flag Information from the Error Log

The two trace flags in question output vastly different information, with Flag 1222 showing much improved data over Flag 1204. Both 1222 and 1204 can be run at the same time on a SQL Server 2005 installation, and both sets of information could be useful for troubleshooting a deadlock, but only Flag 1222 is normally required. Under SQL Server 2000, only Trace Flag 1204 is supported.

Interpreting Trace Flag 1222 Output

Information from –T1222 in the error log is split into a ‘Process-list’ and a ‘Resource-list’. Processes in the ‘Process-list’ are SPIDs or worker threads that participated in the deadlock, while Resources in the ‘Resource-list’ are resources that one of the participants owns.

There is a lot of information in the -T1222, and a key to troubleshooting the deadlock is interpreting that data. The following is an example output from the error log, with highlighted points of interest.

deadlock-list

deadlock victim=process6a92e8

process-list


process id=process6a92e8 taskpriority=0 logused=0 waitresource=KEY: 7:72057594038386688 (0500b49e5abb) waittime=2578 ownerId=2988393 transactionname=INSERT EXEC lasttranstarted=2007-02-28T09:14:01.697 XDES=0xe938b90 lockMode=S schedulerid=1 kpid=2480 status=suspended spid=58 sbid=0 ecid=0 priority=0 transcount=1 lastbatchstarted=2007-02-28T09:14:01.680 lastbatchcompleted=2007-02-28T09:14:01.680 clientapp=SQL Query Analyzer hostname=PSY-D04509 hostpid=552 loginname=sa isolationlevel=read committed (2) xactid=2988393 currentdb=7 lockTimeout=4294967295 clientoption1=673187936 clientoption2=128056
executionStack
frame procname=test.dbo.usp_proc1 line=4 stmtstart=74 sqlhandle=0x03000700ae80847c5c159800e39800000100000000000000
SELECT col2, col3 FROM t1 WHERE col2 BETWEEN @param1 AND @param1+1
frame procname=adhoc line=4 stmtstart=62 stmtend=136 sqlhandle=0x020000005a9fff1320a5bf315b282f66d32aeccc20ca9c9e
INSERT INTO #t1 EXEC usp_proc1 4
inputbuf
WHILE (1=1) BEGIN
INSERT INTO #t1 EXEC usp_proc1 4
TRUNCATE TABLE #t1
END
process id=process6a9798 taskpriority=0 logused=196 waitresource=KEY: 7:72057594038452224 (0b00274d9328) waittime=2593 ownerId=2988400 transactionname=UPDATE lasttranstarted=2007-02-28T09:14:01.713 XDES=0xe938258 lockMode=X schedulerid=1 kpid=3252 status=suspended spid=57 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2007-02-28T09:13:57.730 lastbatchcompleted=2007-02-28T09:12:44.197 clientapp=SQL Query Analyzer hostname=PSY-D04509 hostpid=552 loginname=sa isolationlevel=read committed (2) xactid=2988400 currentdb=7 lockTimeout=4294967295 clientoption1=673187936 clientoption2=128056
executionStack
frame procname=test.dbo.usp_proc2 line=4 stmtstart=74 stmtend=170 sqlhandle=0x03000700e7a4787d61159800e39800000100000000000000
UPDATE t1 SET col2 = col2+1 WHERE col1 = @param1
frame procname=adhoc line=11 stmtstart=298 stmtend=336 sqlhandle=0x02000000473e261d3c9a948a0b8bf72eca19b62d2bf971b9
EXEC proc2 4
inputbuf
-- Batch #2
USE test
SET NOCOUNT ON
WHILE (1=1)
EXEC usp_proc2 4

resource-list

keylock hobtid=72057594038386688 dbid=7 objectname=test.dbo.t1 indexname=cidx id=lock66ccd00 mode=X associatedObjectId=72057594038386688
owner-list
owner id=process6a9798 mode=X
waiter-list
waiter id=process6a92e8 mode=S requestType=wait
keylock hobtid=72057594038452224 dbid=7 objectname=test.dbo.t1 indexname=idx1 id=lock66ccd80 mode=S associatedObjectId=72057594038452224
owner-list
owner id=process6a92e8 mode=S
waiter-list
waiter id=process6a9798 mode=X requestType=wait

Note:  The sections highlighted in green indicate the start of the Deadlock-list (beginning of the –T1222 output), Resource-list, or Process-list. Sections highlighted in blue indicate sections of the output that will be useful to troubleshoot the deadlock.

Once this information is gathered from the logs, it is useful to tabulate by SPID. This gives a clearer picture of what is causing the deadlock. From the parts of the output highlighted in yellow above it is shown:

SPID 58 is running this query from line 4 of stored proc [usp_proc1]
     SELECT col2, col3 FROM t1 WHERE col2 BETWEEN @param1 AND @param1+1
SPID 57 is running this query from line 4 of stored proc [usp_proc2]
     UPDATE t1 SET col2 = col2+1 WHERE col1 = @param1

SPID 58 is waiting for a Shared KEY lock on test.dbo.t1.cidx.  SPID 57 has a conflicting X lock
SPID 57 is waiting for an eXclusive KEY lock on index test.dbo.t1.idx1.  SPID 58 has a conflicting S lock

Interpreting Trace Flag 1204 Output

Information from Trace Flag 1204 is shown as a list of Nodes, with each one representing a resource participating in the deadlock. Each Node has a dedicated section detailing both the SPID that is blocked and the SPID that already has the lock on the resource that is blocking. Following this there is a final section that describes the deadlock victim. 

Once again, there is a great deal of information in the 1204 output. Translating this information into a more readable format will help in troubleshooting the deadlock. Unfortunately a few more steps need to be taken in order to do this, as resources in the output are only identified by resource IDs rather than easily readable names.

Deadlock encountered .... Printing deadlock information

Wait-for graph

Node:1
KEY: 7:72057594038452224 (0b00274d9328) CleanCnt:3 Mode:S Flags: 0x0
Grant List 0:
Owner:0x0C8FDF80 Mode: S        Flg:0x0 Ref:1 Life:00000000 SPID:57 ECID:0 XactLockInfo: 0x03EA385C
SPID: 57 ECID: 0 Statement Type: SELECT Line #: 4
Input Buf: Language Event:
WHILE (1=1) BEGIN
INSERT INTO #t1 EXEC usp_proc1 4
TRUNCATE TABLE #t1
END

Requested By:
ResType:LockOwner Stype:'OR'Xdes:0x03EA29F0 Mode: X SPID:56 BatchID:0 ECID:0 TaskProxy:(0x1AFAA374) Value:0xe6fab40 Cost:(0/196)

Node:2
KEY: 7:72057594038386688 (0500b49e5abb) CleanCnt:2 Mode:X Flags: 0x0
Grant List 0:
Owner:0x0C8FD0A0 Mode: X        Flg:0x0 Ref:1 Life:02000000 SPID:56 ECID:0 XactLockInfo: 0x03EA2A14
SPID: 56 ECID: 0 Statement Type:
UPDATE Line #: 4
Input Buf: Language Event:
USE test
SET NOCOUNT ON
WHILE (1=1)
EXEC usp_proc2 4

Requested By:
ResType:LockOwner Stype:'OR'Xdes:0x03EA3838 Mode: S SPID:57 BatchID:0 ECID:0 TaskProxy:(0x1D452374) Value:0xe6fa2e0 Cost:(0/0)

Victim Resource Owner:
ResType:LockOwner Stype:'OR'Xdes:0x03EA3838 Mode: S SPID:57 BatchID:0 ECID:0 TaskProxy:(0x1D452374) Value:0xe6fa2e0 Cost:(0/0)

From the highlighted sections the following can be shown:

SPID 57 is running a SELECT statement at line 4 of the stored proc ‘usp_proc1’.  It holds an S lock on KEY: 7:72057594038452224 (0b00274d9328).  This lock is blocking SPID 56, which is waiting to acquire an X lock on that key resource.

SPID 56 is running an UPDATE statement at line 4 of the stored proc ‘usp_proc2’.  It holds an X lock on KEY: 7:72057594038386688 (0500b49e5abb).  This lock is blocking SPID 57, which is waiting to acquire an S lock on that key resource.

SQL Server has chosen SPID 57 as the deadlock victim, it’s batch will cancelled and it’s transaction will be rolled back.

Note:  To find the exact queries that are being run at the lines given, it is necessary to manually look through the stored procedure.  The information is not given in the output.

Once the information is in a more readable format, the next step is to identify the key resources involved in the deadlock. At this stage the key resources are only being identified by resource IDs. The resource KEY: 7:72057594038452224 (0b00274d9328) refers to an object with ID 72057594038452224 in the database with the ID 7, as this example has been run on a SQL Server 2005 installation. 

Note: The format of KEY Resources IDs on SQL Server 2005 has changed from db_id:obj_id:idx_id to db_id:hobt_id. To extrapolate the easily readable names for these resources under SQL Server 2005 first we need to find the applicable object ID from the partition ID given. To do this find the partition ID in the sys.partitions table and find the name of the table and index in the sysobjects and sysindexes tables. For SQL Server 2000 the object ID and Index ID are given as part of the key, so simple look these up in sysobjects and sysindexes. To do this, find the Object Name by getting the Object ID and looking up the corresponding ID in sysobjects. Then find the Index Name by looking up the Object ID in the id column of sysindexes and the Index ID in the indid column.

After finding the well readable names from the appropriate system tables, a final statement can be made about deadlock, in this case:

SPID 56 is running an UPDATE statement at line 4 of the stored proc ‘usp_proc2’.  It holds an X lock on test.dbo.t1.cidx.  This lock is blocking SPID 57 who is waiting for an S lock on the same resource.

Warning: Both Trace Flag outputs will only show the queries that finished out the deadlock. If either of the SPIDs is running a multi statement transaction, you may need to get a profiler trace of the entire deadlock.

Checklist—What to gather for Support

A few things are helpful for Support when or if an issue might require escalation

Preparations:

  1. Turn on Trace Level logging for Altiris error logs
  2. Turn on Trace Flags applicable to the SQL Server Installation available

Once Deadlock occurs

  1. Gather Altiris error logs
  2. Gather SQL Server error logs
  3. If using the 1204 Flag, convert the Resource IDs given in logs to the easily readable names, as outlined in the note above.
  4. Send all information to support and/or attach to the ticket.

Possible Solutions

Once all the information is gathered the next step is to solve the deadlock and prevent it from happening. This should be done with Support while reviewing the specific deadlocks that are occuring.  Deadlocks are most often solved by a change to application code or stored proc (which should only be done by Support), or by a change to an index, or sometimes by a change to the "load" on a server (i.e. a server that is too busy can have an application removed, drives added for SQL, etc.). Some general hints would include the following:

Improve Efficiency of Queries

Often the simplest way to prevent deadlocking is to simple make the queries involved better. It may be possible to reduce the amount of resources the query is locking. A Query that locks fewer resources is less likely to deadlock.  Take special note of any table scans, index scans and lengthy sorts shown in the query plan. Use the SET SHOWPLAN_ALL ON command and examine the query plan.

Improve Indexes or Indexing wherever possible

Measure "Estimated Improvement" from the Database Tuning Advisor when run for the queries involved in the deadlock, especially in SQL Server 2005. Try not to place clustered indexes on columns that are frequently updated, as updates to columns with clustered indexes will require locks on the clustered index (to get to the row) and all non-clustered indexes (leaf level non clustered indexes reference rows by clustered index key value)

Transactions that run for less time deadlock less often

Try getting transactions to run for the least amount of time, within reason. Reduce the round trips between application and SQL Server. Cache read operations whenever applicable and wherever the same read is going to be used often.  By the same token, locks that last the least amount of time deadlock less often.

Access objects in the same order

Two transactions that update the same two tables in a different sequence may frequently lock, as one resource may be locked by the other transaction from earlier in its execution. This is particular difficult with the scope of the product.