We're getting a lot of deadlock errors in our logs; though we don't seem to be having any problems, how can we troubleshoot these?
Answer
There are really two answers to this.
The first is that Deadlocking is most often an indication of an overloaded server, and thus you may simply have to have more hardware or look for performance tuning to relieve the stress on SQL. The second part of this though is to track more details about the errors than the Altiris logs do and actually can capture.
A deadlock is a condition that can occur on any multi-threaded system. It does not relate specifically to SQL Server, or to any RDBMS. This situation should not be confused with blocking as they are dissimilar. Where blocking occurs where one transaction is waiting on resources in use by another transaction, deadlocking refers to the situation where two or more threads are waiting for resources in a circular blocking chain. Both threads cannot release resources until their transactions are committed or rolled back. The transactions cannot be committed or rolled back because the resources they are waiting on are owned by the other thread. Except in some exceptional circumstances deadlocks are a consequence of blocking but deadlocking can be seen as blocking in an infinite loop. Additionally, blocking will resolve itself naturally, but a deadlock will exist until action is taken to remove it. The action taken in the normal course of affairs is that SQL Servers built in deadlock detection engine will choose one of the participants as a deadlock victim, generally the participant that has done the least work. SQL Server will then cancel the victim SPIDs current batch and roll back its transactions.
Transaction (Process ID n) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
The SQL 1205 Error message will appear in the Notification Server Error Logs, with a message similar to the above. These Deadlock messages consistently appearing in the logs indicate a need for deeper investigation.
DBCC TRACEON(1222,-1)
Note: The -1 is important, as this indicates that the Flag will be turned on globally.
The information gathered from these trace flags will be found in the SQL Server Error Log. By default the SQL Server Error logs are found in C:\Program Files\Microsoft SQL Server\MSSQL\LOG. The best way to read the logs is not via Enterprise Manager, as Enterprise Manager lists the events in reverse order. Instead, use a text editor to view the files. It may be outside of the scope of the audience of this document to troubleshoot the actual deadlock conditions, in which case simply switching the Trace Flags on, waiting for the deadlock to occur and sending the SQL Server Error logs to Altiris support will help to diagnose the issue.
Information from –T1222 in the error log is split into a ‘Process-list’ and a ‘Resource-list’. Processes in the ‘Process-list’ are SPIDs or worker threads that participated in the deadlock, while Resources in the ‘Resource-list’ are resources that one of the participants owns.
There is a lot of information in the -T1222, and a key to troubleshooting the deadlock is interpreting that data. The following is an example output from the error log, with highlighted points of interest.
deadlock-list
deadlock victim=process6a92e8
process-list
process id=process6a92e8 taskpriority=0 logused=0 waitresource=KEY: 7:72057594038386688 (0500b49e5abb) waittime=2578 ownerId=2988393 transactionname=INSERT EXEC lasttranstarted=2007-02-28T09:14:01.697 XDES=0xe938b90 lockMode=S schedulerid=1 kpid=2480 status=suspended spid=58 sbid=0 ecid=0 priority=0 transcount=1 lastbatchstarted=2007-02-28T09:14:01.680 lastbatchcompleted=2007-02-28T09:14:01.680 clientapp=SQL Query Analyzer hostname=PSY-D04509 hostpid=552 loginname=sa isolationlevel=read committed (2) xactid=2988393 currentdb=7 lockTimeout=4294967295 clientoption1=673187936 clientoption2=128056
executionStack
frame procname=test.dbo.usp_proc1 line=4 stmtstart=74 sqlhandle=0x03000700ae80847c5c159800e39800000100000000000000
SELECT col2, col3 FROM t1 WHERE col2 BETWEEN @param1 AND @param1+1
frame procname=adhoc line=4 stmtstart=62 stmtend=136 sqlhandle=0x020000005a9fff1320a5bf315b282f66d32aeccc20ca9c9e
INSERT INTO #t1 EXEC usp_proc1 4
inputbuf
WHILE (1=1) BEGIN
INSERT INTO #t1 EXEC usp_proc1 4
TRUNCATE TABLE #t1
END
process id=process6a9798 taskpriority=0 logused=196 waitresource=KEY: 7:72057594038452224 (0b00274d9328) waittime=2593 ownerId=2988400 transactionname=UPDATE lasttranstarted=2007-02-28T09:14:01.713 XDES=0xe938258 lockMode=X schedulerid=1 kpid=3252 status=suspended spid=57 sbid=0 ecid=0 priority=0 transcount=2 lastbatchstarted=2007-02-28T09:13:57.730 lastbatchcompleted=2007-02-28T09:12:44.197 clientapp=SQL Query Analyzer hostname=PSY-D04509 hostpid=552 loginname=sa isolationlevel=read committed (2) xactid=2988400 currentdb=7 lockTimeout=4294967295 clientoption1=673187936 clientoption2=128056
executionStack
frame procname=test.dbo.usp_proc2 line=4 stmtstart=74 stmtend=170 sqlhandle=0x03000700e7a4787d61159800e39800000100000000000000
UPDATE t1 SET col2 = col2+1 WHERE col1 = @param1
frame procname=adhoc line=11 stmtstart=298 stmtend=336 sqlhandle=0x02000000473e261d3c9a948a0b8bf72eca19b62d2bf971b9
EXEC proc2 4
inputbuf
-- Batch #2
USE test
SET NOCOUNT ON
WHILE (1=1)
EXEC usp_proc2 4resource-list
keylock hobtid=72057594038386688 dbid=7 objectname=test.dbo.t1 indexname=cidx id=lock66ccd00 mode=X associatedObjectId=72057594038386688
owner-list
owner id=process6a9798 mode=X
waiter-list
waiter id=process6a92e8 mode=S requestType=wait
keylock hobtid=72057594038452224 dbid=7 objectname=test.dbo.t1 indexname=idx1 id=lock66ccd80 mode=S associatedObjectId=72057594038452224
owner-list
owner id=process6a92e8 mode=S
waiter-list
waiter id=process6a9798 mode=X requestType=wait
Note: The sections highlighted in green indicate the start of the Deadlock-list (beginning of the –T1222 output), Resource-list, or Process-list. Sections highlighted in blue indicate sections of the output that will be useful to troubleshoot the deadlock.
Once this information is gathered from the logs, it is useful to tabulate by SPID. This gives a clearer picture of what is causing the deadlock. From the parts of the output highlighted in yellow above it is shown:
SPID 58 is running this query from line 4 of stored proc [usp_proc1]
SELECT col2, col3 FROM t1 WHERE col2 BETWEEN @param1 AND @param1+1
SPID 57 is running this query from line 4 of stored proc [usp_proc2]
UPDATE t1 SET col2 = col2+1 WHERE col1 = @param1SPID 58 is waiting for a Shared KEY lock on test.dbo.t1.cidx. SPID 57 has a conflicting X lock
SPID 57 is waiting for an eXclusive KEY lock on index test.dbo.t1.idx1. SPID 58 has a conflicting S lock
Information from Trace Flag 1204 is shown as a list of Nodes, with each one representing a resource participating in the deadlock. Each Node has a dedicated section detailing both the SPID that is blocked and the SPID that already has the lock on the resource that is blocking. Following this there is a final section that describes the deadlock victim.
Once again, there is a great deal of information in the 1204 output. Translating this information into a more readable format will help in troubleshooting the deadlock. Unfortunately a few more steps need to be taken in order to do this, as resources in the output are only identified by resource IDs rather than easily readable names.
Deadlock encountered .... Printing deadlock information
Wait-for graph
Node:1
KEY: 7:72057594038452224 (0b00274d9328) CleanCnt:3 Mode:S Flags: 0x0
Grant List 0:
Owner:0x0C8FDF80 Mode: S Flg:0x0 Ref:1 Life:00000000 SPID:57 ECID:0 XactLockInfo: 0x03EA385C
SPID: 57 ECID: 0 Statement Type: SELECT Line #: 4
Input Buf: Language Event:
WHILE (1=1) BEGIN
INSERT INTO #t1 EXEC usp_proc1 4
TRUNCATE TABLE #t1
END
Requested By:
ResType:LockOwner Stype:'OR'Xdes:0x03EA29F0 Mode: X SPID:56 BatchID:0 ECID:0 TaskProxy:(0x1AFAA374) Value:0xe6fab40 Cost:(0/196)Node:2
KEY: 7:72057594038386688 (0500b49e5abb) CleanCnt:2 Mode:X Flags: 0x0
Grant List 0:
Owner:0x0C8FD0A0 Mode: X Flg:0x0 Ref:1 Life:02000000 SPID:56 ECID:0 XactLockInfo: 0x03EA2A14
SPID: 56 ECID: 0 Statement Type: UPDATE Line #: 4
Input Buf: Language Event:
USE test
SET NOCOUNT ON
WHILE (1=1)
EXEC usp_proc2 4
Requested By:
ResType:LockOwner Stype:'OR'Xdes:0x03EA3838 Mode: S SPID:57 BatchID:0 ECID:0 TaskProxy:(0x1D452374) Value:0xe6fa2e0 Cost:(0/0)Victim Resource Owner:
ResType:LockOwner Stype:'OR'Xdes:0x03EA3838 Mode: S SPID:57 BatchID:0 ECID:0 TaskProxy:(0x1D452374) Value:0xe6fa2e0 Cost:(0/0)
From the highlighted sections the following can be shown:
SPID 57 is running a SELECT statement at line 4 of the stored proc ‘usp_proc1’. It holds an S lock on KEY: 7:72057594038452224 (0b00274d9328). This lock is blocking SPID 56, which is waiting to acquire an X lock on that key resource.
SPID 56 is running an UPDATE statement at line 4 of the stored proc ‘usp_proc2’. It holds an X lock on KEY: 7:72057594038386688 (0500b49e5abb). This lock is blocking SPID 57, which is waiting to acquire an S lock on that key resource.
SQL Server has chosen SPID 57 as the deadlock victim, it’s batch will cancelled and it’s transaction will be rolled back.
Note: To find the exact queries that are being run at the lines given, it is necessary to manually look through the stored procedure. The information is not given in the output.
Once the information is in a more readable format, the next step is to identify the key resources involved in the deadlock. At this stage the key resources are only being identified by resource IDs. The resource KEY: 7:72057594038452224 (0b00274d9328) refers to an object with ID 72057594038452224 in the database with the ID 7, as this example has been run on a SQL Server 2005 installation.
Note: The format of KEY Resources IDs on SQL Server 2005 has changed from db_id:obj_id:idx_id to db_id:hobt_id. To extrapolate the easily readable names for these resources under SQL Server 2005 first we need to find the applicable object ID from the partition ID given. To do this find the partition ID in the sys.partitions table and find the name of the table and index in the sysobjects and sysindexes tables. For SQL Server 2000 the object ID and Index ID are given as part of the key, so simple look these up in sysobjects and sysindexes. To do this, find the Object Name by getting the Object ID and looking up the corresponding ID in sysobjects. Then find the Index Name by looking up the Object ID in the id column of sysindexes and the Index ID in the indid column.
After finding the well readable names from the appropriate system tables, a final statement can be made about deadlock, in this case:
SPID 56 is running an UPDATE statement at line 4 of the stored proc ‘usp_proc2’. It holds an X lock on test.dbo.t1.cidx. This lock is blocking SPID 57 who is waiting for an S lock on the same resource.
Warning: Both Trace Flag outputs will only show the queries that finished out the deadlock. If either of the SPIDs is running a multi statement transaction, you may need to get a profiler trace of the entire deadlock.
Often the simplest way to prevent deadlocking is to simple make the queries involved better. It may be possible to reduce the amount of resources the query is locking. A Query that locks fewer resources is less likely to deadlock. Take special note of any table scans, index scans and lengthy sorts shown in the query plan. Use the SET SHOWPLAN_ALL ON command and examine the query plan.
Measure "Estimated Improvement" from the Database Tuning Advisor when run for the queries involved in the deadlock, especially in SQL Server 2005. Try not to place clustered indexes on columns that are frequently updated, as updates to columns with clustered indexes will require locks on the clustered index (to get to the row) and all non-clustered indexes (leaf level non clustered indexes reference rows by clustered index key value)
Try getting transactions to run for the least amount of time, within reason. Reduce the round trips between application and SQL Server. Cache read operations whenever applicable and wherever the same read is going to be used often. By the same token, locks that last the least amount of time deadlock less often.
Two transactions that update the same two tables in a different sequence may frequently lock, as one resource may be locked by the other transaction from earlier in its execution. This is particular difficult with the scope of the product.