How to monitor a hung database instance in MS SQL Server / Oracle
search cancel

How to monitor a hung database instance in MS SQL Server / Oracle

book

Article ID: 130195

calendar_today

Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM) CA Unified Infrastructure Management On-Premise (Nimsoft / UIM) CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

Customers may need to monitor their databases/database instances when they become 'hung.'

Environment

  • UIM v8.5x or higher

Resolution

There is no specific "out of the box" checkpoint(s) to monitor what might be considered a 'hung' database/database instance or process or server for that matter.

The term "hung state" when referring to a database instance is unfortunately too vague a statement to provide focused-specific help on. This could mean many different things, all with different possible solutions. For example,

- a single instance or application is not responding 
- you can not log in remotely with RDP but all other functions are working
- All web services are down but other local services are working
- an instance appears to be available, but no transactions can be run

You first need to define exactly what constitutes a 'hung state' and then see what can be done to monitor for that condition, and alarm on it.

Monitoring options

If it is an application 'not responding,' you could possibly monitor it with the e2e_appmon probe and create an e2e transaction to test access to the instance to run a basic statement/transaction that indicates the instance is hung but you would have to identify a means of doing so that you would trust as an indicator/proof.

If the application should be adding data to a database you could possibly create an SQL checkpoint to check for records added and alarm if there are no new records added. 

If the application writes to a log file setup logmon to parse for an error/errors, e.g., transaction logs, or a remote dirscan probe to monitor the log files for activity.

Overall, we would recommend considering the use of: 

- net_connect probe to monitor instance availability / service@port 
- logmon to parse the transaction logs for specific errors 
- logmon OR nexec to run a given command, e.g., telnet <database_server> port to test connectivity or run another useful command to test the access/run a stored procedure/transaction 
- sqlserver/oracle custom checkpoint to run a statement or a stored procedure suggested by your DBA that can help determine the true state of the instance 
- processes probe to test that the 'agent' is up and running 
- e2e to run a synthetic scripted transaction against the database every 15 minutes to test if the user can log on and run a given low-impact transaction 
- cdm probe to monitor CPU/Memory/Disk I/O on the server IF one or more of those symptoms end up being 'indicators' of a hung state 

For delivering a solution to your customer/client, we recommend working with their DBA to help identify the method that best fits the situation/problem.

Additional Information

Here are some helpful links:

Oracle:
http://www.dba-oracle.com/t_troubleshooting_hung_database.htm

https://docs.oracle.com/cd/E17781_01/server.112/e18804/monitoring.htm#ADMQS247

MS SQL Server:
https://troubleshootingsql.com/2012/07/04/sql-server-is-hung/

http://giladka8.blogspot.com/2012/08/sql-server-query-is-stuck.html

https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/troubleshoot-connecting-to-the-sql-server-database-engine?view=sql-server-2017