There is no specific "out of the box" checkpoint(s) to monitor what might be considered a 'hung' database/database instance.
The term "hung state" when referring to a database instance is unfortunately too vague a statement to provide focused-specific help on. This could mean many different things, all with different possible solutions. For example,
- a single instance or application is not responding
- you can not log in remotely with RDP but all other functions are working
- All web services are down but other local services are working
- an instance appears to be available, but no transactions can be run
You first need to define exactly what constitutes a 'hung state' and then see what can be done to monitor for that condition, and alarm on it.
Monitoring options:
If it is an application 'not responding,' you could possibly monitor it with the e2e_appmon probe and create an e2e transaction to test access to the instance to run a basic statement/transaction that indicates the instance is hung but you would have to identify a means of doing so that you would trust as an indicator/proof.
If the application should be adding data to a database you could possibly create an SQL checkpoint to check for records added and alarm if there are not new ones.
If the application writes to a log file setup logmon to parse for an error/errors, e.g., transaction logs, or a remote dirscan probe to monitor the log files for activity.
Overall, I would recommend considering the use of:
- net_connect probe to monitor instance availability / service
[email protected] - logmon to parse the transaction logs for specific errors
- logmon OR nexec to run a given command, e.g., telnet <database_server> port to test connectivity or run another useful command to test the access/run a stored procedure/transaction
- sqlserver/oracle custom checkpoint to run a statement or a stored procedure suggested by your DBA that can help determine the true state of the instance
- processes probe to test that the 'agent' is up and running
- e2e to run a synthetic scripted transaction against the database every 15 minutes to test if the user can log on and run a given low-impact transaction
- cdm probe to monitor CPU/Memory/Disk I/O on the server IF one or more of those symptoms are indicators for the hung state
For delivering a solution to your customer/client, we recommend working with their DBA to help identify the method that best fits the situation/problem.