CA UIM sqlserver probe is not sending alarms for the check_dbalive checkpoint.

Article ID: 36501

Products

NIMSOFT PROBES, DX Infrastructure Management

Issue/Introduction

Problem:

The sqlserver probe is installed on a robot to remotely monitor MS SQL Servers. When a database becomes unavailable, for example after a hardware crash, the probe fails to send the alarm for the check_dbalive checkpoint.

Instead, the customer gets many messages like this:

2016-01-04 07:58:05 | OPEN | Profile <Profile Name>, failed to execute in scheduled time interval, delayed by 308 seconds | 1 | SQL-Server | minor

The text 'failed to execute in scheduled time interval' indicates that the probe did not complete all of the checkpoints within the configured timeouts.

 

Cause

The sqlserver probe has several configurable timeouts which limit the time available to process all checkpoints. When a timeout is reached, the probe stops processing the remaining checkpoints and generates the above message. Because check_dbalive is one of those checkpoints, it can be skipped when the timeout is hit before its turn, so its alarm is never sent.
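
The effect of checkpoint order under a time limit can be illustrated with a small simulation. This is only a sketch of the behavior described above, not the probe's actual scheduler: the function name run_profile, the checkpoint name other_checkpoint, the durations, and the 60-second budget are all hypothetical values chosen for illustration.

```python
def run_profile(checkpoints, budget_seconds):
    """Process checkpoints in order until the time budget is used up.

    checkpoints: list of (name, seconds_needed) pairs (hypothetical durations).
    Returns (executed, skipped) as two lists of checkpoint names.
    """
    executed = []
    elapsed = 0
    for index, (name, seconds_needed) in enumerate(checkpoints):
        if elapsed >= budget_seconds:
            # Budget exhausted: everything from here on is never processed.
            skipped = [n for n, _ in checkpoints[index:]]
            return executed, skipped
        elapsed += seconds_needed
        executed.append(name)
    return executed, []


# Assumed scenario: the server is down, so every checkpoint blocks for 30 s.
dbalive_last = [
    ("active_connection_ratio", 30),
    ("other_checkpoint", 30),
    ("check_dbalive", 30),
]
dbalive_first = [dbalive_last[2], dbalive_last[0], dbalive_last[1]]

print(run_profile(dbalive_last, 60))
print(run_profile(dbalive_first, 60))
```

In this model, check_dbalive is skipped when it is listed last and processed when it is listed first, which is the reasoning behind reordering the checkpoints.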

Environment

Potentially this will affect all versions of the probe and of SQL Server.

 

Resolution

There are two options: increase the timeouts, or, because checkpoints are processed in order, move the check_dbalive checkpoint to the top so that it is processed first.
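
If the reordering has to be repeated on several robots, it could be scripted instead of done by hand. The following is a minimal sketch, assuming the configuration file is plain text in the <section> ... </section> layout used by this probe and that only the first <checkpoints> section should be changed. The helper name move_section_first is hypothetical and is not part of any Nimsoft tooling; try it on a backup copy of the file first.

```python
def move_section_first(cfg_text, parent="checkpoints", section="check_dbalive"):
    """Return cfg_text with <section> moved directly below the <parent> tag.

    Raises ValueError if the parent or the section cannot be found.
    """
    lines = cfg_text.splitlines()
    stripped = [line.strip() for line in lines]

    parent_open = stripped.index(f"<{parent}>")
    start = stripped.index(f"<{section}>", parent_open + 1)
    end = stripped.index(f"</{section}>", start + 1)

    # Cut the whole section out, then re-insert it right after <parent>.
    block = lines[start:end + 1]
    remaining = lines[:start] + lines[end + 1:]
    insert_at = parent_open + 1
    reordered = remaining[:insert_at] + block + remaining[insert_at:]
    return "\n".join(reordered) + "\n"


# Small made-up sample in the same layout as the probe configuration.
sample = """<checkpoints>
   <active_connection_ratio>
      active = no
   </active_connection_ratio>
   <check_dbalive>
      active = yes
   </check_dbalive>
</checkpoints>
"""

print(move_section_first(sample))
```

The function works on a string rather than on the file itself, so the result can be reviewed before anything is written back to disk.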

Edit C:\Program Files (x86)\Nimsoft\probes\database\sqlserver\sqlserver_monitor.cfg by moving the section for <check_dbalive> to the top of the <checkpoints> section like this:

<groups>

   <UMP>

      description = To fill default UMP dashboards

      <checkpoints>

         <check_dbalive>

            active = yes

            description = Monitors connectivity to the database instance

            qos = yes

            qos_list = yes

            clear_msg = check_dbalive_1

            clear_sev = clear

            interval = 5 min

            sql_timeout =

            scheduling = rules

            use_exclude = no

            use_include = no

            samples = 1

            <thresholds>

               <default>

                  <0>

                     tagid = 0

                     value = 1

                     unit =

                     sev = major

                     msg = check_dbalive_2

                     condition = !=

                     clear_msg = check_dbalive_1

                     scheduling =

                     key_col_name =

                     key_col_value = default

                  </0>

               </default>

            </thresholds>

            <qos_lists>

               <0>

                  qos_name = check_dbalive

                  qos_desc = SQL Server Availability

                  qos_unit = Availability

                  qos_abbr = Avail.

                  qos_max = 1

                  qos_value = status

                  qos_key =

               </0>

            </qos_lists> 
         </check_dbalive>   
         <active_connection_ratio> 
            active = no