search cancel

Dynamic monitoring (baselining & std deviation) not working, baseline data is missing and see frequent, inaccurate alarms

book

Article ID: 256111

calendar_today

Updated On:

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

Had two outages this past weekend that cost millions. If our dynamic monitoring would have been functioning properly, it may have helped catch or prevent those outages. We are seeing lots of spikes, false alarms even with high standard deviations set. Need to check if anything with the dynamic monitoring is not working correctly.

Environment

  • Release: 20.3, 20.4

Cause

- QOS dynamic threshold and baseline configuration in AC

Resolution

UIM 20.3
-------------

For UIM 20.3 customers, to achieve this we would have to leverage the line chart widget in the dashboard designer connected with the SQL data source pointing to BN_QOS_DATA_* tables.

See attached Word doc-> Viewing baseline data via the Dashboard Designer in UIM 20.3


UIM 20.4
---------------

The capability to display the baseline trend is available 'out-of-the-box' with the Performance Reports Designer (PRD) which was reintroduced in 20.4.

For 20.3 customers, we have to leverage the line chart widget in the dashboard designer connected with the SQL data source pointing to BN_QOS_DATA_* tables. 

In the meantime, I've suggested that the PRD add a new-additional column that displays not only the min/max/avg/deviation but also the baseline and product mgmt has taken that under consideration for upcoming releases.

Upgrading and using UIM 20.4 PRD, allows you to select and view the baseline in the reports/charts.

The baseline values stored in BN_QOS_DATA_* tables are the cumulative calculated values from the baseline_engine, not the sample values.

When you enable baseline computation for one of the QOS objects/monitors e.g., (QOS_AZURE_VM_CPU_USAGE) you can see the baseline values are aggregated based on the qos samples collected in a given hour in a day across the four weeks duration (default).

This makes it easier to see the baseline and use it to set the threshold.

The PRD 'USM view' option is very useful and it shows min, max, avg, and std deviation. 

When you hover on the chart line you can see the baseline value but the legend doesn't include a column for the baseline which would be useful to show the latest baseline value in the view for the given time and aggregation interval. A request has been entered for product management to work with DEV to add another column, e.g., for Baseline.

For dashboards and analysis, knowing the base query to fetch the current baseline based on a time period and interval would definitely be very valuable for reports/dashboards and comparison. I've suggested this to engineering.

The baseline values stored in BN_QOS_DATA_* tables are the cumulative calculated values from the baseline_engine, not the sample values.

When you enable baseline computation for one of the QOS objects/monitors e.g., (QOS_AZURE_VM_CPU_USAGE) you can see the the baseline values are aggregated based on the qos samples collected in a given hour in a day across the four weeks duration (default).

This makes it easier to see the baseline so you can use it to set the threshold.

Additional Information

--Script for generating the baseline data for a given Source, Target, and QOS. Please let me know if you need any further information.

--In this script, the BN_ table will be auto-identified based on the qos_def_id.

DECLARE
@qos_def_id NVARCHAR(30),
@sql_statement NVARCHAR(MAX),
@table_id NVARCHAR(30),
@source NVARCHAR(100),
@target NVARCHAR(100),
@qos_name NVARCHAR(50),
@baseline_start_duration NVARCHAR(10),
@baseline_forecast_duration NVARCHAR(10),
@baseline_table NVARCHAR(30)
BEGIN
SET @source = 'xxxxx-primary' -- Source name REF: S_QOS_DATA/PRD
SET @target = 'xxxxx' -- Target Name Ex: Idle, Total for CPU Usage, REF: S_QOS_DATA/PRD
SET @qos_name = 'QOS_CPU_USAGE' -- QOS Name
SET @baseline_start_duration = '30' -- Start duration. Ex: 30 results in current date minus 30 days. 
SET @baseline_forecast_duration = '1' -- End Duration. E.g., 1 results in current date plus 1 day. Keep it to zero for avoiding the forecast baseline records.

-- Get qos_def_id and  table_id based on source, target and qos
SELECT top 1 @qos_def_id=qos_def_id 
    FROM s_qos_data qd (nolock)
    WHERE qd.QOS = @qos_name;
SELECT top 1 @table_id=table_id
    FROM s_qos_data SQD (nolock)
    WHERE SQD.QOS = @qos_name and SQD.source = @source and SQD.target = @target

-- Form the baseline table dynamically
if @qos_def_id < 10
SET @baseline_table = 'BN_QOS_DATA_000' + @qos_def_id;
else if @qos_def_id < 100
SET @baseline_table = 'BN_QOS_DATA_00' + @qos_def_id;
else if @qos_def_id < 1000
SET @baseline_table = 'BN_QOS_DATA_0' + @qos_def_id;
else
SET @baseline_table = 'BN_QOS_DATA_' + @qos_def_id;

-- print @baseline_table;
SET @sql_statement = 'select starttime, stoptime, time_interval, samplevalue from ' + @baseline_table +
' (nolock) where table_id = ' + @table_id +
' and stoptime between GETDATE()-' + @baseline_start_duration + ' and GETDATE()+' + @baseline_forecast_duration + ' order by stoptime desc';

-- PRINT @sql_statement;

EXEC(@sql_statement);
END;

Sample Results:

Attachments

Viewing baseline data via the Dashboard Designer in UIM 20.3docx_1670947306303.docx get_app