While finishing the baseline collection configuration, we found that all of the baseline tables were empty, including the specific tables that should have contained baseline data.
According to the customer, baselining and threshold alarming worked as expected before the DB migration, including when the standard deviation option was in use.
- Customer migrated from an earlier version of MS SQL Server to v2017.
- Server names/hub names in the UIM environment were changed as well.
- DB migration occurred ~3 years ago (Dec-Jan. 2021).
- Neither the customer nor we know exactly how the migration was performed.
- According to the customer, the baselining issue started after the DB migration.
qos_processor errors:
Jan 09 11:41:48:597 [attach_clientsession, qos_processor] method = public void com.nimsoft.slm.qos.processor.AbstractReceiver.onBulkQueueMessage(com.nimsoft.nimbus.NimSession,com.nimsoft.nimbus.PDS) throws com.nimsoft.nimbus.NimException
and
[Qos Baseline Processor #1, qos_processor] Load single QoS query: select d.table_id, d.checksum, d.qos_def_id, d.table_id, d.ci_metric_id, d.qos, d.source, d.target, d.origin, d.host, d.robot, d.probe, d.nim_origin, d.modifier, d.samplerate from s_qos_data d where d.checksum = '7C223C9D59BE5D0A9526219BC7A4F09FAEBBB247'
[Qos Baseline Processor #1, qos_processor] No Qos Loaded
[Qos Baseline Processor #1, qos_processor] Loaded null
[Qos Baseline Processor #1, qos_processor] Could not find qos: S_QOS_DATA { qos_name: QOS_NETWORK_AGGREGATED_TRAFFIC, source: <xxxx.example.om>, target: <xxx.xxx.xxxx.example.com>, nim_origin: null, origin: <hubname>, modifier: nimsoft, host: ##.###.##.##, robot: <robot_hostname>, probe: baseline_engine, table_id: null, qos_def_id: null }
[Qos Baseline Processor #1, qos_processor] Not updating db: S_QOS_DATA { qos_name: QOS_NETWORK_AGGREGATED_TRAFFIC, source: <xxxx.example.om>, target: <xxx.xxx.xxxx.example.com>, nim_origin: null, origin: <hubname>, modifier: nimsoft, host: ##.###.##.##, robot: <robot_hostname>, probe: baseline_engine, table_id: null, qos_def_id: null }
[Qos Baseline Processor #1, qos_processor] Cache miss: true
- Corrupt/invalid checksums following the database migration from an earlier version of MS SQL Server.
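To gauge how many QoS series are affected, the "Could not find qos" lines can be counted per QoS name. The following is a minimal sketch, not a supported tool: the function name `count_lookup_misses` is hypothetical, and the regular expression is written only against the log line format shown in the excerpt above, so it may need adjusting for other log levels or probe versions.

```python
import re
from collections import Counter

# Matches the "Could not find qos" lines from the excerpt above and
# captures the value that follows "qos_name:" up to the next comma.
MISS_RE = re.compile(r"Could not find qos: S_QOS_DATA \{ qos_name: (?P<qos>[^,]+),")


def count_lookup_misses(lines):
    """Return a Counter mapping qos_name to the number of failed lookups."""
    misses = Counter()
    for line in lines:
        match = MISS_RE.search(line)
        if match:
            misses[match.group("qos").strip()] += 1
    return misses


if __name__ == "__main__":
    # Assumption: the log has been copied locally under this name.
    with open("qos_processor.log", encoding="utf-8", errors="replace") as log:
        for qos_name, total in count_lookup_misses(log).most_common():
            print(total, qos_name)
```

A long list of distinct QoS names here suggests a systemic lookup problem rather than a single bad series.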
The issue of empty baseline tables was resolved by updating the qos_processor probe and enabling a new configuration key.
The qos_processor probe is responsible for, among other things, writing the baseline values generated by baseline_engine into the BN_QOS_DATA_* tables.
We downloaded and deployed the qos_processor 20.43T1 test build, which adds logging and slightly changes the checksum validation (driven by a configuration key). The new configuration key was added to qos_processor.cfg as shown below:
use_ci_metric_id_to_load_qos = false
Note that the enhanced qos_processor 20.43T1 build is attached to this KB article.
The qos_processor probe uses a checksum to load the QoS from the s_qos_data table whenever a new baseline QoS is generated. With the new parameter at its default value (false), the probe uses the checksum first and uses ci_metric_id only if the checksum is not available. If use_ci_metric_id_to_load_qos is set to true, the probe uses ci_metric_id first and uses the checksum only if ci_metric_id is null. To work around the invalid checksums, set:
use_ci_metric_id_to_load_qos = true
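The lookup order controlled by this key can be illustrated with a short Python sketch. This is not the probe's code (qos_processor is a Java probe); `load_qos` and the in-memory `rows` list standing in for s_qos_data are hypothetical. It also assumes, based on the wording above, that the fallback key is used only when the preferred key is missing, not when the preferred key matches no row.

```python
def load_qos(rows, checksum, ci_metric_id, use_ci_metric_id_to_load_qos=False):
    """Return the matching s_qos_data row (a dict) or None.

    rows stands in for the s_qos_data table; each row is a dict with
    'checksum' and 'ci_metric_id' keys.
    """
    if use_ci_metric_id_to_load_qos:
        # New behavior: ci_metric_id first, checksum only if it is null.
        keys = [("ci_metric_id", ci_metric_id), ("checksum", checksum)]
    else:
        # Default behavior: checksum first, ci_metric_id only if it is missing.
        keys = [("checksum", checksum), ("ci_metric_id", ci_metric_id)]

    for column, value in keys:
        if value is not None:
            # Assumption: only the first available key is queried.
            return next((row for row in rows if row.get(column) == value), None)
    return None


# Made-up data: the stored checksum no longer matches the incoming one.
rows = [{"table_id": 22, "checksum": "STALE", "ci_metric_id": "M-1"}]
print(load_qos(rows, "FRESH", "M-1"))        # None
print(load_qos(rows, "FRESH", "M-1", True))  # the row with table_id 22
```

Under these assumptions, a stale checksum produces the "No Qos Loaded" case with the default setting, while the ci_metric_id lookup still finds the row.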
When baseline values are being inserted into the database, qos_processor.log shows messages similar to the following:
Jan 12 09:03:43:860 [Qos Baseline Processor #1, qos_processor] Use ci_metric_id to load the qos. If it is null then use checksum to load the qos.
Jan 12 09:03:43:860 [Qos Baseline Processor #1, qos_processor] Load single QoS query: select d.table_id, d.checksum, d.qos_def_id, d.ci_metric_id, d.qos, d.source, d.target, d.origin, d.host, d.robot, d.probe, d.nim_origin, d.modifier, d.samplerate from s_qos_data d where d.ci_metric_id = 'MC13B2xxxxxxxx4A64611148'
Jan 12 09:03:43:861 [Qos Baseline Processor #1, qos_processor] Loaded S_QOS_DATA { qos_name: QOS_MEMORY_PHYSICAL, source: xxxxxx-xxx-xx, target: xxxxxxxx, nim_origin: <hubname>, origin: xxxxxx-xxxxxx, modifier: nimsoft, host: ##.###.###.##, robot: xxxxxx-xxxxxx, probe: cdm, table_id: 22, qos_def_id: 21 }
Jan 12 09:03:43:861 [Qos Baseline Processor #1, qos_processor] Putting into cache: S_QOS_DATA { qos_name: QOS_MEMORY_PHYSICAL, source: xxxxxx-xxxx, target: xxxxxx-xxxx, nim_origin: <hubname>, origin: <hubname>, modifier: nimsoft, host: ##.###.###.##, robot: <robot_hostname>, probe: cdm, table_id: 22, qos_def_id: 21 }
Jan 12 09:03:43:863 [Qos Baseline Db Updater #1, qos_processor] QosBaselineUpdater updating 1 baselines.
Jan 12 09:03:43:863 [Qos Baseline Db Updater #1, qos_processor] insert into BN_QOS_DATA_0021 (table_id, starttime, stoptime, time_interval, samplevalue) values (?, ?, ?, ?, ?)
Jan 12 09:03:43:863 [Qos Baseline Db Updater #1, qos_processor] Total time batching before execution: 0
Once use_ci_metric_id_to_load_qos was set to true and a few hours passed, the baseline tables (BN_QOS_DATA_*) were populated.
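One way to confirm that the tables are filling is to count rows per baseline table. The sketch below is illustrative only: `baseline_row_counts` is a hypothetical helper that accepts any Python DB-API cursor, and the demonstration uses an in-memory SQLite database with made-up data and assumed column types so that it runs without a UIM database. Against the real MS SQL Server database, a cursor from a suitable driver would be passed in instead.

```python
import re
import sqlite3

# Only names shaped like BN_QOS_DATA_0021 are accepted, because table
# names cannot be bound as query parameters.
BN_TABLE_RE = re.compile(r"BN_QOS_DATA_\d+")


def baseline_row_counts(cursor, table_names):
    """Return {table_name: row_count} for the given baseline tables."""
    counts = {}
    for name in table_names:
        if not BN_TABLE_RE.fullmatch(name):
            raise ValueError(f"unexpected table name: {name}")
        cursor.execute(f"SELECT COUNT(*) FROM {name}")
        counts[name] = cursor.fetchone()[0]
    return counts


# Demonstration with made-up data; column types are assumptions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE BN_QOS_DATA_0021 "
    "(table_id INTEGER, starttime TEXT, stoptime TEXT, "
    "time_interval INTEGER, samplevalue REAL)"
)
cur.execute(
    "INSERT INTO BN_QOS_DATA_0021 VALUES (22, '09:00', '10:00', 3600, 41.5)"
)
print(baseline_row_counts(cur, ["BN_QOS_DATA_0021"]))  # {'BN_QOS_DATA_0021': 1}
```

Running the count twice, a few hours apart, shows whether new baseline rows are still arriving.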
This of course assumes that dynamic monitoring and thresholding have been configured. For example:
As of UIM 20.4, the Performance Reports Designer (PRD) supports enabling the baseline data in a chart/report. The example below shows the developed baseline: