UIM - data_engine best practices


Article ID: 33592



Products

NIMSOFT PROBES DX Infrastructure Management

Issue/Introduction

The data_engine is responsible for inserting QOS and discovery data into your SQL database. It is also responsible for nightly maintenance on the database: re-indexing the tables, moving current QOS data to history tables, and dropping history data based on your retention settings.

Below are some tips to make the data_engine and database work more efficiently.

 

Environment

CA UIM 8.x or higher
 

Resolution

1) Auto-reindexing (optional)

There is a check box to automatically re-index tables. This is checked by default.

If it is not checked, your DBA should check and re-index the database on a regular basis. By default, UIM runs this maintenance every day at 00:40. Note that in large environments, partitioning should be enabled, and re-indexing may then be disabled to avoid long-running jobs that can fail or consume significant resources on the database server. A daily job can instead re-index the most important tables. See Additional Information below.
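
One way for a DBA to gauge whether re-indexing is actually needed is to check index fragmentation directly in SQL Server. The query below is a minimal sketch using standard SQL Server system views; run it while connected to your UIM database, and treat the 30% threshold as an illustrative starting point only:

-- Lists indexes that are more than 30 percent fragmented so a DBA can decide what to rebuild.
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30
ORDER BY ips.avg_fragmentation_in_percent DESC;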


2) Speed up deletion of records

The data_engine's nightly maintenance removes old data. If you are running MS SQL Server Enterprise Edition, you can turn on 'Partition data tables'. This turns the delete into an I/O operation rather than a SQL DELETE query with a WHERE clause: when removing history, SQL Server drops each expired partition in a single I/O operation.
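
For illustration only (the data_engine manages its partitions automatically, so you never run this by hand), the difference between the two approaches looks roughly like this; the table name and partition number are hypothetical examples:

-- Without partitioning: a logged, row-by-row delete that is slow on large tables.
DELETE FROM RN_QOS_DATA_0001 WHERE sampletime < '2023-01-01';

-- With partitioning (SQL Server 2016+ syntax): the expired partition is dropped
-- in a single operation, which is far faster.
TRUNCATE TABLE RN_QOS_DATA_0001 WITH (PARTITIONS (1));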

3) Tuning of the data_engine to handle more messages

The data_engine pulls data from the hub static queue named 'data_engine'. By default it pulls in bulk batches of 100 messages, which can be seen on the Hub GUI status screen.

 

This can be increased up to 2000 when using hub 7.X or higher.


This is set in the data_engine raw configure (the hub_bulk_size key).

 

 

If you are seeing a backlog of messages in the Hub GUI status screen for your data_engine queue, increasing this value is the first thing to try to help expedite processing.
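
For reference, the change amounts to a single key in the data_engine configuration; a minimal sketch of the relevant data_engine.cfg fragment (the value 1750 is illustrative, and 2000 is the upper limit mentioned above):

<data_engine>
   hub_bulk_size = 1750
</data_engine>

Restarting the data_engine probe (if it is not restarted automatically when the change is applied) ensures the new value takes effect.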

 

4) Increasing the data_engine throughput to the backend Database server by making it multi-threaded

The data_engine is single-threaded by default. For most small to medium-sized installations this is sufficient, with modifications to the bulk size only (see above).
For larger installations handling tens of thousands of QOS messages per minute this may not be enough. For these environments the data_engine can be made multi-threaded.
 
To make the data_engine multi-threaded, you will need to add two new keys in raw configure if they do not already exist:

thread_count_insert = XX 
queue_limit_total = 100000
 
Set thread_count_insert to a number equal to the total number of CPU cores on the primary hub where the data_engine runs. For most large installations this rarely needs to be set above 8, combined with a hub_bulk_size of at most 1750.
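
Putting the keys together, a hedged sketch of the relevant data_engine.cfg fragment for an 8-core primary hub (the values are illustrative and should be tuned for your environment):

<data_engine>
   hub_bulk_size = 1750
   thread_count_insert = 8
   queue_limit_total = 100000
</data_engine>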

 

 

Checking the number of cores on a system:

  • Windows

Check the number of CPU cores using Windows Performance Monitor -> Resource Monitor, or run the command:

wmic cpu get NumberOfCores,NumberOfLogicalProcessors /Format:List

 

  • LINUX/UNIX

See sample commands below that identify how many processors are on a Linux machine.

 

cat /proc/cpuinfo | grep -i cpu | grep MHz | wc -l

 

cat /proc/cpuinfo | grep processor

 

The thread_count_insert key can make the data_engine insert data much faster.

 

5) Increase the time allowed for the data_engine to complete maintenance on all tables

By default, the data_engine allows the maintenance jobs 2 hours (7200 seconds) to complete. If partitioning is not enabled, if there are a large number of RN_QOS_DATA, HN_QOS_DATA, DN_QOS_DATA, and BN_QOS_DATA tables, or if one or more of these tables contains a large number of rows, maintenance may not complete within 2 hours. In that case the maintenance is rolled back and the changes are not committed. It is recommended to increase this timeout to the maximum allowed value (65536 seconds, a little more than 18 hours).

To increase the time to allow for maintenance to complete, modify the following key value from the data_engine probe's Raw Configure GUI:

data_management_timeout = 65536
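
To gauge how much work nightly maintenance has to do, it can help to count the QOS data tables and their rows before deciding whether the timeout increase is needed. A minimal sketch for MS SQL Server (run while connected to your UIM database; assumes the standard RN_/HN_/DN_/BN_ table naming mentioned above):

-- Approximate row counts per QOS data table, largest first.
SELECT t.name AS table_name, SUM(p.rows) AS row_count
FROM sys.tables t
JOIN sys.partitions p
  ON p.object_id = t.object_id AND p.index_id IN (0, 1)
WHERE t.name LIKE 'RN_QOS_DATA%'
   OR t.name LIKE 'HN_QOS_DATA%'
   OR t.name LIKE 'DN_QOS_DATA%'
   OR t.name LIKE 'BN_QOS_DATA%'
GROUP BY t.name
ORDER BY row_count DESC;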

Additional Information

The index maintenance option, when enabled and run by the data_engine, only operates on the RN, HN, BN, DN, and VN tables. In large UIM environments with partitioning enabled, there is little performance gain in defragmenting the indexes of these tables, especially the large ones: a heavily fragmented index may be defragmented down to 30-70% fragmentation, only to return to its previous level within a few days. As long as partitioning is in use, there is no need to defragment the RN, HN, and similar tables.

However, the tables listed below must be defragmented each day (during non-business hours if possible) by running a daily defrag job on them. There are several ways to set this up; a DBA will know how, and a web search can provide examples. Also make sure your data_management_timeout is set to 65536, not the default of 7200 (only 2 hours).

These are the key tables with indexes that require defragmentation (DAILY).

Ask your DBA to set up a DAILY (off-hours) job that runs the following index rebuild statements:

ALTER INDEX ALL ON CM_COMPUTER_SYSTEM REBUILD;
ALTER INDEX ALL ON CM_DEVICE REBUILD;
ALTER INDEX ALL ON CM_COMPUTER_SYSTEM_ATTR REBUILD;
ALTER INDEX ALL ON CM_DEVICE_ATTRIBUTE REBUILD;
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM REBUILD;
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM_METRIC REBUILD;
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM_DEFINITION REBUILD;
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM_METRIC_DEFINITION REBUILD;
ALTER INDEX ALL ON CM_NIMBUS_ROBOT REBUILD;
ALTER INDEX ALL ON CM_COMPUTER_SYSTEM_ORIGIN REBUILD;
ALTER INDEX ALL ON CM_CONFIGURATION_ITEM_ATTRIBUTE REBUILD;
ALTER INDEX ALL ON CM_RELATIONSHIP_CI_CI REBUILD;
ALTER INDEX ALL ON CM_RELATIONSHIP_CI_CS REBUILD;
ALTER INDEX ALL ON CM_RELATIONSHIP_CS_CI REBUILD;
ALTER INDEX ALL ON CM_DISCOVERY_NETWORK REBUILD;
ALTER INDEX ALL ON S_QOS_DATA REBUILD;
ALTER INDEX ALL ON NAS_TRANSACTION_SUMMARY REBUILD;
ALTER INDEX ALL ON NAS_ALARMS REBUILD;
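
One straightforward way to schedule this is a SQL Server Agent job, or an off-hours scheduled task that calls sqlcmd against a script file containing the statements above; a hedged example (the server name, database name, and script file name are placeholders):

sqlcmd -S YOUR_SQL_SERVER -d YOUR_UIM_DATABASE -E -i defrag_uim_tables.sql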

Data_Engine Best Practice Thread Count

‘queue_limit_total’

This parameter is a mechanism that lets the data_engine "slow down" and stop reading more messages from the hub. If you enable multiple threads you open the funnel, so the data_engine will process as many messages from the hub's data_engine queue as the hub can deliver.

A background thread in the data_engine sorts those messages into bulk objects and marks them ready for commit. This can lead to a large number of messages residing in data_engine memory, for example if the database falls behind (typically during maintenance or when SQL Server is restarted). The default limit is 100,000.

With this limit at 100,000 (typed as 100000 in the configuration file, without spaces or separators), the data_engine will back off and suspend reading messages from the hub whenever it has more than 100,000 messages in memory that are not yet committed to the database. When the count drops back below 100,000, it resumes reading messages from the hub.
