data_engine 9.02 fails to insert QoS into the database

Article ID: 128958

Products

DX Infrastructure Management NIMSOFT PROBES

Issue/Introduction

1) I am seeing the alarm "Failed to insert QoS data into the database, check that the database is running"
2) I am seeing the alarm "[Microsoft SQL Server Native Client 11.0] Invalid character value for cast specification"
3) I am seeing the alarm "[Microsoft SQL Server Native Client 11.0] Unspecified error"
4) I am seeing the error "temporarily out of resources (7)" in my data_engine.log file
5) The data_engine queue is growing and Data Engine is not processing any messages
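
To check quickly whether a log shows these symptoms, the message fragments quoted above can be counted with a short script. This is only a sketch: the function name count_signatures and the log file path are illustrative assumptions, and it is also an assumption that the alarm texts from items 1 to 3 appear verbatim in the log (only item 4 is stated to come from data_engine.log).

```python
# Message fragments quoted in this article. Matching is a plain substring test.
SIGNATURES = [
    "Failed to insert QoS data into the database",
    "Invalid character value for cast specification",
    "Unspecified error",
    "temporarily out of resources (7)",
]


def count_signatures(lines):
    """Count how many lines contain each known message fragment."""
    counts = {signature: 0 for signature in SIGNATURES}
    for line in lines:
        for signature in SIGNATURES:
            if signature in line:
                counts[signature] += 1
    return counts


if __name__ == "__main__":
    import sys

    # The log path is passed on the command line; its location is site-specific.
    with open(sys.argv[1], errors="replace") as log_file:
        for signature, count in count_signatures(log_file).items():
            print(f"{count:6d}  {signature}")
```

A steadily rising count for the same fragment, together with a growing data_engine queue, would be consistent with the loop described under Cause.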

Cause

The problem occurs when data engine enters a continuous loop triggered by a single insert failure. Each failure forces a database reconnect, so data engine stops processing other messages until the failing insert succeeds.

Below is a simplified explanation of how data engine handles commit failures to help understand the problem.

Data engine stores QoS data internally in buckets, where each bucket holds the data to be inserted into a specific RN_QOS_DATA_XXX table. Each bucket has two buffers: an input buffer and an output buffer. Data read from the data_engine queue on the hub is written to the input buffer, and when sufficient data has been gathered, data engine moves it to the output buffer. Data in the output buffer is ready for commit. Workers inside data engine perform the commit operation.
When an insert into the database fails for any table:
1. Data engine first restores the buffers (data is moved from output buffer back to input buffer) 
2. Data engine writes the QoS data present in all the buckets to separate BLK files. 
3. Data engine disconnects from the database 
4. Data engine performs some clean up and then reconnects to the database 
5. After reconnect, data engine moves data from input buffers to output buffers 
6. The process of database insert resumes 
7. If the database insert fails again at step 6, data engine goes back to step 1. As long as the insert keeps failing, this repeats indefinitely.
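
The steps above can be modeled with a small simulation. This is a simplified sketch, not data engine's actual code: the Bucket class, the run_commit_cycles function, the insert callback, and the max_cycles cap are all hypothetical names introduced for illustration. The cap exists only so the example terminates; the behavior described above has no such limit.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Hypothetical model of one bucket feeding one RN_QOS_DATA_XXX table."""
    table: str
    input_buffer: list = field(default_factory=list)
    output_buffer: list = field(default_factory=list)

    def promote(self):
        # Move gathered rows to the output buffer so they are ready for commit.
        self.output_buffer.extend(self.input_buffer)
        self.input_buffer.clear()

    def restore(self):
        # Step 1: move uncommitted rows from the output buffer back to input.
        self.input_buffer = self.output_buffer + self.input_buffer
        self.output_buffer = []


def run_commit_cycles(buckets, insert, max_cycles):
    """Return the index of the cycle that committed everything, else None.

    insert(table, rows) stands in for the database insert and is expected to
    raise RuntimeError on failure. max_cycles is an artificial cap.
    """
    for cycle in range(max_cycles):
        for bucket in buckets:
            bucket.promote()                  # step 5
        try:
            for bucket in buckets:            # step 6
                if bucket.output_buffer:
                    insert(bucket.table, bucket.output_buffer)
                    bucket.output_buffer = []
            return cycle
        except RuntimeError:
            for bucket in buckets:
                bucket.restore()              # step 1
            # Steps 2 to 4 (BLK files, disconnect, cleanup, reconnect)
            # are omitted from this sketch.
    return None
```

If one bucket's insert always fails and that bucket is processed first, run_commit_cycles returns None after max_cycles attempts and the rows in every other bucket are still waiting in their input buffers. That mirrors the symptom where a single failing insert stalls all message processing.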

Environment

UIM 9.0.2
Data Engine 9.02
MS SQL Server

Resolution

Download the latest data_engine hotfix from the CA UIM Hotfix Index.

As of this writing, the latest hotfix is data_engine 9.02HF3, but a newer one may have been released since.