Data engine 20.40 is not inserting QOS data and it is frequently getting disconnected

Article ID: 251972

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

The data_engine queue is not being released (drained) in one of our shared production (PROD) environments, which impacts multiple customers.

QOS messages are backing up in the queue, and no messages are being sent.

Environment

  • Release: UIM 20.4
  • data_engine v20.40
  • Microsoft SQL Server v2016 SP2
  • hub 9.35
  • Robot 9.35

Cause

  • Database server resource contention, caused by sharing the database server with other applications. In this case, multiple 'Microsoft Mashup Routine' processes were blocking UIM database sessions.
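One way to check whether other applications are competing for resources on a shared database server is to summarize the current user sessions by client program name. The query below is a sketch and was not part of the original troubleshooting. It assumes you can query the instance with VIEW SERVER STATE permission. The cpu_time, reads and writes values in sys.dm_exec_sessions are cumulative for the life of each session, so long-lived sessions weigh more than their current activity.

```sql
-- Sketch: summarize current user sessions by client program so that
-- non-UIM applications sharing the server stand out.
-- Requires VIEW SERVER STATE permission.
SELECT
    s.program_name,
    COUNT(*)                          AS session_count,
    SUM(CAST(s.cpu_time AS bigint))   AS total_cpu_ms,   -- cast avoids int overflow in SUM
    SUM(s.reads)                      AS total_reads,
    SUM(s.writes)                     AS total_writes,
    SUM(s.memory_usage)               AS total_memory_pages  -- 8 KB pages
FROM sys.dm_exec_sessions AS s
WHERE s.is_user_process = 1          -- skip system sessions
GROUP BY s.program_name
ORDER BY total_cpu_ms DESC;
```

Programs near the top of this list that are not part of UIM are candidates for closer inspection with the blocking query in the Resolution section.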

Resolution

The first symptom noticed was that the data_engine queue was severely backed up and was not processing any data, so the number of queued QOS messages kept increasing.

The second symptom was that the probe was writing to data_engine.log very slowly.

The third issue was that the data_engine queue was missing one of its default subjects, QOS_DEFINITION, so we added it back.

To check for blocking, we ran the following query on the SQL Server. Run it more than once, because blocking can come and go between runs.

-- list active requests, the session blocking each one (blocking_session_id), and the client program
select p.query_plan, ex.session_id, ex.blocking_session_id, db_name(ex.database_id) as dbname, s.host_name, s.program_name, s.login_name,
    ex.status, ex.command, ex.last_wait_type, ex.cpu_time, ex.reads, ex.writes, ex.percent_complete
from sys.dm_exec_requests ex INNER JOIN sys.dm_exec_sessions s
    ON ex.session_id = s.session_id
cross apply sys.dm_exec_query_plan(ex.plan_handle) p
where ex.session_id <> @@SPID -- exclude the session running this query

The results showed multiple Microsoft Mashup Routine processes blocking sessions. Running the query several times also showed some sessions that were themselves blocked.
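The query above returns every active request. When the goal is to name the blocker, a variant that joins each blocked request to the session blocking it can help. A blocking session that is idle while holding an open transaction has no row in sys.dm_exec_requests, but it still has one in sys.dm_exec_sessions. This sketch was not part of the original case. The column selection is illustrative, and it assumes VIEW SERVER STATE permission.

```sql
-- Sketch: for each blocked request, show the program and host that own
-- the blocking session. Joining the blocker through sys.dm_exec_sessions
-- (not sys.dm_exec_requests) also covers blockers that are idle with an
-- open transaction. Requires VIEW SERVER STATE permission.
SELECT
    r.session_id            AS blocked_session_id,
    ws.program_name         AS blocked_program,
    r.blocking_session_id,
    bs.program_name         AS blocking_program,
    bs.host_name            AS blocking_host,
    bs.status               AS blocking_status,
    r.wait_type,
    r.wait_time             AS wait_time_ms,
    DB_NAME(r.database_id)  AS dbname
FROM sys.dm_exec_requests AS r
INNER JOIN sys.dm_exec_sessions AS ws   -- the waiting (blocked) session
    ON ws.session_id = r.session_id
INNER JOIN sys.dm_exec_sessions AS bs   -- the blocking session
    ON bs.session_id = r.blocking_session_id
WHERE r.blocking_session_id > 0         -- excludes 0 and special negative ids
ORDER BY r.wait_time DESC;
```

If blocking_program repeatedly shows a non-UIM application while blocked_program shows data_engine, that matches the pattern described in the Cause section.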

Steps followed:

  1. Clear the data_engine queue
  2. Deactivate the data_engine probe
  3. Stop the Nimsoft Robot Watcher service
  4. Reboot the Microsoft SQL Server machine
  5. Start the Nimsoft Robot Watcher service again
  6. Ensure all probes are up and have a port and a PID
  7. In Infrastructure Manager (IM), open the data_engine configuration GUI and use its connection test to confirm that the data_engine can connect to the database
  8. View data_engine.log in IM and check whether the data_engine is now writing to the log more quickly. (In this case, it was.)

Watch the data_engine queue in the hub GUI Status window to confirm that it starts processing data and keeps processing it.
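While watching the queue, you can also sample the SQL Server side. In the original case, the blocking query was run by hand several times. The sketch below automates that by counting blocked requests at a fixed interval. The sample count (10) and the interval (30 seconds) are arbitrary example values, not recommendations, and the script assumes VIEW SERVER STATE permission.

```sql
-- Sketch: sample the number of blocked requests at a fixed interval.
-- The 10 samples and 30-second interval are arbitrary example values.
-- Requires VIEW SERVER STATE permission.
SET NOCOUNT ON;

DECLARE @samples TABLE (
    sample_time      datetime2(0) NOT NULL,
    blocked_requests int          NOT NULL
);
DECLARE @i int = 0;

WHILE @i < 10
BEGIN
    -- COUNT(*) without GROUP BY always yields one row, even when zero
    INSERT INTO @samples (sample_time, blocked_requests)
    SELECT SYSDATETIME(), COUNT(*)
    FROM sys.dm_exec_requests
    WHERE blocking_session_id > 0;

    SET @i += 1;
    IF @i < 10
        WAITFOR DELAY '00:00:30';   -- pause between samples, not after the last one
END;

SELECT sample_time, blocked_requests
FROM @samples
ORDER BY sample_time;
```

Counts that stay at or near zero while the hub Queued column also stays low suggest the blocking has cleared. Recurring non-zero counts point back to the blocking query.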

After the data_engine thread_count_insert was set to 24 and hub_bulk_size to 1750, data_engine performance was very good, and the Queued column in the hub GUI Status window stayed at or near zero each time we clicked the refresh button.