What to do when the data_engine probe is down / not working

Article ID: 366830

Products

  • DX Unified Infrastructure Management (Nimsoft / UIM)
  • CA Unified Infrastructure Management On-Premise (Nimsoft / UIM)
  • CA Unified Infrastructure Management SaaS (Nimsoft / UIM)

Issue/Introduction

The purpose of this guide is to assist users who are facing a critical situation where the data_engine probe is non-functional (either down or not inserting data).

  • This document has been created using a data-driven approach and focuses on the most common scenarios encountered during a “production down” situation. It does not cover every possible case; it covers the scenarios most likely to cause a “Severity 1” production outage in customer environments.

  • According to data collected by Broadcom Support, the vast majority of data_engine problems classified as “Severity 1” or “production down” are related to a change or issue within the database server or network environment outside of DX UIM itself, and usually require the involvement of a customer’s DBA and/or network administration team.

  • Therefore, when the data_engine probe stops working, the environment should be the first point of investigation. This is especially true when the probe had been working previously and there have been no recent product configuration changes. In such cases, the cause is likely a problem, outage, or change in the environment rather than a DX UIM product issue, so the teams needed to investigate these factors should be engaged early in the troubleshooting process.

  • We have categorized the most commonly seen outage scenarios and their causes so that the most likely environmental factors can be identified quickly, the appropriate teams (DBA/network) can be engaged when necessary, and normal operation can resume without delay.

Most of the issues described here lie outside the DX UIM product itself (for example, database server issues or firewall configurations). Where appropriate, links to relevant knowledge articles or technical documentation are included for guidance.

Linked KB articles and technical documents target MS SQL Server in most cases, as this platform represents the largest portion of our user base. Similar articles for MySQL and Oracle can usually be found by searching the Broadcom Knowledge Base.
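
Because most outages trace back to the database server or the network, a quick first check is whether the database listener port can be reached at all from the primary hub server. The following is a minimal sketch of such a check, not a Broadcom-provided tool. The function name and the host name in the usage comment are hypothetical, and the ports listed are only the common vendor defaults; your DBA may have configured different ones.

```python
import socket

# Common default listener ports; the actual port in your environment may differ.
DEFAULT_DB_PORTS = {"mssql": 1433, "mysql": 3306, "oracle": 1521}


def db_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example call (hypothetical host name):
#   db_port_reachable("uim-db.example.com", DEFAULT_DB_PORTS["mssql"])
```

A result of False points toward a firewall, routing, DNS, or database service problem and is a reasonable cue to engage the network or DBA team. A result of True only shows that the port accepts connections; it does not confirm that the database credentials or the database itself are healthy.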

Environment

  • DX UIM - All Supported Versions
  • All Database Platforms (MSSQL, Oracle, MySQL)

Cause

Rather than focus on specific log messages or error messages, we will look at broader categories of behavior to help direct troubleshooting efforts.

Situations that are considered critical for the data_engine probe fall into two broad categories:

  • data_engine is red/will not start
  • data_engine queue is growing/data is not being inserted

Either way, the causes of such outages or failures can generally be sorted into the following categories (listed in order of frequency):

  • Issues encountered after updates, patches, or maintenance in the environment (not including DX UIM itself)
  • Issues encountered during normal operation - no recent environmental maintenance or upgrades have been identified
  • Issues encountered during or immediately after an upgrade of DX UIM or application of a Cumulative Update (CU)

 

Resolution

Issues encountered after updates/patches/maintenance in the environment (42.85% of issues analyzed)

Common issue descriptions include:

  • After applying Operating System patches, data_engine will not start.
  • After upgrading the database, data is frequently getting queued.
  • After a vMotion event, the data_engine can no longer connect to the database.

Common resolutions (and the percentage of these scenarios resolved):


Issues encountered during normal operation (38.10% of issues analyzed)

Common issue descriptions include:

  • I keep getting alarms that the data_engine could not insert QoS or that the data_engine queue is growing large
  • I restarted the primary robot and data_engine won’t start now
  • I can see a lot of data queuing up in the hub GUI for data_engine’s queue and it is frequently turning yellow
  • data_engine is randomly throwing “max restarts” alarms and turning red
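
A queue that spikes and then drains is a different symptom from one that only ever grows. The sketch below compares a few successive queue-depth readings to tell the two apart. It assumes the readings are noted by hand from the hub GUI at regular intervals; the function name and the sample values are hypothetical and are not part of DX UIM.

```python
def queue_is_growing(depths):
    """Return True if the readings never decrease and the last exceeds the first."""
    if len(depths) < 2:
        # A single reading says nothing about the trend.
        return False
    never_shrinks = all(later >= earlier
                        for earlier, later in zip(depths, depths[1:]))
    return never_shrinks and depths[-1] > depths[0]


# Hypothetical readings taken a few minutes apart:
steady_backlog = queue_is_growing([120, 450, 2300, 9800])   # never drains
bursty_traffic = queue_is_growing([500, 20, 480, 10])       # drains between bursts
```

A steadily growing queue suggests that inserts are not reaching the database at all, whereas a queue that drains between bursts suggests the database is reachable but slow or intermittently unavailable.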

Common resolutions/percentages:

Issues encountered during or after an upgrade of DX UIM (19.05% of issues analyzed)

Common issue descriptions:

  • My DX UIM upgrade failed during the data_engine configuration step
  • My DX UIM upgrade is failing at the end because it says data_engine is not starting
  • After upgrading DX UIM successfully I am seeing a lot of queueing of data and the data_engine PID keeps changing

Common resolutions/percentages:

Additional Information

After you have corrected any issues within the environment, you may need to take additional steps to restore functionality to the DX UIM environment.

At a minimum, you should restart the DX UIM primary hub robot and/or HA Hub (if applicable), and then restart any robot(s) that host instances of Operator Console or CABI.

  • If the credentials used to connect to the database were changed as part of the resolution, you will need to consult this article to ensure the password is updated appropriately across the installation.

  • If the database server itself has been changed (e.g. database migration or server IP change) you should consult this article for the necessary changes.

  • For the complete DX UIM data_engine technical documentation, please refer to the data_engine probe document at the following link: data_engine.

  • For additional troubleshooting information on issues you might encounter while upgrading, configuring, or using different versions of the data_engine probe, check the following link: data_engine troubleshooting

  • For data_engine best practices, see the following link: data_engine best practices

 

If this article has not been helpful and the issue appears to be related to the data_engine probe itself, please consult this article for further possible scenarios.