Data Aggregator fails to start due to database heartbeat failure

Article ID: 31690

Updated On:

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration, DX NetOps

Issue/Introduction

The Data Aggregator keeps shutting down, and the karaf.log file shows the following messages.

ERROR | t Monitor Thread | 2015-09-03 10:19:48,944 | shutdown | ase.heartbeat.DBStateManagerImpl 385 | ommon.core.services.impl | | DB heartbeat to host <hostname of Db> execeeded max non-success time of 300000
 
WARN | t Monitor Thread | 2015-09-03 10:19:48,944 | shutdown | ase.heartbeat.DBStateManagerImpl 726 | ommon.core.services.impl | | DB state for host <hostname of Db> changing from OK to DOWN
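
To confirm that a shutdown matches this article, the karaf.log file can be searched for these two messages. The following Python sketch is one possible way to do that. It assumes the pipe-delimited layout of the sample lines above (eight fields, with the message last); the function name and the pattern are illustrative and are not part of the product.

```python
import re

# Matches the two heartbeat messages quoted in this article. The wording is
# taken from the sample log lines; other releases may phrase them differently.
HEARTBEAT_PATTERN = re.compile(
    r"DB heartbeat to host .* max non-success time of \d+"
    r"|DB state for host .* changing from \w+ to \w+"
)


def find_heartbeat_events(lines):
    """Return (timestamp, level, message) tuples for heartbeat-related lines."""
    events = []
    for line in lines:
        # Assumed layout: level | thread | timestamp | ... | message (8 fields).
        fields = [field.strip() for field in line.split("|", 7)]
        if len(fields) != 8:
            continue  # Not a log line in the assumed layout, e.g. a stack trace.
        level, timestamp, message = fields[0], fields[2], fields[7]
        if HEARTBEAT_PATTERN.search(message):
            events.append((timestamp, level, message))
    return events
```

Passing an open file object for karaf.log to find_heartbeat_events returns the matching entries in file order, which shows when the database state changed from OK to DOWN.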

Environment

All supported Performance Management releases

Cause

The Data Aggregator shuts down when it has had no valid database connection for longer than the heartbeat threshold, which is 300000 milliseconds (5 minutes) by default.
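
As a first check from the Data Aggregator host, it can help to see whether the database host accepts a TCP connection at all. The following Python sketch is a minimal example. It assumes the Data Repository listens on port 5433, the usual Vertica client port, so adjust it if your environment differs; the function name is illustrative. A successful TCP connection only shows that the port is reachable, not that the database is healthy.

```python
import socket


def can_reach_database(host, port=5433, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s.

    Port 5433 is an assumption (the usual Vertica client port).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```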

Resolution

There are two common causes, each with its own solution.

  1. To check whether the Data Repository Vertica database is up and running, and to restart it if needed, review this Knowledge Base Article regarding the proper restart of the Performance Management environment.
  2. If the database is running, a slow connection to it may be causing the issue. The Data Aggregator waits 5 minutes for a valid database connection before shutting itself down. As a temporary measure, extend this wait time to 10 minutes so that the Data Aggregator has longer to complete the connection. To make this change:
    1. In the /opt/IMDataAggregator/apache-karaf-<version>/etc directory (default path), create a new file named:
      • com.ca.im.core.services.database.heartbeat.DBStateManager.cfg
    2. Within the new file, add the following property entry to set a 10-minute failure threshold (600000 milliseconds) instead of the default 300000 milliseconds, which is 5 minutes.
      • maxNonSuccessTimeBeforeDRNodeConsideredDown=600000
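
The threshold is given in milliseconds, so 10 minutes is 10 × 60 × 1000 = 600000. The following Python sketch shows one way to create the file from the steps above with the converted value. The file name and property name come from this article; the function names are illustrative, and the etc directory must be passed in because the Karaf version in the path varies by installation.

```python
from pathlib import Path

# File and property names as given in this article.
CFG_NAME = "com.ca.im.core.services.database.heartbeat.DBStateManager.cfg"
PROPERTY = "maxNonSuccessTimeBeforeDRNodeConsideredDown"


def minutes_to_ms(minutes):
    """Convert minutes to the millisecond value the property expects."""
    return int(minutes * 60 * 1000)


def write_heartbeat_cfg(etc_dir, minutes=10):
    """Create the heartbeat .cfg file in etc_dir and return its path.

    Opens the file in exclusive mode, so an existing file is never
    overwritten; FileExistsError is raised instead.
    """
    cfg_path = Path(etc_dir) / CFG_NAME
    with open(cfg_path, "x", encoding="utf-8") as cfg:
        cfg.write(f"{PROPERTY}={minutes_to_ms(minutes)}\n")
    return cfg_path
```

Called with the default of 10 minutes, this writes the single line shown in step 2. If the file already exists, edit it by hand rather than recreating it.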

Additional Information

If option 2 is required to resolve the issue, the Data Aggregator will function, but the slow database connection indicates an underlying problem that is worth investigating. To have it investigated, please open a new support case.

If the database is running, and option 2 doesn't resolve the problem, please open a new case with the support team for investigation.