Dashboard has query failures in Views when run on schedule

Article ID: 215212

Products

CA Performance Management - Usage and Administration
DX NetOps

Issue/Introduction

A Dashboard scheduled to send via email arrives with query failures for the Views.

The same Dashboard runs fine in the Portal.

The same Dashboard returns similar Query ID failures when the Run Now option is used.

The same happens for PDF or CSV exports.

The Dashboard has only two Views on it, and which of them fails varies at random from run to run.

Affected Views in the PDF show an error message that includes a Query ID. The Query ID is unique for each instance of the error.

Searching the PCService.log file (default path /opt/CA/PerformanceCenter/PC/logs/PCService.log) for the Query ID value shows an error for the report. The error states:

          Reason: {
Error occurred while running a RIB query on Data Aggregator RIB Source. Query ID: RIBQuery_469cfb1d_91ae_49bc_8713_c3f7578b3cc2
  Possible reason: Could not get JDBC Connection; nested exception is java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](100176) Failed to connect to host <DR_Node_Name-Or-IP> on port 5433. Reason: Failed to establish a connection to the primary server or any backup address.
[Vertica][VJDBC](100176) Failed to connect to host <DR_Node_Name-Or-IP> on port 5433. Reason: Failed to establish a connection to the primary server or any backup address.
[Vertica][VJDBC](100176) Failed to connect to host <DR_Node_Name-Or-IP> on port 5433. Reason: Failed to establish a connection to the primary server or any backup address.
Failed to establish a connection to the primary server or any backup address.
Connection refused (Connection refused)
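
The log search described above can be sketched as a small shell helper. This is only an illustration, assuming the log entries look like the excerpt above; the function name failed_hosts_for_query and the 10-line context window are hypothetical choices, not part of the product.

```shell
#!/bin/sh
# Sketch: list the Data Repository hosts named in connection failures
# logged for one Query ID.
# Assumptions: entries follow the format of the excerpt in this article;
# the function name and the 10-line context window are illustrative only.

failed_hosts_for_query() {
    log_file=$1
    query_id=$2
    # Take each line containing the Query ID plus the 10 lines after it,
    # then extract the host from "Failed to connect to host <host> on port".
    grep -A 10 -F "$query_id" "$log_file" \
        | sed -n 's/.*Failed to connect to host \([^ ]*\) on port.*/\1/p' \
        | sort -u
}

# Example usage with the default log path from this article:
# failed_hosts_for_query /opt/CA/PerformanceCenter/PC/logs/PCService.log \
#     RIBQuery_469cfb1d_91ae_49bc_8713_c3f7578b3cc2
```

If the same host appears for several different Query IDs, that host is a likely candidate for the node to check.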

Environment

All supported DX NetOps Performance Management releases.

Cause

The Data Repository Vertica database is a multi-node cluster and one of its nodes was down. As a result, a query failed whenever it needed to use that node to obtain data.
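
One way to confirm this condition is to list the node states from an operational node. The following is a minimal sketch, assuming vsql is run with the -A (unaligned) and -t (tuples only) options against the NODES system table so that each output line has the form node_name|node_state; the helper name down_nodes is hypothetical.

```shell
#!/bin/sh
# Sketch: report nodes whose state is anything other than UP.
# Input on stdin: one "node_name|node_state" pair per line, for example from
#   vsql -At -c "SELECT node_name, node_state FROM nodes;"
# Assumption: the helper name and output wording are illustrative only.

down_nodes() {
    awk -F'|' 'NF >= 2 && $2 != "UP" { print $1 " is " $2 }'
}

# Example usage on an operational DB node (prompts for the DB password):
# /opt/vertica/bin/vsql -At -c "SELECT node_name, node_state FROM nodes;" | down_nodes
```

No output means every node reported UP at the time of the check.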

Resolution

Restart the database node that is down and the errors will be resolved.

To restart a single node that has been down for more than a few days in a multi-node cluster, take the following steps.

  1. Log in to one of the operational DB nodes as the OS dradmin or equivalent DB admin user.
  2. Go to the /opt/vertica/bin directory.
  3. Run:
    • ./adminTools
  4. In the Main Menu choose option 2 for Connect to Database.
    1. When prompted, enter the DB password that is used to stop and start the DB.
    2. In the vsql prompt run the following:
      • select make_ahm_now(true);
    3. Exit the prompt with the command:
      • \q
  5. Exit adminTools from the Main Menu.
  6. In the /opt/vertica/bin directory, as the dradmin or equivalent user, run the following for the node that is down:
    • ./admintools -t restart_node -F -s IP-of-node -d drdata
      1. Replace IP-of-node with the down node's IP address.
      2. Replace drdata (default DB name) with the real DB name.
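
The steps above can be summarized as a dry-run sketch that only prints the commands for review and executes nothing. It assumes the make_ahm_now call is issued through vsql -c rather than through the adminTools menu; the function name restart_node_plan is hypothetical.

```shell
#!/bin/sh
# Sketch: print the commands for restarting one down node. Nothing is run.
# Assumptions: vsql -c replaces the interactive adminTools menu steps;
# the function name is illustrative only.

restart_node_plan() {
    node_ip=${1:-}
    db_name=${2:-drdata}   # drdata is the default DB name per this article
    if [ -z "$node_ip" ]; then
        echo "usage: restart_node_plan IP-of-node [db-name]" >&2
        return 1
    fi
    # First, from an operational node: advance the Ancient History Mark (AHM).
    echo "/opt/vertica/bin/vsql -c \"select make_ahm_now(true);\""
    # Then force-restart the node that is down.
    echo "/opt/vertica/bin/admintools -t restart_node -F -s $node_ip -d $db_name"
}

# Example usage:
# restart_node_plan 192.0.2.10 drdata
```

Review the printed commands, then run them as the dradmin or equivalent user on an operational node.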