One node in a Vertica Data Repository cluster fails to start

Article ID: 250999

Products

CA Performance Management - Usage and Administration
DX NetOps

Issue/Introduction

In a three-node DX NetOps Performance Management Data Repository (Vertica) database cluster, one of the DR nodes is down. In this example, node0003 fails to start.

Attempts to start it from the adminTools UI fail, and the node still reports as down.

The /<CatalogHomeDir>/drdata/v_drdata_node0003_catalog/vertica.log file contains these messages:

2022-09-28 15:30:27.460 Main:0x7f4b87565600-fff0000000000cc1 [Command] <INFO> Setting up UDType pointers
2022-09-28 15:30:27.460 Main:0x7f4b87565600-fff0000000000cc1 [Catalog] <WARNING> Couldn't load libraries
2022-09-28 15:30:27.460 Main:0x7f4b87565600-fff0000000000cc1 <PANIC> @v_drdata_node0003: VX001/2973: Data consistency problems found; startup aborted
        HINT:  Check that all file systems are properly mounted.  Also, the --force option can be used to delete corrupted data and recover from the cluster
        LOCATION:  mainEntryPoint, /data/qb_workspaces/jenkins2/ReleaseBuilds/Hammermill/REL-10_1_1-x_hammermill/build/vertica/Basics/vertica.cpp:1805
2022-09-28 15:30:27.498 Main:0x7f4b87565600-fff0000000000cc1 [Main] <PANIC> Wrote backtrace to ErrorReport.txt
2022-09-28 15:30:27.498 Main:0x7f4b87565600-fff0000000000cc1 [Main] <ALL> Core dumped to /catalog/drdata/v_drdata_node0003_catalog
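
A quick way to pull messages like these out of a large vertica.log is to filter on the literal <PANIC> tag. The following is a minimal sketch only; the function name show_panics is hypothetical, and the log path must be supplied by the caller because <CatalogHomeDir> differs per installation.

```shell
# Hypothetical helper: print every line tagged <PANIC> in a vertica.log file.
# grep -F treats the pattern as a literal string, so the angle brackets
# need no escaping. The log file path is passed as the first argument.
show_panics() {
    log_file="$1"
    grep -F '<PANIC>' "$log_file"
}
```

Run on the down node against its own catalog directory, for example show_panics /<CatalogHomeDir>/drdata/v_drdata_node0003_catalog/vertica.log, this should list only the startup-abort entries.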

Environment

All supported DX NetOps Performance Management releases

Cause

The server hosting node0003 was rebooted unexpectedly without first shutting the database down on node0003.

This was confirmed with the 'uptime' command: two nodes showed uptime values of more than 100 days, while the problem node had been running for less than 24 hours.
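
The uptime comparison can also be reduced to a single number per host. This is a rough sketch, assuming a Linux host where /proc/uptime is available; the function name uptime_days is hypothetical and not part of any product tooling.

```shell
# Hypothetical helper: print the host's uptime in whole days.
# /proc/uptime holds the seconds since boot as its first field (Linux only).
# An alternate file can be passed as the first argument for testing.
uptime_days() {
    uptime_file="${1:-/proc/uptime}"
    awk '{ print int($1 / 86400) }' "$uptime_file"
}
```

Running uptime_days on each cluster host gives values that are easy to compare. A node reporting 0 while its peers report 100 or more would match the unexpected reboot described here.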

Resolution

To resolve this, take the following steps:

  1. Log in to the terminal of one of the nodes in the cluster as the dradmin user.
  2. Run the following command:
    1. /opt/vertica/bin/admintools -t restart_node -F -s <IP_of_Down_Node> -d <DBName>
    2. Replace:
      1. <IP_of_Down_Node> with the IP address of the down node.
      2. <DBName> with the name of the database.
    3. Example using default recommended DB name 'drdata' and IP address 127.0.0.1.
      1. /opt/vertica/bin/admintools -t restart_node -F -s 127.0.0.1 -d drdata
  3. Enter Yes to wait for admintools to finish restarting the down node.
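
For sites that run this recovery more than once, the command in step 2 can be wrapped in a small shell function. This is a sketch under stated assumptions: the function name restart_down_node and the ADMINTOOLS override variable are hypothetical conveniences, and the database name defaults to 'drdata' only because that is the default recommended name used in the example above. It is meant to be run as the dradmin user on a node that is up.

```shell
# Path taken from the steps above. The ADMINTOOLS variable is a hypothetical
# override, useful only for a dry run with a stand-in command.
ADMINTOOLS="${ADMINTOOLS:-/opt/vertica/bin/admintools}"

# Hypothetical wrapper around the restart_node command from step 2.
#   $1 = IP address of the down node (required)
#   $2 = database name (optional, defaults to drdata)
restart_down_node() {
    down_ip="$1"
    db_name="${2:-drdata}"
    if [ -z "$down_ip" ]; then
        echo "usage: restart_down_node <IP_of_Down_Node> [DBName]" >&2
        return 2
    fi
    # -F forces the restart, as in the documented command.
    "$ADMINTOOLS" -t restart_node -F -s "$down_ip" -d "$db_name"
}
```

Calling restart_down_node 127.0.0.1 issues the same command as the example in step 2. The wrapper does not answer any prompt, so step 3 still applies.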

If this fails to restart the node, open a new Support case via support.broadcom.com.