Data Repository (DR) Vertica node stopped - Not able to start

Article ID: 135660

Products

CA Performance Management - Usage and Administration, DX NetOps

Issue/Introduction

In a Vertica cluster of three or more nodes, one node fails to start even though the other nodes are up and running.

Environment

All supported versions of DX NetOps Performance Management

Cause

The Vertica log on the affected node is located at:

<CATALOG-PATH>/drdata/v_drdata_node<NUMBER>_catalog/vertica.log

The following message appears at the end of the log:

        HINT:  Check that all file systems are properly mounted.  Also, the --force option can be used to delete corrupted data and recover from the cluster
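
To confirm that a node is hitting this condition, the tail of its log can be searched for the hint text. The following is a minimal sketch, not an official tool: the function name has_force_hint and the 50-line window are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the last lines of a Vertica log mention the --force hint.
# has_force_hint is a hypothetical helper; the 50-line window is an arbitrary choice.
has_force_hint() {
    log_file=$1
    # -F matches the text literally; "--" stops option parsing so the
    # leading dashes in the pattern are not read as a grep option.
    tail -n 50 "$log_file" | grep -qF -- '--force option can be used'
}
```

Call it with the full path to vertica.log on the affected node, for example: has_force_hint "/path/to/vertica.log" && echo "hint found".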

Resolution

As the database administrator user (default: dradmin), run the following on the Data Repository (DR):

  1. cd /opt/vertica/bin

  2. ./admintools -t restart_node --host=<NODE_IP_ADDRESS> -d <DB_NAME> --force

    For example:

    ./admintools -t restart_node --host=xxx.xxx.xxx.xxx -d drdata --force

    You will see output similar to:

    Info: no password specified, using none

    *** Restarting nodes for database drdata ***

            Restarting host [xxx.xxx.xxx.xxx] with catalog [v_drdata_node0003_catalog]

            Issuing multi-node restart

            Starting nodes:

                    v_drdata_node0003 (xxx.xxx.xxx.xxx)

            Starting Vertica on all nodes. Please wait, databases with a large catalog may take a while to initialize.

            Node Status: v_drdata_node0003: (DOWN)

            Node Status: v_drdata_node0003: (DOWN)

            Node Status: v_drdata_node0003: (DOWN)

            Node Status: v_drdata_node0003: (DOWN)

            Node Status: v_drdata_node0003: (DOWN)

            Node Status: v_drdata_node0003: (DOWN)

            Node Status: v_drdata_node0003: (RECOVERING)

            Node Status: v_drdata_node0003: (RECOVERING)

            Node Status: v_drdata_node0003: (UP)


    Note: If prompted to continue waiting, respond yes until the process either completes or fails with an error.

  3. Verify that all nodes are up:

    ./admintools -t list_allnodes

     Node              | Host              | State | Version          | DB

    -------------------+-------------------+-------+------------------+--------

     v_drdata_node0001 | xxx.xxx.xxx.xxx    | UP    | vertica-10.1.1.0 | drdata

     v_drdata_node0002 | xxx.xxx.xxx.xxx    | UP    | vertica-10.1.1.0 | drdata

     v_drdata_node0003 | xxx.xxx.xxx.xxx    | UP    | vertica-10.1.1.0 | drdata
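
On a larger cluster it can be easier to filter this listing than to read it by eye. The sketch below assumes the column layout shown above (node name first, state third, separated by "|"); nodes_not_up is a hypothetical helper name, not part of admintools.

```shell
#!/bin/sh
# Sketch: print the name of every node whose State column is not UP.
# Reads "admintools -t list_allnodes" output on standard input.
nodes_not_up() {
    awk -F'|' '
        # Data rows are assumed to begin with a v_ node name; this skips
        # the header row, the dashed separator, and blank lines.
        $1 ~ /^[[:space:]]*v_/ {
            node = $1; state = $3
            gsub(/[[:space:]]/, "", node)
            gsub(/[[:space:]]/, "", state)
            if (state != "UP") print node
        }
    '
}
```

Usage: ./admintools -t list_allnodes | nodes_not_up. No output means that every listed node reports UP.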

     

If the node still fails to start, check vertica.log for disk partition problems or other errors, and resolve them before attempting the restart again.
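
Because the log hint asks you to check that all file systems are properly mounted, a quick look at the file system behind the catalog and data directories is a reasonable first step. This is a sketch only: fs_summary and parse_df are hypothetical helper names, and the directory to check depends on your installation.

```shell
#!/bin/sh
# Sketch: show which file system holds a directory and how full it is.

# parse_df reads "df -P" output on standard input and prints the mount
# point and capacity from the first data row (fields 6 and 5).
parse_df() {
    awk 'NR == 2 { print $6, $5 }'
}

# fs_summary takes a directory and prints "<mount point> <use%>".
# df -P forces the POSIX format, which keeps each file system on one line.
fs_summary() {
    df -P "$1" | parse_df
}
```

Run fs_summary against the catalog path on the affected node. A mount point of / where a dedicated partition is expected suggests that the partition is not mounted, and a capacity near 100% points to a full disk.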

Additional Information