Vertica node in cluster never completes Recovery in Performance Management

Article ID: 8078

Updated On:

Products

CA Infrastructure Management, CA Performance Management - Usage and Administration, DX NetOps

Issue/Introduction

One Vertica node in a three-node cluster stays in a constant "Recovering" state when you try to start the database.

The other two nodes are up, but the third never completes the recovery cycle, so the database never completes its restart.
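
One way to confirm which node is stuck is to query the node states from a working node. The following is a minimal sketch, not a supported tool: it assumes vsql is installed at /opt/vertica/bin/vsql, that the `nodes` system table is readable by the database admin user, and the function names `list_node_states` and `nodes_not_up` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not part of Vertica.
VSQL=/opt/vertica/bin/vsql

# list_node_states USER PASSWORD
# Prints one "node_name|node_state" line per node (unaligned, tuples only).
list_node_states() {
  "$VSQL" -U "$1" -w "$2" -A -t -F '|' \
    -c "SELECT node_name, node_state FROM nodes ORDER BY node_name;"
}

# nodes_not_up
# Reads "node_name|node_state" lines on stdin and prints every node
# whose state is anything other than UP.
nodes_not_up() {
  awk -F'|' '$2 != "UP" { print $1 ": " $2 }'
}
```

Usage, with placeholder credentials: `list_node_states <dbAdminUser> <dbAdminPassword> | nodes_not_up`. A node that keeps appearing with the state RECOVERING across repeated checks matches the symptom described here.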

Environment

All supported Performance Management releases

Cause

When a Vertica node has been down for an extended period, it can get stuck in a "Recovering" state when you try to bring it back up.

This is often seen when one node, or all nodes, in a Vertica DB cluster are rebooted without a proper database shutdown beforehand.

Resolution

First, stop the database on the problem node. Then, on one of the working nodes, log into vsql as the dradmin user and set the Ancient History Mark (AHM) using the procedure below.

  1. Stop Vertica on the problem node using the following command:
    • /opt/vertica/bin/adminTools -t stop_node --hosts xx.xx.xx.xx
  2. Run the following vsql commands to set the Ancient History Mark (AHM) in the database. First enter the vsql prompt, then run the command, then quit to exit the prompt.
    • /opt/vertica/bin/vsql -U <dbAdminUser> -w <dbAdminPassword>
    • select make_ahm_now(true);
    • \q
  3. Restart the database on the problem node with the following command:
    • /opt/vertica/bin/admintools -t restart_node -d drdata --hosts xx.xx.xx.xx

This process sets the Ancient History Mark (AHM) in the database, which causes the problem node to rebuild from the other two nodes. This is better and faster than the alternative, which is to let the node attempt recovery from transaction history.
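
The three steps above can be collected into one script. This is a sketch under stated assumptions rather than a supported tool: it assumes it runs as the database admin OS user on one of the working nodes, that admintools can reach the problem host from there, and that the database is named drdata as in step 3. The helper names `run` and `recover_node` and the `DRY_RUN` variable are made up for illustration. By default it only prints the commands (including the password argument, so treat the output accordingly); set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and DRY_RUN are illustrative, not part of Vertica.
set -euo pipefail

ADMINTOOLS=/opt/vertica/bin/admintools
VSQL=/opt/vertica/bin/vsql
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# run CMD [ARGS...]
# Prints the command in dry-run mode, otherwise executes it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# recover_node HOST DB_USER DB_PASSWORD
recover_node() {
  local host="$1" user="$2" password="$3"
  # Step 1: stop Vertica on the problem node.
  run "$ADMINTOOLS" -t stop_node --hosts "$host"
  # Step 2: set the AHM; -c runs a single statement without an interactive prompt.
  run "$VSQL" -U "$user" -w "$password" -c "select make_ahm_now(true);"
  # Step 3: restart the problem node so it rebuilds from the other nodes.
  run "$ADMINTOOLS" -t restart_node -d drdata --hosts "$host"
}
```

Usage, with placeholder values: `recover_node xx.xx.xx.xx <dbAdminUser> <dbAdminPassword>` prints the three commands for review, and `DRY_RUN=0 recover_node xx.xx.xx.xx <dbAdminUser> <dbAdminPassword>` runs them. Because `set -e` is active, a failure in an earlier step stops the script before the restart is attempted.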