Unable to start the Vertica db while performing the steps to replace a cluster node

Article ID: 392813

Updated On: 04-02-2025

Products

Network Observability CA Performance Management

Issue/Introduction

To perform an in-place OS upgrade from RHEL 7 to RHEL 8, the Data Repository (DR/Vertica) was uninstalled and then reinstalled. Because the backup of node 3 was lost during the OS upgrade, the backup of node 4 was copied to node 3 and restored there.

Since then, node 3 has been down.

/opt/vertica/bin/admintools -t list_allnodes
Node               | Host           | State | Version           | DB
-------------------+----------------+-------+-------------------+--------
v_drdata_node0001  | ###.###.###.15 | UP    | vertica-10.1.1.20 | drdata
v_drdata_node0002  | ###.###.###.16 | UP    | vertica-10.1.1.20 | drdata
v_drdata_node0003  | ###.###.###.17 | DOWN  | vertica-10.1.1.20 | drdata
v_drdata_node0004  | ###.###.###.18 | UP    | vertica-10.1.1.20 | drdata
v_drdata_node0005  | ###.###.###.19 | UP    | vertica-10.1.1.20 | drdata

 

To resolve this issue, node 3 must be rebuilt without restoring any data. However, while node 3 was being rebuilt, the Vertica database was shut down by mistake, and it could not be started again.

The following error occurs when trying to start the Vertica db:

 /opt/vertica/bin/admintools -t start_db -d drdata -F

Error: Vertica versions do not match on all database hosts. Startup cannot continue.

Database drdata did not start successfully: Version mismatch

/opt/vertica/bin/admintools -t list_allnodes
Node               | Host           | State | Version           | DB
-------------------+----------------+-------+-------------------+--------
v_drdata_node0001  | ###.###.###.15 | DOWN  | vertica-10.1.1.20 | drdata
v_drdata_node0002  | ###.###.###.16 | DOWN  | vertica-10.1.1.20 | drdata
v_drdata_node0003  | ###.###.###.17 | DOWN  | unavailable       | drdata
v_drdata_node0004  | ###.###.###.18 | DOWN  | vertica-10.1.1.20 | drdata
v_drdata_node0005  | ###.###.###.19 | DOWN  | vertica-10.1.1.20 | drdata
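The second listing shows node 3 with "unavailable" in the Version column, which matches the version-mismatch error above. As a minimal sketch, the following function picks such nodes out of the list_allnodes output. The function name nodes_without_version is hypothetical, and the sketch assumes the column layout shown above.

```shell
# Print every node whose Version column does not start with "vertica-".
# Reads the output of "admintools -t list_allnodes" on stdin.
# Assumes the five-column, pipe-separated layout shown in this article.
nodes_without_version() {
    awk -F'|' '
        NF >= 5 {
            node = $1; ver = $4
            gsub(/[ \t]/, "", node)   # strip padding around the node name
            gsub(/[ \t]/, "", ver)    # strip padding around the version
            # Skip the header row, report rows without a version string.
            if (node != "Node" && ver !~ /^vertica-/) print node
        }
    '
}

# Usage (as dradmin):
#   /opt/vertica/bin/admintools -t list_allnodes | nodes_without_version
```

An empty result would suggest that every host reports a version and that the mismatch has a different cause.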

Environment

Performance Management 23.3.1 (also applies to any other version)

Cause

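Because node 3 is unavailable, admintools cannot read its Vertica version and reports a version mismatch, so the database cannot be started through admintools (the front end). Start the Vertica process directly on each healthy node (the back end) instead.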

Resolution

1. Open the SSH / Putty session for all nodes

Node               | Host           |
-------------------+----------------+
v_drdata_node0001  | ###.###.###.15 | ==> open SSH / Putty
v_drdata_node0002  | ###.###.###.16 | ==> open SSH / Putty
v_drdata_node0003  | ###.###.###.17 | ==> open SSH / Putty
v_drdata_node0004  | ###.###.###.18 | ==> open SSH / Putty
v_drdata_node0005  | ###.###.###.19 | ==> open SSH / Putty

2. In the admintools utility, go to the Advanced Menu and kill the Vertica process on all hosts (as dradmin account).

3. Run the following on each node to confirm that the Vertica process is no longer running (as dradmin account). The command prints nothing when no process is found:

pidof vertica
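Because pidof prints nothing when the process is absent, an explicit answer can be easier to read when checking several sessions in a row. A minimal sketch follows; the helper name proc_state is hypothetical.

```shell
# Report whether a process with the given name is running on this host.
# pidof exits non-zero when it finds no matching process.
proc_state() {
    if pidof "$1" >/dev/null 2>&1; then
        echo "running"
    else
        echo "stopped"
    fi
}

# Usage on each node (as dradmin):
#   proc_state vertica
```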

4. Close the SSH / Putty session on the problematic node (node 3):

Node               | Host           |
-------------------+----------------+
v_drdata_node0003  | ###.###.###.17 | ==> close SSH / Putty

5. On one of the healthy nodes (node 1 in this example), look at the beginning of the vertica.log file in the node's catalog directory (as dradmin account):

cd /catalog/drdata/

ll

cd v_drdata_node0001_catalog

head vertica.log
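The entry of interest in vertica.log is the one containing "Processing command line". As a sketch, the function below prints only the command from the first such entry. It assumes the command follows the text "Processing command line: " on the same line; the function name processing_cmdline is hypothetical.

```shell
# Print the command recorded on the first "Processing command line" entry.
# $1 = path to vertica.log
# Assumption: the command follows "Processing command line: " on that line.
processing_cmdline() {
    awk 'index($0, "Processing command line: ") {
             sub(/^.*Processing command line: /, "")  # drop the log prefix
             print
             exit                                     # first match only
         }' "$1"
}

# Usage (as dradmin):
#   processing_cmdline /catalog/drdata/v_drdata_node0001_catalog/vertica.log
```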

6. In the vertica.log file, find the "Processing command line" entry, copy it into a text editor, and adapt it for each healthy node as follows. These are the command lines that start Vertica from the back end (as dradmin account):

/opt/vertica/bin/vertica -D /catalog/drdata/v_drdata_node0001_catalog -C drdata -n v_drdata_node0001 -h ###.###.###.15 -p 5433 -P 4803 -Y ipv4

/opt/vertica/bin/vertica -D /catalog/drdata/v_drdata_node0002_catalog -C drdata -n v_drdata_node0002 -h ###.###.###.16 -p 5433 -P 4803 -Y ipv4

/opt/vertica/bin/vertica -D /catalog/drdata/v_drdata_node0004_catalog -C drdata -n v_drdata_node0004 -h ###.###.###.18 -p 5433 -P 4803 -Y ipv4

/opt/vertica/bin/vertica -D /catalog/drdata/v_drdata_node0005_catalog -C drdata -n v_drdata_node0005 -h ###.###.###.19 -p 5433 -P 4803 -Y ipv4
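The four commands above differ only in the node name and the host address, so they can also be generated rather than edited by hand. This is a sketch that reproduces the pattern used in this article (database drdata, catalog under /catalog/drdata, ports 5433 and 4803); the function name vertica_start_cmd is hypothetical, and the values should be checked against the command line found in vertica.log.

```shell
# Build the direct start command for one node of the drdata database.
# $1 = node number (for example 1, 2, 4 or 5)
# $2 = host address of that node
vertica_start_cmd() {
    node=$(printf 'v_drdata_node%04d' "$1")   # e.g. v_drdata_node0004
    printf '/opt/vertica/bin/vertica -D /catalog/drdata/%s_catalog -C drdata -n %s -h %s -p 5433 -P 4803 -Y ipv4\n' \
        "$node" "$node" "$2"
}

# Usage: print the command for node 4, then paste it into the node 4 session.
#   vertica_start_cmd 4 '###.###.###.18'
```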

7. In each SSH / Putty session, run the command line that belongs to that node (for example, run the node 1 command in the node 1 session) (as dradmin account).

8. Wait until Vertica is up and running on each node, and check its progress (as dradmin account):

pidof vertica

cd /catalog/drdata/v_drdata_node0001_catalog

tail -f startup.log

You should see this message: "Startup Complete", "Node is UP"
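Instead of watching tail -f, the wait can be scripted. The sketch below polls startup.log for the "Startup Complete" text mentioned above. The function name wait_for_startup and its parameters are hypothetical, and the check can be satisfied by an entry left over from an earlier start if the log has not been rewritten.

```shell
# Return success once startup.log contains "Startup Complete".
# $1 = path to startup.log
# $2 = maximum number of polls
# $3 = seconds to sleep between polls
wait_for_startup() {
    tries=0
    while [ "$tries" -lt "$2" ]; do
        if grep -q 'Startup Complete' "$1" 2>/dev/null; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$3"
    done
    return 1   # gave up without seeing the message
}

# Usage (as dradmin), polling every 10 seconds for up to 10 minutes:
#   wait_for_startup /catalog/drdata/v_drdata_node0001_catalog/startup.log 60 10
```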

9. Confirm that all healthy nodes (1, 2, 4, and 5) are up:

admintools -t list_allnodes

Node               | Host           | State | Version           | DB
-------------------+----------------+-------+-------------------+--------
v_drdata_node0001  | ###.###.###.15 | UP    | vertica-10.1.1.20 | drdata
v_drdata_node0002  | ###.###.###.16 | UP    | vertica-10.1.1.20 | drdata
v_drdata_node0003  | ###.###.###.17 | DOWN  | unavailable       | drdata
v_drdata_node0004  | ###.###.###.18 | UP    | vertica-10.1.1.20 | drdata
v_drdata_node0005  | ###.###.###.19 | UP    | vertica-10.1.1.20 | drdata

 

Steps to rebuild the problematic node 3

10. Open an SSH / Putty session for node 3 and check whether the Vertica rpm package is installed (in this case it is not), then install it (as root account):

cd /opt/CA/IMDataRepository_vertica10/resources

ll

vertica-10.1.1-20.x86_64.RHEL6.rpm

rpm -qa vertica

rpm -Uvh vertica-10.1.1-20.x86_64.RHEL6.rpm

11. Run the following syntax to install Vertica for node 3 (as root account):

/opt/vertica/sbin/install_vertica --hosts ###.###.###.17 --rpm /opt/CA/IMDataRepository_vertica10/resources/vertica-10.1.1-20.x86_64.RHEL6.rpm --dba-user dradmin

12. On node 3, rename the /opt/vertica/config/admintools.conf file created by the new installation in step 11 to admintools.conf_old (as dradmin account):

cd /opt/vertica/config

mv admintools.conf admintools.conf_old
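If this step is repeated, a plain mv would overwrite the earlier admintools.conf_old. As a cautious sketch, the helper below refuses to rename when a backup already exists; the function name rename_to_old is hypothetical.

```shell
# Rename a file to <name>_old, refusing to overwrite an earlier backup.
# $1 = path of the file to rename
rename_to_old() {
    if [ ! -e "$1" ]; then
        echo "missing: $1" >&2
        return 1
    fi
    if [ -e "${1}_old" ]; then
        echo "backup already exists: ${1}_old" >&2
        return 1
    fi
    mv "$1" "${1}_old"
}

# Usage on node 3 (as dradmin):
#   rename_to_old /opt/vertica/config/admintools.conf
```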

13. Copy the /opt/vertica/config/admintools.conf file from the existing installation (node 1) to node 3 (as dradmin account):

cd /opt/vertica/config

scp admintools.conf dradmin@###.###.###.17:/opt/vertica/config

14. On node 3, rename the drdata data directory back to its original name (as dradmin account):

cd /data/

mv drdata_old drdata

15. On node 3, confirm with pidof that Vertica is not running, then delete all files under the /data/drdata/v_drdata_node0003_data/ directory (as dradmin account):

cd /data/drdata/v_drdata_node0003_data

pidof vertica

rm -rf *

16. On node 3, rename the catalog directory back to its original name (as dradmin account):

cd /catalog/

mv drdata_old drdata

17. Delete all files under the /catalog/drdata/v_drdata_node0003_catalog/Catalog/ directory (as dradmin account):

cd /catalog/drdata/v_drdata_node0003_catalog/Catalog/

rm -rf *
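Running rm -rf * depends entirely on the current directory, so a mistyped cd can delete the wrong files. The sketch below takes the directory as an explicit argument and refuses to run while a vertica process is alive. The function name clear_dir is hypothetical. Unlike rm -rf *, it also removes hidden files, and it relies on the -mindepth and -delete options of GNU find.

```shell
# Delete everything inside a directory but keep the directory itself.
# $1 = directory to empty
# Refuses to run while a vertica process is alive or if $1 is not a directory.
clear_dir() {
    if pidof vertica >/dev/null 2>&1; then
        echo "vertica is still running; refusing to delete" >&2
        return 1
    fi
    if [ -z "$1" ] || [ ! -d "$1" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
    # -mindepth 1 protects the directory itself; -delete removes children first.
    find "$1" -mindepth 1 -delete
}

# Usage on node 3 (as dradmin):
#   clear_dir /data/drdata/v_drdata_node0003_data
#   clear_dir /catalog/drdata/v_drdata_node0003_catalog/Catalog
```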

18. From a healthy node, run the following vsql query to check when the cluster was last in sync (as dradmin account):

vsql  (supply the password)

select get_current_epoch() CE,get_last_good_epoch() LGE,get_ahm_epoch() AHM,(get_current_epoch() - get_last_good_epoch()) CeLGDiff,(get_last_good_epoch() - get_ahm_epoch()) LgeAHmDiff ,
   get_ahm_time(),clock_timestamp();

The result is: Current AHM Time: 2025-03-27 15:15:02.319656+03

The last entry committed was on March 27th

19. Check the vertica db size on node 1 (as dradmin account):

cd /data/drdata/

du -sh v_drdata_node0001_data

26G   (It is not a big db)

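20. Run the following vsql statement to start rebuilding node 3 (as dradmin account). Passing true lets the AHM advance even though node 3 is down, so node 3 has to recover all of its data from the other nodes: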

select make_ahm_now('true');

21. Check the startup.log file on node 3:

cd /catalog/drdata/v_drdata_node0003_catalog

tail -f startup.log

Node Status: v_drdata_node0003: (RECOVERING)
Node Status: v_drdata_node0003: (UP)
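To read only the most recent state instead of following the whole log, the status lines can be filtered. This sketch assumes lines shaped exactly like the two shown above; the function name latest_node_status is hypothetical.

```shell
# Print the most recent status recorded for a node in startup.log.
# $1 = path to startup.log
# $2 = node name, for example v_drdata_node0003
# Assumes lines shaped like: "Node Status: v_drdata_node0003: (UP)"
latest_node_status() {
    awk -v node="$2" '
        index($0, "Node Status: " node ": (") {
            s = $0
            sub(/^.*\(/, "", s)   # drop everything up to the opening parenthesis
            sub(/\).*$/, "", s)   # drop the closing parenthesis and the rest
            last = s              # remember the latest match
        }
        END { if (last != "") print last }
    ' "$1"
}

# Usage on node 3 (as dradmin):
#   latest_node_status /catalog/drdata/v_drdata_node0003_catalog/startup.log v_drdata_node0003
```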

22. Once node 3 is UP, run the following vsql statement to set the Ancient History Mark (AHM) to the greatest allowed value (as dradmin account):

select make_ahm_now();

23. Node 3 is up and running.