To perform the in-place OS upgrade from RHEL 7 to RHEL 8, the DR/Vertica was uninstalled and then re-installed. Because the node 3 backup was lost during the OS upgrade, the node 4 backup was copied and restored in its place.
Node 3 has been down ever since.
/opt/vertica/bin/admintools -t list_allnodes
 Node              | Host           | State | Version           | DB
-------------------+----------------+-------+-------------------+--------
 v_drdata_node0001 | ###.###.###.15 | UP    | vertica-10.1.1.20 | drdata
 v_drdata_node0002 | ###.###.###.16 | UP    | vertica-10.1.1.20 | drdata
 v_drdata_node0003 | ###.###.###.17 | DOWN  | vertica-10.1.1.20 | drdata
 v_drdata_node0004 | ###.###.###.18 | UP    | vertica-10.1.1.20 | drdata
 v_drdata_node0005 | ###.###.###.19 | UP    | vertica-10.1.1.20 | drdata
To resolve this issue, node 3 must be rebuilt without restoring any data. However, while node 3 was being rebuilt, the Vertica database was shut down by mistake, and it can no longer be started.
The following error occurs when trying to start the Vertica database:
/opt/vertica/bin/admintools -t start_db -d drdata -F
Error: Vertica versions do not match on all database hosts. Startup cannot continue.
Database drdata did not start successfully: Version mismatch
/opt/vertica/bin/admintools -t list_allnodes
 Node              | Host           | State | Version           | DB
-------------------+----------------+-------+-------------------+--------
 v_drdata_node0001 | ###.###.###.15 | DOWN  | vertica-10.1.1.20 | drdata
 v_drdata_node0002 | ###.###.###.16 | DOWN  | vertica-10.1.1.20 | drdata
 v_drdata_node0003 | ###.###.###.17 | DOWN  | unavailable       | drdata
 v_drdata_node0004 | ###.###.###.18 | DOWN  | vertica-10.1.1.20 | drdata
 v_drdata_node0005 | ###.###.###.19 | DOWN  | vertica-10.1.1.20 | drdata
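The version mismatch is visible in the listing above: node 3 reports "unavailable" while the other nodes report vertica-10.1.1.20. On a larger cluster the odd node can be picked out mechanically. The following is a minimal sketch, assuming the pipe-separated layout shown above; nodes_not_matching is a hypothetical helper name, not part of admintools.

```shell
# Hypothetical helper: print every node whose Version column differs from
# the expected version. Reads `admintools -t list_allnodes` output on stdin
# and assumes the pipe-separated layout shown above.
nodes_not_matching() {
    expected="$1"
    awk -F'|' -v want="$expected" '
        NF >= 5 {
            node = $1; ver = $4
            # Trim the padding around each cell.
            gsub(/^[ \t]+|[ \t]+$/, "", node)
            gsub(/^[ \t]+|[ \t]+$/, "", ver)
            # Skip the header row; the separator row has no "|" at all.
            if (node != "Node" && ver != want) print node " " ver
        }'
}

# Possible usage on a cluster host:
#   /opt/vertica/bin/admintools -t list_allnodes | nodes_not_matching vertica-10.1.1.20
```

An empty result would mean that every node reports the expected version.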
Performance Management 23.3.1 or any other version
Because node 3 is unavailable, you cannot start Vertica from the front end (admintools). You have to start it from the back end, by launching the vertica process directly on each healthy node.
1. Open an SSH / Putty session to each node:
 Node              | Host           |
-------------------+----------------+
 v_drdata_node0001 | ###.###.###.15 | ==> open SSH / Putty
 v_drdata_node0002 | ###.###.###.16 | ==> open SSH / Putty
 v_drdata_node0003 | ###.###.###.17 | ==> open SSH / Putty
 v_drdata_node0004 | ###.###.###.18 | ==> open SSH / Putty
 v_drdata_node0005 | ###.###.###.19 | ==> open SSH / Putty
2. In the admintools utility, go to the Advanced Menu and kill the vertica process on all hosts (as dradmin account).
3. Run the following on each node to confirm that the vertica process is no longer running; no output means that it is not running (as dradmin account):
pidof vertica
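Because pidof prints nothing when no process matches, an empty result is easy to misread. The wrapper below is a small sketch that makes the outcome explicit; process_state is a hypothetical helper name, and it only checks the host it runs on, so it would have to be run in each session.

```shell
# Hypothetical helper: report whether a process with the given name is
# running on this host. pidof prints nothing and returns a non-zero exit
# status when no process matches.
process_state() {
    if pidof "$1" >/dev/null 2>&1; then
        echo "$1 is running"
    else
        echo "$1 is not running"
    fi
}

process_state vertica
```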
4. Close the SSH / Putty session on the problematic node (node 3):
 Node              | Host           |
-------------------+----------------+
 v_drdata_node0003 | ###.###.###.17 | ==> close SSH / Putty
5. On one of the healthy nodes (node 1 here), check the beginning of the vertica.log file (as dradmin account):
cd /catalog/drdata/
ll
cd v_drdata_node0001_catalog
head vertica.log
6. From the vertica.log file, copy the "Processing command line" entry, paste it into a text editor such as Notepad, and adapt it for each healthy node as follows. These are the command lines that start Vertica from the back end (as dradmin account):
/opt/vertica/bin/vertica -D /catalog/drdata/v_drdata_node0001_catalog -C drdata -n v_drdata_node0001 -h ###.###.###.15 -p 5433 -P 4803 -Y ipv4
/opt/vertica/bin/vertica -D /catalog/drdata/v_drdata_node0002_catalog -C drdata -n v_drdata_node0002 -h ###.###.###.16 -p 5433 -P 4803 -Y ipv4
/opt/vertica/bin/vertica -D /catalog/drdata/v_drdata_node0004_catalog -C drdata -n v_drdata_node0004 -h ###.###.###.18 -p 5433 -P 4803 -Y ipv4
/opt/vertica/bin/vertica -D /catalog/drdata/v_drdata_node0005_catalog -C drdata -n v_drdata_node0005 -h ###.###.###.19 -p 5433 -P 4803 -Y ipv4
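The four command lines above differ only in the node name and the host address, so they can also be generated from a short list instead of being edited by hand. The sketch below is one way to do that. vertica_start_cmd is a hypothetical helper name, the 192.0.2.x addresses are placeholders standing in for the masked addresses, and the paths, ports, and flags are copied from the command lines above rather than verified against any particular cluster.

```shell
# Hypothetical helper: print the back-end start command for one node.
# The paths, ports, and flags mirror the command lines in step 6; take the
# real values from the "Processing command line" entry in your vertica.log.
vertica_start_cmd() {
    node="$1"   # for example v_drdata_node0001
    host="$2"   # IP address of that node
    printf '/opt/vertica/bin/vertica -D /catalog/drdata/%s_catalog -C drdata -n %s -h %s -p 5433 -P 4803 -Y ipv4\n' \
        "$node" "$node" "$host"
}

# Healthy nodes only; node 3 is deliberately left out.
# The 192.0.2.x addresses are placeholders, not real hosts.
while read -r node host; do
    vertica_start_cmd "$node" "$host"
done <<'EOF'
v_drdata_node0001 192.0.2.15
v_drdata_node0002 192.0.2.16
v_drdata_node0004 192.0.2.18
v_drdata_node0005 192.0.2.19
EOF
```

The script only prints the command lines. Each one still has to be run in the session of the node it belongs to.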
7. In each SSH / Putty session, run the command line above that matches that node (for example, on node 1 run the command line for node 1) (as dradmin account).
8. Wait for Vertica to come up, checking the process and following the startup log (as dradmin account):
pidof vertica
cd /catalog/drdata/v_drdata_node0001_catalog
tail -f startup.log
You should see the messages "Startup Complete" and "Node is UP".
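The wait in step 8 can be scripted instead of watching tail -f by hand. The following is a minimal sketch under the assumption that the message text appears literally in startup.log; wait_for_log is a hypothetical helper name and the 600-second timeout in the usage comment is an arbitrary choice.

```shell
# Hypothetical helper: poll a log file until a fixed string appears or the
# timeout (in seconds, default 300) expires.
# Returns 0 when the string is found and 1 on timeout.
wait_for_log() {
    file="$1"; pattern="$2"; timeout="${3:-300}"
    waited=0
    while :; do
        # -F treats the pattern as a fixed string, not a regular expression.
        grep -qF -- "$pattern" "$file" 2>/dev/null && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Possible usage on node 1:
#   wait_for_log /catalog/drdata/v_drdata_node0001_catalog/startup.log "Startup Complete" 600
```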
9. Confirm that all good nodes (1, 2, 4, and 5) are up:
admintools -t list_allnodes
 Node              | Host           | State | Version           | DB
-------------------+----------------+-------+-------------------+--------
 v_drdata_node0001 | ###.###.###.15 | UP    | vertica-10.1.1.20 | drdata
 v_drdata_node0002 | ###.###.###.16 | UP    | vertica-10.1.1.20 | drdata
 v_drdata_node0003 | ###.###.###.17 | DOWN  | unavailable       | drdata
 v_drdata_node0004 | ###.###.###.18 | UP    | vertica-10.1.1.20 | drdata
 v_drdata_node0005 | ###.###.###.19 | UP    | vertica-10.1.1.20 | drdata
Steps to rebuild the problematic node 3
10. Open an SSH / Putty session to node 3, check whether the Vertica rpm package is installed (it is not), and then install it (as root account):
cd /opt/CA/IMDataRepository_vertica10/resources
ll
vertica-10.1.1-20.x86_64.RHEL6.rpm
rpm -qa vertica
rpm -Uvh vertica-10.1.1-20.x86_64.RHEL6.rpm
11. Run the following syntax to install Vertica for node 3 (as root account):
/opt/vertica/sbin/install_vertica --hosts ###.###.###.17 --rpm /opt/CA/IMDataRepository_vertica10/resources/vertica-10.1.1-20.x86_64.RHEL6.rpm --dba-user dradmin
12. On node 3, rename the /opt/vertica/config/admintools.conf file created by the new installation in step 11 to admintools.conf_old (as dradmin account):
cd /opt/vertica/config
mv admintools.conf admintools.conf_old
13. Copy the /opt/vertica/config/admintools.conf file from the existing installation on node 1 to node 3; run the following on node 1 (as dradmin account):
cd /opt/vertica/config
scp admintools.conf dradmin@###.###.###.17:/opt/vertica/config
14. On node 3, rename the drdata_old data directory back to drdata (as dradmin account):
cd /data/
mv drdata_old drdata
15. On node 3, confirm that the vertica process is not running, then delete all files under the /data/drdata/v_drdata_node0003_data/ directory (as dradmin account):
cd /data/drdata/v_drdata_node0003_data
pidof vertica
rm -rf *
16. Rename the drdata_old catalog directory back to drdata (as dradmin account):
cd /catalog/
mv drdata_old drdata
17. Delete all files under the /catalog/drdata/v_drdata_node0003_catalog/Catalog/ directory (as dradmin account):
cd /catalog/drdata/v_drdata_node0003_catalog/Catalog/
rm -rf *
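Steps 15 and 17 rely on cd followed by rm -rf *, which deletes from whatever directory the shell is in if the cd fails. A more defensive variant is sketched below. empty_dir_if_stopped is a hypothetical helper name, and the sketch is only an illustration of the guard, not the procedure that was actually run.

```shell
# Hypothetical helper: empty a directory only if it exists and no process
# with the given name is running on this host.
# Like `rm -rf *`, it leaves hidden (dot) files in place.
empty_dir_if_stopped() {
    dir="$1"; proc="$2"
    if pidof "$proc" >/dev/null 2>&1; then
        echo "refusing: $proc is still running" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "refusing: $dir is not a directory" >&2
        return 1
    fi
    # "${dir:?}" is a second guard: the shell stops with an error if the
    # variable is empty, so the glob can never expand to /*.
    rm -rf -- "${dir:?}"/*
}

# Possible usage on node 3:
#   empty_dir_if_stopped /data/drdata/v_drdata_node0003_data vertica
#   empty_dir_if_stopped /catalog/drdata/v_drdata_node0003_catalog/Catalog vertica
```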
18. Run the following vsql query to check the last cluster sync (as dradmin account):
vsql (supply the password)
select get_current_epoch() CE,
       get_last_good_epoch() LGE,
       get_ahm_epoch() AHM,
       (get_current_epoch() - get_last_good_epoch()) CeLGDiff,
       (get_last_good_epoch() - get_ahm_epoch()) LgeAHmDiff,
       get_ahm_time(),
       clock_timestamp();
The result is: Current AHM Time: 2025-03-27 15:15:02.319656+03
That is, the last entry was committed on March 27th.
19. Check the vertica db size on node 1 (as dradmin account):
cd /data/drdata/
du -sh v_drdata_node0001_data
26G (this is not a large database)
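20. Run the following vsql command to start rebuilding node 3 (as dradmin account). The true argument allows the AHM to advance while node 3 is down, which means that node 3 has to recover all of its data from scratch: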
select make_ahm_now('true');
21. Follow the startup.log file on node 3 until the node status changes from RECOVERING to UP:
cd /catalog/drdata/v_drdata_node0003_catalog
tail -f startup.log
Node Status: v_drdata_node0003: (RECOVERING)
Node Status: v_drdata_node0003: (UP)
22. Run the following vsql command line to set the Ancient History Mark (AHM) to the greatest allowed value:
select make_ahm_now();
23. Node 3 is up and running.