Vertica copy cluster now fails after 22.2.5 upgrade that upgrades Vertica to 10_1_1_20

Article ID: 263379

Products

CA Performance Management - Usage and Administration DX NetOps

Issue/Introduction

The copy cluster operation between our production and backup Vertica instances now fails after upgrading to Vertica 10.1.1.20. It had worked reliably since version 9, and the script has not been changed. We now get vague error messages and the copy cluster fails.

Each time I run the copy cluster, it reports that one of the 5 nodes is missing a critical file. Which node is named varies from run to run. For example:

# cat copy_cluster.log
stop the db
Database drdata stopped successfully
sync the db
Error: Missing critical file: [XX.XX.XX.XXX]:/opt/catalog/drdata/v_drdata_node0005_catalog/Snapshots/Copy_drdata.txt
Copycluster FAILED.
Starting copy of database drdata.
Participating nodes: v_drdata_node0001, v_drdata_node0002, v_drdata_node0003, v_drdata_node0004, v_drdata_node0005.
Snapshotting database.
Snapshot complete.
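
Because the node named in the error changes between runs, it can help to tally which node each failed run reports. The following is a minimal sketch that assumes the error line always has the form shown in the log above. The function name `missing_file_nodes` and both regular expressions are illustrative and are not part of Vertica or vbr.py.

```python
import re
from collections import Counter

# Matches the error line format seen in copy_cluster.log above:
# the host appears in brackets, followed by the path of the reported file.
MISSING_FILE = re.compile(
    r"Error: Missing critical file: \[(?P<host>[^\]]+)\]:(?P<path>\S+)"
)
# Node names such as v_drdata_node0005, as they appear inside the catalog path.
NODE_NAME = re.compile(r"v_\w+?_node\d+")


def missing_file_nodes(log_text):
    """Return the node name found in each 'Missing critical file' path."""
    nodes = []
    for match in MISSING_FILE.finditer(log_text):
        node = NODE_NAME.search(match.group("path"))
        nodes.append(node.group(0) if node else None)
    return nodes


def tally_nodes(log_texts):
    """Count how often each node is named across several log files' contents."""
    counts = Counter()
    for text in log_texts:
        counts.update(missing_file_nodes(text))
    return counts
```

Passing the contents of several saved copy_cluster.log files to `tally_nodes` shows whether the failures are spread across nodes or concentrated on one.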

 

In this case it says node 5 is missing the file, but when I go to that node I can see that the file does exist.

# cd /opt/catalog/drdata/v_drdata_node0005_catalog/Snapshots/
# ll
total 649780
-rw------- 1 dradmin verticadba 642861002 Mar 20 09:00 Copy_drdata.ctlg
-rw------- 1 dradmin verticadba  11996782 Mar 20 09:00 Copy_drdata.files
-rw------- 1 dradmin verticadba  10500768 Mar 20 09:00 Copy_drdata.manifest
-rw------- 1 dradmin verticadba      5284 Mar 20 09:00 Copy_drdata.txt
-rw------- 1 dradmin verticadba         0 Mar 20 09:00 Copy_drdata.udfs
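
The manual check above can be repeated on each node with a small script. This is a sketch only: the list of suffixes is copied from the directory listing above, and treating all five as required is an assumption. The function name `missing_snapshot_files` is illustrative.

```python
from pathlib import Path

# Suffixes taken from the directory listing above. Whether vbr.py requires
# every one of them is an assumption.
EXPECTED_SUFFIXES = (".ctlg", ".files", ".manifest", ".txt", ".udfs")


def missing_snapshot_files(snapshot_dir, snapshot_name="Copy_drdata"):
    """Return the expected snapshot file names that are absent from snapshot_dir."""
    directory = Path(snapshot_dir)
    return [
        snapshot_name + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (directory / (snapshot_name + suffix)).is_file()
    ]
```

An empty list from `missing_snapshot_files("/opt/catalog/drdata/v_drdata_node0005_catalog/Snapshots")` would agree with the listing above, where every file is present on the node that the error names.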

I have run this multiple times and the error is always the same, except that the node it names changes: node5, then node3, node4, etc.

Environment

DX NetOps Performance Management 22.2

Cause

Defect in the Vertica vbr.py script 

Resolution

The fix is to change one line in the /opt/vertica/bin/vbr.py script.

Change this line from:

session_host, db_paths[next(iter(self._participating_nodes))], snap_name)

To this:

session_host, db_paths[init_node], snap_name)
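
One possible reading of why this change helps, offered as an assumption and not as a confirmed diagnosis: `next(iter(...))` returns whichever element the collection yields first, which need not be the initiator node. If that node's catalog path is then checked on `session_host`, the lookup targets a path that belongs to a different node. That would match the symptom above, where the file exists on the node named in the error. The sketch below uses hypothetical stand-ins (`db_paths`, `participating_nodes`, `init_node`, `snapshot_file`) modeled on the paths in the log. It is not the vbr.py code, and the real type of `self._participating_nodes` is not known from this article.

```python
# Hypothetical data modeled on the paths in the log; not taken from vbr.py.
db_paths = {
    f"v_drdata_node000{i}": f"/opt/catalog/drdata/v_drdata_node000{i}_catalog"
    for i in range(1, 6)
}
participating_nodes = set(db_paths)  # assumption: an unordered collection
init_node = "v_drdata_node0001"      # assumption: the node the session runs on


def snapshot_file(catalog_path, snap_name):
    """Build a snapshot file path in the form shown in the error message."""
    return f"{catalog_path}/Snapshots/{snap_name}.txt"


# Original expression: the first element the collection happens to yield.
arbitrary_node = next(iter(participating_nodes))
before = snapshot_file(db_paths[arbitrary_node], "Copy_drdata")

# Patched expression: always the initiator node's own catalog path.
after = snapshot_file(db_paths[init_node], "Copy_drdata")

print("original lookup:", before)  # the node may differ between interpreter runs
print("patched lookup: ", after)
```

With a Python set of strings, iteration order can change from one interpreter run to the next, so the original lookup may name a different node each time while the patched lookup stays fixed.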