Copycluster between our production and backup Vertica instances fails after upgrading to version 10.1.1.20.
Every time we run copycluster it reports that one of the nodes (which node it names is random) is missing a critical file, for example:
# cat copy_cluster.log
Starting copy of database drdata.
Participating nodes: v_drdata_node0001, v_drdata_node0002, v_drdata_node0003, v_drdata_node0004, v_drdata_node0005.
Snapshotting database.
Snapshot complete.
stop the db
Database drdata stopped successfully
sync the db
Error: Missing critical file: [XX.XX.XX.XXX]:/opt/catalog/drdata/v_drdata_node0005_catalog/Snapshots/Copy_drdata.txt
Copycluster FAILED.
However, in this run the error says node0005 is missing the file, yet when I go to that node the file clearly exists:
# cd /opt/catalog/drdata/v_drdata_node0005_catalog/Snapshots/
# ll
total 649780
-rw------- 1 <dr admin> verticadba 642861002 Mar 20 09:00 Copy_drdata.ctlg
-rw------- 1 <dr admin> verticadba 11996782 Mar 20 09:00 Copy_drdata.files
-rw------- 1 <dr admin> verticadba 10500768 Mar 20 09:00 Copy_drdata.manifest
-rw------- 1 <dr admin> verticadba 5284 Mar 20 09:00 Copy_drdata.txt
-rw------- 1 <dr admin> verticadba 0 Mar 20 09:00 Copy_drdata.udfs
In the example above:
drdata is the name of the database.
<dr admin> is the name of the DR admin OS user.
/opt/catalog is the path to the catalog directory; this will differ depending on where you placed the catalog directories.
The error is always the same except that the node named in the error may change.
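For reference, the presence of the snapshot files on every node can be confirmed from a single host instead of logging in to each node. The sketch below is a hypothetical helper, not part of the product: the node-to-host mapping, catalog path, database name and the use of passwordless SSH as the DR admin user are all assumptions that must be adapted to your environment.

#!/usr/bin/env python3
# Hypothetical helper: check that the copycluster snapshot files exist on every node.
# Assumptions: passwordless SSH as the DR admin user, the host list below,
# the catalog path /opt/catalog, and the database name drdata.
import subprocess

DB_NAME = "drdata"
CATALOG_BASE = "/opt/catalog"                      # assumption: your catalog path
NODES = {                                          # assumption: node name -> host
    "v_drdata_node0001": "10.0.0.1",
    "v_drdata_node0002": "10.0.0.2",
    "v_drdata_node0003": "10.0.0.3",
    "v_drdata_node0004": "10.0.0.4",
    "v_drdata_node0005": "10.0.0.5",
}
SNAPSHOT_FILES = ["Copy_drdata.txt", "Copy_drdata.ctlg",
                  "Copy_drdata.files", "Copy_drdata.manifest"]

for node, host in NODES.items():
    snap_dir = f"{CATALOG_BASE}/{DB_NAME}/{node}_catalog/Snapshots"
    for fname in SNAPSHOT_FILES:
        # 'test -f' exits 0 when the file exists on the remote host.
        result = subprocess.run(
            ["ssh", host, "test", "-f", f"{snap_dir}/{fname}"],
            capture_output=True,
        )
        status = "OK" if result.returncode == 0 else "MISSING"
        print(f"{node} ({host}): {snap_dir}/{fname} -> {status}")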
This affects DX NetOps Performance Management 22.2.
The cause is a defect in the Vertica vbr.py script.
The fix is to change one line in the /opt/vertica/bin/vbr.py script.
Change the line from:
session_host, db_paths[next(iter(self._participating_nodes))], snap_name)
To this:
session_host, db_paths[init_node], snap_name)
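The KB does not spell out why this one-line change matters, but the shape of the fix suggests the following: self._participating_nodes is an unordered collection, so next(iter(...)) hands back an arbitrary node on each run, and the snapshot file is then looked for under that node's catalog path rather than under the path that belongs to the node being checked. The sketch below is illustrative only; it is not the real vbr.py code, and every value in it is made up to mirror the identifiers in the changed line.

# Illustrative sketch only -- not the actual vbr.py code. It shows why pulling
# an arbitrary element from an unordered collection makes the node named in the
# error change from run to run, while indexing db_paths by init_node is
# deterministic. All values below are made up.
participating_nodes = {"v_drdata_node0001", "v_drdata_node0002",
                       "v_drdata_node0003", "v_drdata_node0004",
                       "v_drdata_node0005"}
db_paths = {node: f"/opt/catalog/drdata/{node}_catalog"
            for node in participating_nodes}
init_node = "v_drdata_node0001"      # assumption: the node the check should use
session_host = "XX.XX.XX.XXX"        # host the file check is issued against
snap_name = "Copy_drdata"

# Buggy form: an arbitrary node comes out of the unordered set, so the check
# may target a catalog path that does not belong to the host being checked.
arbitrary_node = next(iter(participating_nodes))
buggy_path = f"{db_paths[arbitrary_node]}/Snapshots/{snap_name}.txt"

# Fixed form: always use init_node's catalog path.
fixed_path = f"{db_paths[init_node]}/Snapshots/{snap_name}.txt"

print(session_host, buggy_path)   # node portion varies between runs
print(session_host, fixed_path)   # stable

If this reading is right, it would also explain the symptom above: the file vbr reports as missing does exist, just not under the host/path combination the buggy expression happened to produce on that run.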