Incremental recovery sometimes needs to run pg_rewind, particularly when the WAL timelines of a primary segment and its mirror have diverged. This can happen when a primary fails over to its mirror because of a segment PANIC.
By design, pg_rewind deletes any file (except logs and a few control files) that exists on the segment being rewound, that is, the failed original primary now marked as a down mirror, but does not exist on the acting primary.
By default, gpbackup writes its backup files to a directory named backups inside each primary segment's data directory.
For example:
[gpadmin@sdw2-lab1 ~]$ ls -l /data/primary/gp_6.18.2_202112121834_kevin_seg2
total 104
drwxrwxr-x 5 gpadmin gpadmin 54 Jan 4 17:49 backups
drwx------ 9 gpadmin gpadmin 97 Dec 23 22:39 base
-rw------- 1 gpadmin gpadmin 32768 Jan 4 21:32 fts_probe_file.bak
drwx------ 2 gpadmin gpadmin 4096 Dec 29 20:06 global
-rw------- 1 gpadmin gpadmin 10 Dec 12 18:35 internal.auto.conf
drwx------ 2 gpadmin gpadmin 18 Dec 12 18:35 pg_clog
drwx------ 2 gpadmin gpadmin 18 Dec 12 18:35 pg_distributedlog
drwx------ 2 gpadmin gpadmin 6 Dec 12 18:35 pg_dynshmem
...
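To check a segment host for backups stored in this default location, a minimal POSIX shell sketch like the one below could be used. The function name find_default_backups is hypothetical (it is not part of gpbackup), and it only tests whether a backups subdirectory exists; it does not inspect which backup sets are inside.

```shell
#!/bin/sh
# find_default_backups (hypothetical helper): print every segment data
# directory, given as an argument, that contains a "backups" subdirectory,
# i.e. the location gpbackup uses when --backup-dir is not specified.
find_default_backups() {
  for datadir in "$@"; do
    if [ -d "$datadir/backups" ]; then
      printf '%s\n' "$datadir"
    fi
  done
}

# Example (path pattern taken from the listing above):
# find_default_backups /data/primary/gp_6.18.2_202112121834_kevin_seg*
```

Running it on the host of a failed primary before an incremental recovery shows whether that segment holds backup files that could be affected.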
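Because the backups directory is not part of the database and its contents are not WAL-logged, it is not replicated to the mirror. The mirror's data directory for the same segment has no backups entry: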
[gpadmin@sdw1-lab1 ~]$ ls -l /data/mirror/gp_6.18.2_202112121834_######_seg2
total 76
-rw------- 1 gpadmin gpadmin 206 Dec 12 18:35 backup_label.old
drwx------ 8 gpadmin gpadmin 80 Dec 23 22:39 base
drwx------ 2 gpadmin gpadmin 4096 Dec 29 20:03 global
-rw-rw-r-- 1 gpadmin gpadmin 10 Dec 12 18:35 internal.auto.conf
drwx------ 2 gpadmin gpadmin 18 Dec 12 18:35 pg_clog
drwx------ 2 gpadmin gpadmin 18 Dec 12 18:35 pg_distributedlog
drwx------ 2 gpadmin gpadmin 6 Dec 12 18:35 pg_dynshmem
-rw------- 1 gpadmin gpadmin 4708 Dec 12 18:35 pg_hba.conf
-rw------- 1 gpadmin gpadmin 1636 Dec 12 18:35 pg_ident.conf
drwx------ 2 gpadmin gpadmin 4096 Jan 4 00:00 pg_log
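Since the primary and mirror live on different hosts, one way to spot entries like backups is to save a directory listing from each host and compare them. A minimal sketch, assuming each listing is a local file with one entry name per line (for example saved from ls -A); the function name primary_only_entries is hypothetical. It prints names that exist only in the primary's listing, which are the kind of files the pg_rewind run described below removes from the original primary. It compares top-level names only and does not model pg_rewind's exclusions for logs and control files.

```shell
#!/bin/sh
# primary_only_entries (hypothetical helper)
# Usage: primary_only_entries PRIMARY_LISTING MIRROR_LISTING
# Each argument is a local file with one entry name per line.
# Prints the names present in the primary listing but absent from the
# mirror listing. Top-level names only; pg_rewind's exclusions are ignored.
primary_only_entries() {
  # -F fixed strings, -x whole-line match, -v invert: keep primary lines
  # that match no line of the mirror listing.
  grep -Fxv -f "$2" "$1"
}
```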
If a primary segment PANICs, or another event causes a WAL timeline divergence, the primary fails over to its mirror and the mirror becomes the acting primary.
A later incremental recovery then needs pg_rewind to resynchronize the original primary with the acting primary. The gprecoverseg output shows whether pg_rewind actually had to rewind the segment:
No rewind required:
20220104:21:42:11:024448 gprecoverseg:mdw-lab1:gpadmin-[INFO]:-Running pg_rewind on failed segments sdw2-lab1 (dbid 4): no rewind required
pg_rewind required:
20220104:21:54:12:027197 gprecoverseg:mdw-lab1:gpadmin-[INFO]:-Running pg_rewind on failed segments sdw2-lab1 (dbid 4): 745723/1736705 kB (42%) copied
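To classify saved gprecoverseg output automatically, a small filter can match the two message forms shown above. This is a minimal sketch; the function name rewind_status is hypothetical, and it recognizes only the exact wording seen in this article, so other gprecoverseg messages are ignored.

```shell
#!/bin/sh
# rewind_status (hypothetical helper): read gprecoverseg output on stdin and
# print one line per dbid saying whether pg_rewind had to copy data.
# It relies only on the two message forms shown in this article:
#   "...(dbid N): no rewind required"
#   "...(dbid N): <done>/<total> kB (<pct>%) copied"
rewind_status() {
  grep 'Running pg_rewind on failed segments' |
    sed -n \
      -e 's/.*(dbid \([0-9][0-9]*\)): no rewind required.*/dbid \1: not required/p' \
      -e 's/.*(dbid \([0-9][0-9]*\)): .* kB (.*%) copied.*/dbid \1: required/p' |
    sort -u   # progress lines repeat while copying; keep one result per dbid
}

# Example (file name is hypothetical):
# rewind_status < saved_gprecoverseg_output.txt
```

A "required" result for a segment whose data directory holds default-location backups is the case to watch for.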
After pg_rewind runs, the backups directory is gone from the original primary's data directory:
[gpadmin@sdw2-lab1 ~]$ ls -l /data/primary/gp_6.18.2_202112121834_kevin_seg2
total 112
-rw------- 1 gpadmin gpadmin 175 Jan 4 21:54 backup_label.old
drwx------ 8 gpadmin gpadmin 80 Jan 4 21:54 base
-rw------- 1 gpadmin gpadmin 32768 Jan 4 21:54 fts_probe_file.bak
drwx------ 2 gpadmin gpadmin 4096 Jan 4 21:54 global
-rw------- 1 gpadmin gpadmin 10 Dec 12 18:35 internal.auto.conf
This can lead to two problems:
1. Any earlier gpbackup backup that used the default backup directory can no longer be restored, because the backup files for the rewound segment have been deleted.
2. Incremental recovery fails if pg_rewind cannot remove the backups directory (for example, because of a permissions problem or another error), and gprecoverseg instructs you to run a full recovery instead:
gprecoverseg log:
20211216:11:29:41:702088 gprecoverseg:mdw-lab1:gpadmin-[WARNING]:-
20211216:11:29:41:702088 gprecoverseg:mdw-lab1:gpadmin-[WARNING]:-Incremental recovery failed for dbid 4. You must use gprecoverseg -F to recover the segment.
pg_rewind log:
servers diverged at WAL position 848/9F60F628 on timeline 1
rewinding from last common checkpoint at 848/89EC5590 on timeline 1
reading source file list
reading target file list
reading WAL in target
need to copy 9770 MB (total source directory size is 2058875 MB)
could not remove directory "/data/primary/gp_6.18.2_202112121834_kevin_seg2/backups/20220104/20220104174918": Directory not empty
Failure, exiting
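Both problems can be checked for with a small POSIX shell sketch. failed_remove_paths extracts the directory pg_rewind could not remove, using the message format in the pg_rewind log above. missing_segment_backups reports segment data directories that no longer hold a given backup set, assuming the backups/YYYYMMDD/TIMESTAMP layout visible in that same path. Both function names are hypothetical, neither is a gpbackup or gprecoverseg feature, and the second one checks only that the directory exists, not that its files are complete.

```shell
#!/bin/sh
# failed_remove_paths (hypothetical helper): read a pg_rewind log on stdin and
# print each path from a 'could not remove directory "PATH": ...' message.
failed_remove_paths() {
  sed -n 's/.*could not remove directory "\([^"]*\)".*/\1/p'
}

# missing_segment_backups (hypothetical helper)
# Usage: missing_segment_backups TIMESTAMP DATADIR...
# Prints each segment data directory that lacks backups/YYYYMMDD/TIMESTAMP
# for the given 14-digit backup timestamp. Existence check only.
missing_segment_backups() {
  ts=$1
  shift
  day=${ts%??????}   # strip HHMMSS: 20220104174918 -> 20220104
  for datadir in "$@"; do
    [ -d "$datadir/backups/$day/$ts" ] || printf '%s\n' "$datadir"
  done
}
```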
Note: This affects only incremental recovery. Full recovery (gprecoverseg -F) does not have this problem.
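This is fixed in GPDB 6.19.2 and later. On earlier versions, you can avoid the problem by passing --backup-dir to gpbackup so that backup files are written outside the segment data directories: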
[gpadmin@mdw-lab1 ~]$ gpbackup --dbname gpadmin --backup-dir /home/gpadmin/backups
[gpadmin@sdw1-lab1 backups]$ ls -l /home/gpadmin/backups/
total 0
drwxrwxr-x 3 gpadmin gpadmin 21 Jan 4 22:20 gp_6.18.2_202112121834_kevin_seg0
drwxrwxr-x 3 gpadmin gpadmin 21 Jan 4 22:20 gp_6.18.2_202112121834_kevin_seg1
drwxrwxr-x 3 gpadmin gpadmin 21 Jan 4 22:20 gp_6.18.2_202112121834_kevin_seg2
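Before relying on --backup-dir, it can help to confirm that the chosen directory really lies outside every segment data directory on each segment host. A minimal sketch; the function name backup_dir_is_safe is hypothetical, and the check is purely lexical: it strips one trailing slash but does not resolve symlinks or "..".

```shell
#!/bin/sh
# backup_dir_is_safe (hypothetical helper)
# Usage: backup_dir_is_safe BACKUP_DIR DATADIR...
# Returns 0 if BACKUP_DIR is neither equal to nor nested inside any of the
# given segment data directories, and 1 otherwise. Lexical check only:
# one trailing slash is stripped; symlinks and ".." are not resolved.
backup_dir_is_safe() {
  backup_dir=${1%/}
  shift
  for datadir in "$@"; do
    datadir=${datadir%/}
    # Comparing with a trailing "/" keeps ".../seg2x" from matching ".../seg2".
    case "$backup_dir/" in
      "$datadir"/*) return 1 ;;
    esac
  done
  return 0
}

# Example, run on a segment host:
# backup_dir_is_safe /home/gpadmin/backups /data/primary/gp_6.18.2_202112121834_kevin_seg*
```

The trailing-slash comparison is the design choice that matters here: a plain prefix test would wrongly flag sibling directories whose names merely start with the data directory's name.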