How to setup pgBackrest on a cluster using pg_auto_failover
search cancel

How to setup pgBackrest on a cluster using pg_auto_failover

book

Article ID: 296402

calendar_today

Updated On: 06-17-2024

Products

VMware Tanzu Greenplum

Issue/Introduction

pgBackrest is the recommended tool to use for backup / restore in a VMware Tanzu Postgres database. Steps to install for single-instance Postgres are straightforward and can be found in the pgBackrest documentation: 
 


pg_auto_failover is now the recommended method of setting up a high availability (HA) VMware Postgres cluster as of versions 12.6.1, 11.11, and 10.16. This article will build off of the installation from the pg_auto_failover - Quick Install Guide and show how to use pgBackrest to create backups.

I will be using the monitor node as a dedicated repository to allow support for multiple formations and groups to be backed up in the same repository. This setup has it's pros and cons, so please evaluate which setup is best for your use case.
 

  • autofailover-m - 192.168.99.128 - Monitor node and pgBackrest dedicated repository
  • autofailover-1 - 192.168.99.129 - Primary Postgres instance
  • autofailover-2 - 192.168.99.130 - Secondary Postgres instance
  • autofailover-3 - 192.168.99.131 - Secondary Postgres instance


This article applies to pg_auto_failover version 1.4 and pgBackrest version 2.31.

Environment

Product Version: 13.2

Resolution

1. Exchange ssh keys to allow password-less login between all three hosts. We will assume in this article that the .ssh directory does not exist in the Postgres user's home directory (as a Postgres user):
mkdir .ssh; chmod 700 .ssh
ssh-keygen -f /var/lib/pgsql/.ssh/id_rsa -t rsa -b 4096 -N ""
cd .ssh
touch authorized_keys; chmod 640 authorized_keys

Now copy the contents of ~/.ssh/id_rsa.pub into ~/.ssh/authorized_keys on the other two hosts that you want to ssh into. Each host will have it's own id_rsa.pub. The step above needs to be done on all three hosts.

Note: It is very important that these specific permissions the files and .ssh folder are set, otherwise it won't work.

2. Do the following on all four hosts as root:
yum install -y perl perl-Time-HiRes perl-Digest-SHA perl-JSON-PP
mkdir -p /etc/pgbackrest; mkdir -p /etc/pgbackrest/conf.d
touch /etc/pgbackrest/pgbackrest.conf; chmod 640 /etc/pgbackrest/pgbackrest.conf
mkdir -p /var/log/pgbackrest; chmod 770 /var/log/pgbackrest
chown postgres:postgres /etc/pgbackrest/pgbackrest.conf; chown postgres:postgres /var/log/pgbackrest 

Note: For simplicity, we are using Postgres user for the dedicated repository, but it is recommended to create and use a separate pgbackrest user in the dedicated repository host

3. In the repository host (autofailover-m), run the following command:
mkdir -p /var/lib/pgbackrest; chmod 750 /var/lib/pgbackrest; chown postgres:postgres /var/lib/pgbackrest

4. Setup repository host's configs (/etc/pgbackrest/pgbackrest.conf):
[main]
pg1-path=/var/lib/pgsql/ha
pg1-port=5432
pg1-host=192.168.99.129
pg1-socket-path=/tmp
pg2-path=/var/lib/pgsql/ha
pg2-port=5432
pg2-host=192.168.99.130
pg2-socket-path=/tmp
pg3-path=/var/lib/pgsql/ha
pg3-port=5432
pg3-host=192.168.99.131
pg3-socket-path=/tmp

[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
start-fast=y

This file tells pgBackrest where the Postgres instances are located, how it is archived, and how it is backed up. It is known as a stanza. In our example, "main" is going to be our stanza name when we create it later. Here are the options we are using:
pg1-path - postgres data directory
pg1-path-socket-path - Location of .s.PGSQL.5432 socket file (show unix_socket_directories)
repo1-path - location of the archive and backup files
repo1-retention-fill - number of full backups to retain
start-fast - immediate checkpoint is requested and the backup will start more quickly.

Also note that "pg" parameters are indexed by Postgres instance number. In our example, we have two Postgres instances so we have two sets of parameters. You can also specify multiple repo[n]_path locations, but that is beyond the scope of this article. 

5. Configure pgBackrest on all the Postgres hosts:
vi /etc/pgbackrest/pgbackrest.conf
[main]
pg1-path=/var/lib/pgsql/ha
pg1-socket-path=/tmp
pg1-port=5432

[global]
log-level-file=detail
repo1-host=192.168.99.128
repo1-host-user=postgres

6. On all Postgres hosts, add these GUCs to postgresql.conf. This will turn on archive logging.
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'

7. Add the following entry to pg_hba.conf on all Postgres nodes:
host    all             postgres             192.168.99.128/32            trust

8. Restart on all the hosts:
pg_ctl restart

9. Create the stanza on the repository host:
-bash-4.2$ pgbackrest --stanza=main --log-level-console=info stanza-create
2021-03-25 18:18:47.924 P00   INFO: stanza-create command begin 2.31: --exec-id=13932-4149954c --log-level-console=info --pg1-host=192.168.99.129 --pg2-host=192.168.99.130 --pg3-host=192.168.99.131 --pg1-path=/var/lib/pgsql/ha --pg2-path=/var/lib/pgsql/ha --pg3-path=/var/lib/pgsql/ha --pg1-port=5432 --pg2-port=5432 --pg3-port=5432 --pg1-socket-path=/tmp --pg2-socket-path=/tmp --pg3-socket-path=/tmp --repo1-path=/var/lib/pgbackrest --stanza=main
2021-03-25 18:18:51.315 P00   INFO: stanza-create command end: completed successfully (3392ms)

10. Check the config on the Postgres hosts:
-bash-4.2$ pgbackrest --stanza=main --log-level-console=info check
2021-03-25 18:23:31.795 P00   INFO: check command begin 2.31: --exec-id=13198-383db85b --log-level-console=info --log-level-file=detail --pg1-path=/var/lib/pgsql/ha --pg1-port=5432 --pg1-socket-path=/tmp --repo1-host=192.168.99.128 --repo1-host-user=postgres --stanza=main
2021-03-25 18:23:33.347 P00   INFO: WAL segment 00000005000000000000000F successfully archived to '/var/lib/pgbackrest/archive/main/12-1/0000000500000000/00000005000000000000000F-cc5333580ac5afefcb186d3034f88d470d9230d2.gz'
2021-03-25 18:23:33.452 P00   INFO: check command end: completed successfully (1658ms)

-bash-4.2$ pgbackrest --stanza=main --log-level-console=info check
2021-03-25 18:23:37.118 P00   INFO: check command begin 2.31: --exec-id=5223-465034e1 --log-level-console=info --log-level-file=detail --pg1-path=/var/lib/pgsql/ha --pg1-port=5432 --pg1-socket-path=/tmp --repo1-host=192.168.99.128 --repo1-host-user=postgres --stanza=main
The authenticity of host '192.168.99.128 (192.168.99.128)' can't be established.
ECDSA key fingerprint is SHA256:Su0SZi4qDErdgrx9bkofW8h83EhpJgXlC2/Wqh1iXus.
ECDSA key fingerprint is MD5:8c:e6:53:78:ce:4b:e9:59:43:e2:01:8e:be:3b:66:e0.
Are you sure you want to continue connecting (yes/no)? yes
2021-03-25 18:23:39.898 P00   INFO: switch wal not performed because this is a standby
2021-03-25 18:23:39.999 P00   INFO: check command end: completed successfully (2882ms)'

11. Finally, let's take a backup:
-bash-4.2$ pgbackrest --log-level-console=info --stanza=main backup
2021-03-25 18:24:54.525 P00   INFO: backup command begin 2.31: --exec-id=15168-f3e715c7 --log-level-console=info --pg1-host=192.168.99.129 --pg2-host=192.168.99.130 --pg3-host=192.168.99.131 --pg1-path=/var/lib/pgsql/ha --pg2-path=/var/lib/pgsql/ha --pg3-path=/var/lib/pgsql/ha --pg1-port=5432 --pg2-port=5432 --pg3-port=5432 --pg1-socket-path=/tmp --pg2-socket-path=/tmp --pg3-socket-path=/tmp --repo1-path=/var/lib/pgbackrest --repo1-retention-full=2 --stanza=main --start-fast
WARN: no prior backup exists, incr backup has been changed to full
2021-03-25 18:24:57.595 P00   INFO: execute non-exclusive pg_start_backup(): backup begins after the requested immediate checkpoint completes
2021-03-25 18:24:58.010 P00   INFO: backup start archive = 000000050000000000000011, lsn = 0/11000028
2021-03-25 18:24:59.587 P01   INFO: backup file 192.168.99.129:/var/lib/pgsql/ha/base/16385/1255 (632KB, 1%) checksum 81dbc197ab6066b111a4e5bb10e8b27e7d07d546
2021-03-25 18:24:59.607 P01   INFO: backup file 192.168.99.129:/var/lib/pgsql/ha/base/14185/1255 (632KB, 3%) checksum 708a78f8ca445084659bc398cb9b64fdac63836a
2021-03-25 18:24:59.625 P01   INFO: backup file 192.168.99.129:/var/lib/pgsql/ha/base/14184/1255 (632KB, 5%) checksum 708a78f8ca445084659bc398cb9b64fdac63836a
2021-03-25 18:24:59.646 P01   INFO: backup file 192.168.99.129:/var/lib/pgsql/ha/base/1/1255 (632KB, 7%) checksum 708a78f8ca445084659bc398cb9b64fdac63836a
...
2021-03-25 18:25:03.540 P01   INFO: backup file 192.168.99.129:/var/lib/pgsql/ha/base/1/14032 (0B, 100%)
2021-03-25 18:25:03.542 P01   INFO: backup file 192.168.99.129:/var/lib/pgsql/ha/base/1/14027 (0B, 100%)
2021-03-25 18:25:03.845 P01   INFO: backup file 192.168.99.129:/var/lib/pgsql/ha/base/1/14022 (0B, 100%)
2021-03-25 18:25:03.848 P00   INFO: full backup size = 32.1MB
2021-03-25 18:25:03.848 P00   INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive
2021-03-25 18:25:04.054 P00   INFO: backup stop archive = 000000050000000000000011, lsn = 0/11000138
2021-03-25 18:25:04.070 P00   INFO: check archive for segment(s) 000000050000000000000011:000000050000000000000011
2021-03-25 18:25:04.592 P00   INFO: new backup label = 20210325-182457F
2021-03-25 18:25:04.625 P00   INFO: backup command end: completed successfully (10102ms)
2021-03-25 18:25:04.625 P00   INFO: expire command begin 2.31: --exec-id=15168-f3e715c7 --log-level-console=info --pg1-host=192.168.99.129 --pg2-host=192.168.99.130 --pg3-host=192.168.99.131 --repo1-path=/var/lib/pgbackrest --repo1-retention-full=2 --stanza=main
2021-03-25 18:25:04.830 P00   INFO: expire command end: completed successfully (205ms)

pgBackrest will automatically detect which Postgres host is the primary. Let's trigger a failover and take the backup again. There will be no additional config needed as it will just work:
-bash-4.2$ pg_autoctl show state
  Name |  Node |           Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+---------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | autofailover-1:5432 | 0/D000110 |       yes |             primary |             primary
node_2 |     2 | autofailover-2:5432 | 0/D000110 |       yes |           secondary |           secondary
node_3 |     3 | autofailover-3:5432 | 0/D000110 |       yes |           secondary |           secondary

-bash-4.2$ pg_autoctl perform failover
18:26:35 13620 INFO  Targetting group 0 in formation "default"
18:26:35 13620 INFO  Listening monitor notifications about state changes in formation "default" and group 0
18:26:35 13620 INFO  Following table displays times when notifications are received

-bash-4.2$ pg_autoctl show state
  Name |  Node |           Host:Port |        LSN | Reachable |       Current State |      Assigned State
-------+-------+---------------------+------------+-----------+---------------------+--------------------
node_1 |     1 | autofailover-1:5432 | 0/15002108 |       yes |           secondary |           secondary
node_2 |     2 | autofailover-2:5432 | 0/15002108 |       yes |             primary |             primary
node_3 |     3 | autofailover-3:5432 | 0/15002108 |       yes |           secondary |           secondary

Take the backup again:
-bash-4.2$ pgbackrest --log-level-console=info --stanza=main backup
2021-03-25 18:27:56.939 P00   INFO: backup command begin 2.31: --exec-id=15752-457179fa --log-level-console=info --pg1-host=192.168.99.129 --pg2-host=192.168.99.130 --pg3-host=192.168.99.131 --pg1-path=/var/lib/pgsql/ha --pg2-path=/var/lib/pgsql/ha --pg3-path=/var/lib/pgsql/ha --pg1-port=5432 --pg2-port=5432 --pg3-port=5432 --pg1-socket-path=/tmp --pg2-socket-path=/tmp --pg3-socket-path=/tmp --repo1-path=/var/lib/pgbackrest --repo1-retention-full=2 --stanza=main --start-fast
2021-03-25 18:28:00.114 P00   INFO: last backup label = 20210325-182457F, version = 2.31
2021-03-25 18:28:00.114 P00   INFO: execute non-exclusive pg_start_backup(): backup begins after the requested immediate checkpoint completes
2021-03-25 18:28:00.642 P00   INFO: backup start archive = 000000060000000000000016, lsn = 0/16000028
WARN: a timeline switch has occurred since the 20210325-182457F backup, enabling delta checksum
      HINT: this is normal after restoring from backup or promoting a standby.
2021-03-25 18:28:01.646 P01   INFO: backup file 192.168.99.130:/var/lib/pgsql/ha/base/14185/1249 (432KB, 21%) checksum 6358bfd5721fd513b43c6359ae4c4f1af59213e0
2021-03-25 18:28:02.155 P01   INFO: backup file 192.168.99.130:/var/lib/pgsql/ha/global/pg_control (8KB, 87%) checksum e776a96eab80d5b6a40efb61223e8c24b37cccce
2021-03-25 18:28:02.650 P01   INFO: backup file 192.168.99.130:/var/lib/pgsql/ha/log/postgresql-2021-03-25_181040.log (5.4KB, 99%) checksum 764b9b6b03ecb1d60fc3251143db237694463b3a
2021-03-25 18:28:02.653 P01   INFO: backup file 192.168.99.130:/var/lib/pgsql/ha/log/postgresql-2021-03-25_182641.log (5.2KB, 99%) checksum e87cc87fd43031f988f7e1abe64cd54d3ce1c7f0
2021-03-25 18:28:02.661 P01   INFO: backup file 192.168.99.130:/var/lib/pgsql/ha/log/postgresql-2021-03-25_181044.log (1.5KB, 99%) checksum 1f8bcbc0045a1589ae6b2aa4644041111b731a36
2021-03-25 18:28:02.669 P01   INFO: backup file 192.168.99.130:/var/lib/pgsql/ha/startup.log (626B, 99%) checksum 4c564220631f9df9d7f7a8b45f4cf8a88f2f87d5
2021-03-25 18:28:02.678 P01   INFO: backup file 192.168.99.130:/var/lib/pgsql/ha/postgresql.auto.conf (215B, 99%) checksum 9b21f647f4c2c7b3d1053b16903ce3390841dfcd
2021-03-25 18:28:02.680 P01   INFO: backup file 192.168.99.130:/var/lib/pgsql/ha/current_logfiles (44B, 99%) checksum 2790ad3d23fb67c8011289d9f8159364a554daeb
2021-03-25 18:28:02.907 P00   INFO: incr backup size = 32.1MB
2021-03-25 18:28:02.907 P00   INFO: execute non-exclusive pg_stop_backup() and wait for all WAL segments to archive
2021-03-25 18:28:03.158 P00   INFO: backup stop archive = 000000060000000000000016, lsn = 0/16000138
2021-03-25 18:28:03.163 P00   INFO: check archive for segment(s) 000000060000000000000016:000000060000000000000016
2021-03-25 18:28:03.496 P00   INFO: new backup label = 20210325-182457F_20210325-182800I
2021-03-25 18:28:03.527 P00   INFO: backup command end: completed successfully (6589ms)
2021-03-25 18:28:03.527 P00   INFO: expire command begin 2.31: --exec-id=15752-457179fa --log-level-console=info --pg1-host=192.168.99.129 --pg2-host=192.168.99.130 --pg3-host=192.168.99.131 --repo1-path=/var/lib/pgbackrest --repo1-retention-full=2 --stanza=main
2021-03-25 18:28:03.733 P00   INFO: expire command end: completed successfully (206ms)