Quick Install Guide: 2 node pg_auto_failover cluster with VMware Postgres

Article ID: 295249


Products

VMware Tanzu Greenplum

Issue/Introduction

pg_auto_failover ships with VMware Postgres beginning with versions 12.6.1, 11.11, and 10.16, and will also be included with Postgres 13 once that version is released.

This article shows how to deploy a highly available Postgres cluster using the pg_auto_failover extension.

It assumes the VMware Postgres binary (vmware-postgres-<version>.el7.x86_64.rpm) has been downloaded and is available on all three Postgres hosts:

autofailover-m - Monitor node
autofailover-1 - Primary Postgres instance
autofailover-2 - Secondary Postgres instance


What is a monitor?

The pg_auto_failover monitor is a service that keeps track of one or several formations containing groups of Postgres nodes. This means a single monitor can manage high availability (HA) for multiple Postgres clusters. In some sense, it is similar to the role of the dedicated repository host in pgBackRest. In terms of monitoring state and taking action based on node availability, it can be compared to the Greenplum master's FTS server process. In production environments, the monitor should be deployed on its own host.


Resolution

Install VMware Postgres on all three hosts as root:

yum install -y vmware-postgres-12.6-1.el7.x86_64.rpm

This creates the postgres user. You can grant it passwordless sudo (used for the systemctl steps below):
echo "postgres  ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/postgres

Note: Do not run initdb or start Postgres after this step; the pg_autoctl command handles both as part of pg_auto_failover.

All remaining steps are completed as the postgres user.

On the monitor node only

1. Verify that pg_auto_failover/postgres was installed properly:
-bash-4.2$ pg_autoctl version
pg_autoctl version 1.4
pg_autoctl extension version 1.4
compiled with PostgreSQL 12.6 (VMware Postgres 12.6.1) on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
compatible with Postgres 10, 11, 12, and 13

-bash-4.2$ pg_ctl -V
pg_ctl (PostgreSQL) 12.6 (VMware Postgres 12.6.1)

2. Create the monitor node:
-bash-4.2$ pg_autoctl create monitor --auth trust --ssl-self-signed --pgdata monitor
17:59:55 14359 INFO  Using default --ssl-mode "require"
17:59:55 14359 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
17:59:55 14359 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
17:59:55 14359 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
17:59:55 14359 INFO  Using --hostname "autofailover-m", which resolves to IP address "192.168.99.128"
17:59:55 14359 INFO  Initialising a PostgreSQL cluster at "monitor"
17:59:55 14359 INFO  /bin/pg_ctl initdb -s -D monitor --option '--auth=trust'
17:59:56 14359 INFO   /bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/monitor/server.crt -keyout /var/lib/pgsql/monitor/server.key -subj "/CN=autofailover-m"
17:59:56 14359 INFO  Started pg_autoctl postgres service with pid 14375
17:59:56 14375 INFO   /usr/bin/pg_autoctl do service postgres --pgdata monitor -v
17:59:56 14359 INFO  Started pg_autoctl monitor-init service with pid 14376
17:59:56 14381 INFO   /bin/postgres -D /var/lib/pgsql/monitor -p 5432 -h *
17:59:56 14375 INFO  Postgres is now serving PGDATA "/var/lib/pgsql/monitor" on port 5432 with pid 14381
17:59:56 14376 WARN  NOTICE:  installing required extension "btree_gist"
17:59:56 14376 INFO  Granting connection privileges on 192.168.99.0/24
17:59:56 14376 INFO  Your pg_auto_failover monitor instance is now ready on port 5432.
17:59:56 14376 INFO  Monitor has been successfully initialized.
17:59:56 14359 WARN  pg_autoctl service monitor-init exited with exit status 0
17:59:56 14375 INFO  Postgres controller service received signal SIGTERM, terminating
17:59:56 14375 INFO  Stopping pg_autoctl postgres service
17:59:56 14375 INFO  /bin/pg_ctl --pgdata /var/lib/pgsql/monitor --wait stop --mode fast
17:59:57 14359 INFO  Stop pg_autoctl

At this point, the database has been initialized and is used by the monitor to store state information about the Postgres cluster; however, it is not actually running yet.

3. Add the service so systemd can manage startup:
pg_autoctl -q show systemd --pgdata ~postgres/monitor > pgautofailover.service
sudo mv pgautofailover.service /etc/systemd/system
sudo systemctl daemon-reload
sudo systemctl enable pgautofailover
sudo systemctl start pgautofailover
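
For reference, the unit file generated by pg_autoctl show systemd typically resembles the following sketch. Treat it as illustrative only: the exact contents depend on your pg_auto_failover version and install paths (the values below assume this guide's monitor node):

```ini
[Unit]
Description = pg_auto_failover

[Service]
WorkingDirectory = /var/lib/pgsql
Environment = 'PGDATA=/var/lib/pgsql/monitor'
User = postgres
ExecStart = /usr/bin/pg_autoctl run
Restart = always
StartLimitBurst = 0

[Install]
WantedBy = multi-user.target
```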

The systemctl start command starts the monitor. Verify the current state:
-bash-4.2$ ps -ef | grep postgres
postgres 14554     1  0 18:04 ?        00:00:00 /usr/bin/pg_autoctl run
postgres 14557 14554  0 18:04 ?        00:00:00 pg_autoctl: start/stop postgres
postgres 14558 14554  0 18:04 ?        00:00:00 pg_autoctl: monitor listener
postgres 14566 14557  0 18:04 ?        00:00:00 /bin/postgres -D /var/lib/pgsql/monitor -p 5432 -h *
postgres 14567 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: logger   
postgres 14569 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: checkpointer   
postgres 14570 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: background writer  
postgres 14571 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: walwriter   
postgres 14572 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: autovacuum launcher  
postgres 14573 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: stats collector  
postgres 14574 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: pg_auto_failover monitor  
postgres 14575 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: logical replication launcher  
postgres 14576 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: pg_auto_failover monitor worker  
postgres 14577 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: pg_auto_failover monitor worker  
postgres 14578 14566  0 18:04 ?        00:00:00 postgres: pg_auto_failover monitor: autoctl_node pg_auto_failover [local] idle

-bash-4.2$ psql
psql (12.6 (VMware Postgres 12.6.1))
Type "help" for help.

postgres=# \l
                                     List of databases
       Name       |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
------------------+----------+----------+-------------+-------------+-----------------------
 pg_auto_failover | autoctl  | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 postgres         | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0        | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
                  |          |          |             |             | postgres=CTc/postgres
 template1        | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
                  |          |          |             |             | postgres=CTc/postgres
(4 rows)
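
The data nodes created in the next sections register with this monitor using a connection string with a fixed shape (pg_autoctl show uri can also print it for you). A minimal sketch of how it is composed, using this guide's hostnames:

```shell
# Compose the monitor connection string the data nodes will use.
# autoctl_node is the role pg_auto_failover creates on the monitor;
# the hostname below matches this guide's monitor node.
monitor_user=autoctl_node
monitor_host=autofailover-m
monitor_uri="postgres://${monitor_user}@${monitor_host}/pg_auto_failover?sslmode=require"
echo "$monitor_uri"
```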

Now we are ready to install our Postgres primary and secondary nodes.


Primary host (autofailover-1)

1. Create the first Postgres instance. The monitor will automatically designate it as the primary node with read-write capability (PGDATA = /var/lib/pgsql/ha):
-bash-4.2$ pg_autoctl create postgres --pgdata ha --auth trust --ssl-self-signed --username ha-admin --dbname appdb --hostname  autofailover-1 --pgctl /bin/pg_ctl --monitor 'postgres://autoctl_node@autofailover-m/pg_auto_failover?sslmode=require'
18:21:09 14081 INFO  Using default --ssl-mode "require"
18:21:09 14081 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
18:21:09 14081 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
18:21:09 14081 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
18:21:09 14081 INFO  Started pg_autoctl postgres service with pid 14083
18:21:09 14083 INFO   /usr/bin/pg_autoctl do service postgres --pgdata ha -v
18:21:09 14081 INFO  Started pg_autoctl node-init service with pid 14084
18:21:09 14084 INFO  Registered node 1 (autofailover-1:5432) with name "node_1" in formation "default", group 0, state "single"
18:21:09 14084 INFO  Writing keeper state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/ha/pg_autoctl.state"
18:21:09 14084 INFO  Writing keeper init state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/ha/pg_autoctl.init"
18:21:09 14084 INFO  Successfully registered as "single" to the monitor.
18:21:09 14084 INFO  FSM transition from "init" to "single": Start as a single node
18:21:09 14084 INFO  Initialising postgres as a primary
18:21:09 14084 INFO  Initialising a PostgreSQL cluster at "ha"
18:21:09 14084 INFO  /bin/pg_ctl initdb -s -D ha --option '--auth=trust'
18:21:09 14083 ERROR Failed to open file "/var/lib/pgsql/ha/postmaster.pid": No such file or directory
18:21:10 14084 INFO   /bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/ha/server.crt -keyout /var/lib/pgsql/ha/server.key -subj "/CN=autofailover-1"
18:21:10 14107 INFO   /bin/postgres -D /var/lib/pgsql/ha -p 5432 -h *
18:21:10 14083 INFO  Postgres is now serving PGDATA "/var/lib/pgsql/ha" on port 5432 with pid 14107
18:21:10 14084 INFO  CREATE DATABASE appdb;
18:21:10 14084 INFO  CREATE EXTENSION pg_stat_statements;
18:21:10 14084 INFO  Contents of "/var/lib/pgsql/ha/postgresql-auto-failover.conf" have changed, overwriting
18:21:10 14084 INFO  Transition complete: current state is now "single"
18:21:10 14084 INFO  keeper has been successfully initialized.
18:21:11 14081 WARN  pg_autoctl service node-init exited with exit status 0
18:21:11 14083 INFO  Postgres controller service received signal SIGTERM, terminating
18:21:11 14083 INFO  Stopping pg_autoctl postgres service
18:21:11 14083 INFO  /bin/pg_ctl --pgdata /var/lib/pgsql/ha --wait stop --mode fast
18:21:11 14081 INFO  Stop pg_autoctl

2. As with the monitor node, add the service to systemd so it starts automatically:
-bash-4.2$ pg_autoctl -q show systemd --pgdata ~postgres/ha > pgautofailover.service
sudo mv pgautofailover.service /etc/systemd/system
sudo systemctl daemon-reload
sudo systemctl enable pgautofailover
sudo systemctl start pgautofailover

3. Verify the state:
-bash-4.2$ pg_autoctl show state
  Name |  Node |           Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+---------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | autofailover-1:5432 | 0/16DB650 |       yes |              single |              single


Secondary Host (autofailover-2)

1. Create the secondary:
-bash-4.2$ pg_autoctl create postgres --pgdata ha --auth trust --ssl-self-signed --username ha-admin --dbname appdb --hostname  autofailover-2 --pgctl /bin/pg_ctl --monitor 'postgres://autoctl_node@autofailover-m/pg_auto_failover?sslmode=require'
18:49:12 16573 INFO  Using default --ssl-mode "require"
18:49:12 16573 INFO  Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
18:49:12 16573 WARN  Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
18:49:12 16573 WARN  See https://www.postgresql.org/docs/current/libpq-ssl.html for details
18:49:12 16573 INFO  Started pg_autoctl postgres service with pid 16575
18:49:12 16575 INFO   /usr/bin/pg_autoctl do service postgres --pgdata ha -v
18:49:12 16573 INFO  Started pg_autoctl node-init service with pid 16576
18:49:12 16576 INFO  Registered node 2 (autofailover-2:5432) with name "node_2" in formation "default", group 0, state "wait_standby"
18:49:12 16576 INFO  Writing keeper state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/ha/pg_autoctl.state"
18:49:12 16576 INFO  Writing keeper init state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/ha/pg_autoctl.init"
18:49:12 16576 INFO  Successfully registered as "wait_standby" to the monitor.
18:49:12 16576 INFO  FSM transition from "init" to "wait_standby": Start following a primary
18:49:12 16576 INFO  Transition complete: current state is now "wait_standby"
18:49:12 16576 INFO  New state for node 1 "node_1" (autofailover-1:5432): single ➜ wait_primary
18:49:12 16576 INFO  New state for node 1 "node_1" (autofailover-1:5432): wait_primary ➜ wait_primary
18:49:12 16576 INFO  Still waiting for the monitor to drive us to state "catchingup"
18:49:12 16576 WARN  Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.
18:49:12 16576 INFO  FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
18:49:12 16576 INFO  Initialising PostgreSQL as a hot standby
18:49:12 16576 INFO   /bin/pg_basebackup -w -d application_name=pgautofailover_standby_2 host=autofailover-1 port=5432 user=pgautofailover_replicator sslmode=require --pgdata /var/lib/pgsql/backup/node_2 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_2
18:49:12 16576 INFO  pg_basebackup: initiating base backup, waiting for checkpoint to complete
18:49:12 16576 INFO  pg_basebackup: checkpoint completed
18:49:12 16576 INFO  pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
18:49:12 16576 INFO  pg_basebackup: starting background WAL receiver
18:49:12 16576 INFO      0/33531 kB (0%), 0/1 tablespace (...pgsql/backup/node_2/backup_label)
18:49:12 16576 INFO  33540/33540 kB (100%), 0/1 tablespace (.../backup/node_2/global/pg_control)
18:49:12 16576 INFO  33540/33540 kB (100%), 1/1 tablespace                                         
18:49:12 16576 INFO  pg_basebackup: write-ahead log end point: 0/2000138
18:49:12 16576 INFO  pg_basebackup: waiting for background process to finish streaming ...
18:49:12 16576 INFO  pg_basebackup: syncing data to disk ...
18:49:12 16576 INFO  pg_basebackup: base backup completed
18:49:12 16576 INFO  Creating the standby signal file at "/var/lib/pgsql/ha/standby.signal", and replication setup at "/var/lib/pgsql/ha/postgresql-auto-failover-standby.conf"
18:49:12 16576 INFO   /bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/ha/server.crt -keyout /var/lib/pgsql/ha/server.key -subj "/CN=autofailover-2"
18:49:12 16584 INFO   /bin/postgres -D /var/lib/pgsql/ha -p 5432 -h *
18:49:13 16576 INFO  PostgreSQL started on port 5432
18:49:13 16576 INFO  Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes.
18:49:13 16576 INFO  Ensuring HBA rules for node 1 (autofailover-1:5432)
18:49:13 16576 INFO  Transition complete: current state is now "catchingup"
18:49:13 16576 INFO  keeper has been successfully initialized.
18:49:13 16573 WARN  pg_autoctl service node-init exited with exit status 0
18:49:13 16575 INFO  Postgres is now serving PGDATA "/var/lib/pgsql/ha" on port 5432 with pid 16584
18:49:13 16575 INFO  Postgres controller service received signal SIGTERM, terminating
18:49:13 16575 INFO  Stopping pg_autoctl postgres service
18:49:13 16575 INFO  /bin/pg_ctl --pgdata /var/lib/pgsql/ha --wait stop --mode fast
18:49:13 16573 INFO  Waiting for subprocesses to terminate.
18:49:13 16573 INFO  Stop pg_autoctl

Notice that pg_auto_failover recognizes this node should be a secondary and uses pg_basebackup to clone it from the primary.

2. Add to systemd:
pg_autoctl -q show systemd --pgdata ~postgres/ha > pgautofailover.service
sudo mv pgautofailover.service /etc/systemd/system
sudo systemctl daemon-reload
sudo systemctl enable pgautofailover
sudo systemctl start pgautofailover

3. Check the state:
-bash-4.2$ pg_autoctl show state
  Name |  Node |           Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+---------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | autofailover-1:5432 | 0/3015058 |       yes |             primary |             primary
node_2 |     2 | autofailover-2:5432 | 0/3015058 |       yes |           secondary |           secondary

The 2-node cluster setup is complete. To scale the cluster further, follow the same procedure used to add the secondary; there is no limit to the number of secondary nodes that can be added to a single group.
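
For example, the create commands for two further standbys could be generated as below. The hostnames autofailover-3 and autofailover-4 are hypothetical; the flags simply mirror the secondary-creation step above:

```shell
# Emit (not run) one create command per additional standby.
# Hostnames are placeholders for this sketch.
monitor_uri="postgres://autoctl_node@autofailover-m/pg_auto_failover?sslmode=require"
cmds=$(for host in autofailover-3 autofailover-4; do
  printf "pg_autoctl create postgres --pgdata ha --auth trust --ssl-self-signed --username ha-admin --dbname appdb --hostname %s --pgctl /bin/pg_ctl --monitor '%s'\n" \
    "$host" "$monitor_uri"
done)
echo "$cmds"
```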


Test that the secondary is read-only

autofailover-2:
-bash-4.2$ psql
psql (12.6 (VMware Postgres 12.6.1))
Type "help" for help.

postgres=# create table kevin (id int);
ERROR:  cannot execute CREATE TABLE in a read-only transaction

autofailover-1:
-bash-4.2$ psql
psql (12.6 (VMware Postgres 12.6.1))
Type "help" for help.

postgres=# create table kevin (id int);
CREATE TABLE


Graceful failover test

-bash-4.2$ pg_autoctl perform failover
20:00:02 27286 INFO  Listening monitor notifications about state changes in formation "default" and group 0
20:00:02 27286 INFO  Following table displays times when notifications are received
    Time |   Name |  Node |           Host:Port |       Current State |      Assigned State
---------+--------+-------+---------------------+---------------------+--------------------
20:00:02 | node_1 |     1 | autofailover-1:5432 |             primary |            draining
20:00:02 | node_2 |     2 | autofailover-2:5432 |           secondary |   prepare_promotion
20:00:02 | node_2 |     2 | autofailover-2:5432 |   prepare_promotion |   prepare_promotion
20:00:02 | node_2 |     2 | autofailover-2:5432 |   prepare_promotion |    stop_replication
20:00:02 | node_1 |     1 | autofailover-1:5432 |             primary |      demote_timeout
20:00:02 | node_1 |     1 | autofailover-1:5432 |            draining |      demote_timeout
20:00:02 | node_1 |     1 | autofailover-1:5432 |      demote_timeout |      demote_timeout
20:00:03 | node_2 |     2 | autofailover-2:5432 |    stop_replication |    stop_replication
20:00:03 | node_2 |     2 | autofailover-2:5432 |    stop_replication |        wait_primary
20:00:03 | node_1 |     1 | autofailover-1:5432 |      demote_timeout |             demoted
20:00:03 | node_1 |     1 | autofailover-1:5432 |             demoted |             demoted
20:00:03 | node_2 |     2 | autofailover-2:5432 |        wait_primary |        wait_primary
20:00:04 | node_1 |     1 | autofailover-1:5432 |             demoted |          catchingup
20:00:05 | node_1 |     1 | autofailover-1:5432 |          catchingup |          catchingup
20:00:05 | node_1 |     1 | autofailover-1:5432 |          catchingup |           secondary
20:00:05 | node_2 |     2 | autofailover-2:5432 |        wait_primary |             primary
20:00:05 | node_1 |     1 | autofailover-1:5432 |           secondary |           secondary
20:00:06 | node_2 |     2 | autofailover-2:5432 |             primary |             primary
-bash-4.2$ pg_autoctl show state
  Name |  Node |           Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+---------------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | autofailover-1:5432 | 0/50000D8 |       yes |           secondary |           secondary
node_2 |     2 | autofailover-2:5432 | 0/50000D8 |       yes |             primary |             primary
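
A failover is only transparent if clients can locate the new primary. libpq's multi-host connection strings handle this: listing both nodes with target_session_attrs=read-write makes the client settle on whichever node currently accepts writes (pg_autoctl show uri prints a ready-made string of this shape). A sketch using this guide's hostnames and the appdb database created earlier:

```shell
# Multi-host libpq URI: the client tries each listed host and keeps the
# writable one, i.e. the current primary, with no app-side reconfiguration
app_uri='postgres://autofailover-1:5432,autofailover-2:5432/appdb?target_session_attrs=read-write&sslmode=require'
echo "$app_uri"
# Against a live cluster: psql "$app_uri" -c 'SELECT pg_is_in_recovery();'
# should report false, since the connection lands on the primary
```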

Note: By default, VMware Postgres sets PGDATA to /var/lib/pgsql/data. If you used a different PGDATA when setting up Postgres, pg_autoctl will report this error when you request state information:
-bash-4.2$ pg_autoctl show state
18:36:43 18482 ERROR Failed to open file "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/data/pg_autoctl.cfg": No such file or directory
18:36:43 18482 FATAL Failed to parse configuration file "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/data/pg_autoctl.cfg"

Set PGDATA in /var/lib/pgsql/.bash_profile to the correct location.
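
For example, assuming the data directory used on the Postgres hosts in this guide:

```shell
# Persist the correct data directory for the postgres user so pg_autoctl
# can locate its state and configuration files (path from this guide)
echo 'export PGDATA=/var/lib/pgsql/ha' >> ~/.bash_profile
export PGDATA=/var/lib/pgsql/ha
# pg_autoctl show state will now resolve its configuration for this PGDATA
```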