Install VMware Postgres on all three hosts as root:
yum install -y vmware-postgres-12.6-1.el7.x86_64.rpm
This will create the postgres user. You can optionally add the postgres user to sudoers:
echo "postgres ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/postgres
Note: Do not run initdb or start Postgres after this step. This will be handled by the pg_autoctl command as part of pg_auto_failover.
All the remaining steps will be completed as Postgres user.
On the monitor node only
1. Verify that pg_auto_failover and Postgres were installed properly:
-bash-4.2$ pg_autoctl version
pg_autoctl version 1.4
pg_autoctl extension version 1.4
compiled with PostgreSQL 12.6 (VMware Postgres 12.6.1) on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
compatible with Postgres 10, 11, 12, and 13
-bash-4.2$ pg_ctl -V
pg_ctl (PostgreSQL) 12.6 (VMware Postgres 12.6.1)
2. Create the monitor node:
-bash-4.2$ pg_autoctl create monitor --auth trust --ssl-self-signed --pgdata monitor
17:59:55 14359 INFO Using default --ssl-mode "require"
17:59:55 14359 INFO Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
17:59:55 14359 WARN Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
17:59:55 14359 WARN See https://www.postgresql.org/docs/current/libpq-ssl.html for details
17:59:55 14359 INFO Using --hostname "autofailover-m", which resolves to IP address "192.168.99.128"
17:59:55 14359 INFO Initialising a PostgreSQL cluster at "monitor"
17:59:55 14359 INFO /bin/pg_ctl initdb -s -D monitor --option '--auth=trust'
17:59:56 14359 INFO /bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/monitor/server.crt -keyout /var/lib/pgsql/monitor/server.key -subj "/CN=autofailover-m"
17:59:56 14359 INFO Started pg_autoctl postgres service with pid 14375
17:59:56 14375 INFO /usr/bin/pg_autoctl do service postgres --pgdata monitor -v
17:59:56 14359 INFO Started pg_autoctl monitor-init service with pid 14376
17:59:56 14381 INFO /bin/postgres -D /var/lib/pgsql/monitor -p 5432 -h *
17:59:56 14375 INFO Postgres is now serving PGDATA "/var/lib/pgsql/monitor" on port 5432 with pid 14381
17:59:56 14376 WARN NOTICE: installing required extension "btree_gist"
17:59:56 14376 INFO Granting connection privileges on 192.168.99.0/24
17:59:56 14376 INFO Your pg_auto_failover monitor instance is now ready on port 5432.
17:59:56 14376 INFO Monitor has been successfully initialized.
17:59:56 14359 WARN pg_autoctl service monitor-init exited with exit status 0
17:59:56 14375 INFO Postgres controller service received signal SIGTERM, terminating
17:59:56 14375 INFO Stopping pg_autoctl postgres service
17:59:56 14375 INFO /bin/pg_ctl --pgdata /var/lib/pgsql/monitor --wait stop --mode fast
17:59:57 14359 INFO Stop pg_autoctl
At this point the database has been initialized, and it is used specifically by the monitor to store state information about the Postgres cluster; however, it is not actually running yet.
3. We will add the service so systemd can manage the startup:
pg_autoctl -q show systemd --pgdata ~postgres/monitor > pgautofailover.service
sudo mv pgautofailover.service /etc/systemd/system
sudo systemctl daemon-reload
sudo systemctl enable pgautofailover
sudo systemctl start pgautofailover
The last command will start the monitor. Verify the current state:
-bash-4.2$ ps -ef | grep postgres
postgres 14554 1 0 18:04 ? 00:00:00 /usr/bin/pg_autoctl run
postgres 14557 14554 0 18:04 ? 00:00:00 pg_autoctl: start/stop postgres
postgres 14558 14554 0 18:04 ? 00:00:00 pg_autoctl: monitor listener
postgres 14566 14557 0 18:04 ? 00:00:00 /bin/postgres -D /var/lib/pgsql/monitor -p 5432 -h *
postgres 14567 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: logger
postgres 14569 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: checkpointer
postgres 14570 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: background writer
postgres 14571 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: walwriter
postgres 14572 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: autovacuum launcher
postgres 14573 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: stats collector
postgres 14574 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: pg_auto_failover monitor
postgres 14575 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: logical replication launcher
postgres 14576 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: pg_auto_failover monitor worker
postgres 14577 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: pg_auto_failover monitor worker
postgres 14578 14566 0 18:04 ? 00:00:00 postgres: pg_auto_failover monitor: autoctl_node pg_auto_failover [local] idle
-bash-4.2$ psql
psql (12.6 (VMware Postgres 12.6.1))
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
------------------+----------+----------+-------------+-------------+-----------------------
pg_auto_failover | autoctl | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
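Beyond checking the process list and connecting with psql, you can ask systemd directly whether the unit is healthy. A quick sketch, run from any sudo-capable account (assumes the unit name pgautofailover used above):

```shell
# Both commands exit non-zero if the unit is disabled or not running,
# so they are convenient in scripts as well as interactively.
sudo systemctl is-enabled pgautofailover
sudo systemctl is-active pgautofailover
```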
Now we are ready to install our Postgres primary and secondary nodes.
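The create commands below register each data node against the monitor using a connection URI. A minimal sketch of how that URI is composed, using the hostname from this walkthrough (adjust MONITOR_HOST for your environment):

```shell
# The monitor URI always uses the autoctl_node role and the
# pg_auto_failover database created during monitor initialization.
MONITOR_HOST="autofailover-m"
MONITOR_URI="postgres://autoctl_node@${MONITOR_HOST}/pg_auto_failover?sslmode=require"
echo "$MONITOR_URI"
```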
Primary host (autofailover-1)
1. Create the first Postgres instance. The monitor will automatically set this node as the primary, with read-write capability (PGDATA = /var/lib/pgsql/ha):
-bash-4.2$ pg_autoctl create postgres --pgdata ha --auth trust --ssl-self-signed --username ha-admin --dbname appdb --hostname autofailover-1 --pgctl /bin/pg_ctl --monitor 'postgres://autoctl_node@autofailover-m/pg_auto_failover?sslmode=require'
18:21:09 14081 INFO Using default --ssl-mode "require"
18:21:09 14081 INFO Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
18:21:09 14081 WARN Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
18:21:09 14081 WARN See https://www.postgresql.org/docs/current/libpq-ssl.html for details
18:21:09 14081 INFO Started pg_autoctl postgres service with pid 14083
18:21:09 14083 INFO /usr/bin/pg_autoctl do service postgres --pgdata ha -v
18:21:09 14081 INFO Started pg_autoctl node-init service with pid 14084
18:21:09 14084 INFO Registered node 1 (autofailover-1:5432) with name "node_1" in formation "default", group 0, state "single"
18:21:09 14084 INFO Writing keeper state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/ha/pg_autoctl.state"
18:21:09 14084 INFO Writing keeper init state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/ha/pg_autoctl.init"
18:21:09 14084 INFO Successfully registered as "single" to the monitor.
18:21:09 14084 INFO FSM transition from "init" to "single": Start as a single node
18:21:09 14084 INFO Initialising postgres as a primary
18:21:09 14084 INFO Initialising a PostgreSQL cluster at "ha"
18:21:09 14084 INFO /bin/pg_ctl initdb -s -D ha --option '--auth=trust'
18:21:09 14083 ERROR Failed to open file "/var/lib/pgsql/ha/postmaster.pid": No such file or directory
18:21:10 14084 INFO /bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/ha/server.crt -keyout /var/lib/pgsql/ha/server.key -subj "/CN=autofailover-1"
18:21:10 14107 INFO /bin/postgres -D /var/lib/pgsql/ha -p 5432 -h *
18:21:10 14083 INFO Postgres is now serving PGDATA "/var/lib/pgsql/ha" on port 5432 with pid 14107
18:21:10 14084 INFO CREATE DATABASE appdb;
18:21:10 14084 INFO CREATE EXTENSION pg_stat_statements;
18:21:10 14084 INFO Contents of "/var/lib/pgsql/ha/postgresql-auto-failover.conf" have changed, overwriting
18:21:10 14084 INFO Transition complete: current state is now "single"
18:21:10 14084 INFO keeper has been successfully initialized.
18:21:11 14081 WARN pg_autoctl service node-init exited with exit status 0
18:21:11 14083 INFO Postgres controller service received signal SIGTERM, terminating
18:21:11 14083 INFO Stopping pg_autoctl postgres service
18:21:11 14083 INFO /bin/pg_ctl --pgdata /var/lib/pgsql/ha --wait stop --mode fast
18:21:11 14081 INFO Stop pg_autoctl
2. As was the case with the monitor node, we need to add the service to systemd to get it to start:
-bash-4.2$ pg_autoctl -q show systemd --pgdata ~postgres/ha > pgautofailover.service
sudo mv pgautofailover.service /etc/systemd/system
sudo systemctl daemon-reload
sudo systemctl enable pgautofailover
sudo systemctl start pgautofailover
3. Verify the state:
-bash-4.2$ pg_autoctl show state
Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
-------+-------+---------------------+-----------+-----------+---------------------+--------------------
node_1 | 1 | autofailover-1:5432 | 0/16DB650 | yes | single | single
Secondary Host (autofailover-2)
1. Create the secondary:
-bash-4.2$ pg_autoctl create postgres --pgdata ha --auth trust --ssl-self-signed --username ha-admin --dbname appdb --hostname autofailover-2 --pgctl /bin/pg_ctl --monitor 'postgres://autoctl_node@autofailover-m/pg_auto_failover?sslmode=require'
18:49:12 16573 INFO Using default --ssl-mode "require"
18:49:12 16573 INFO Using --ssl-self-signed: pg_autoctl will create self-signed certificates, allowing for encrypted network traffic
18:49:12 16573 WARN Self-signed certificates provide protection against eavesdropping; this setup does NOT protect against Man-In-The-Middle attacks nor Impersonation attacks.
18:49:12 16573 WARN See https://www.postgresql.org/docs/current/libpq-ssl.html for details
18:49:12 16573 INFO Started pg_autoctl postgres service with pid 16575
18:49:12 16575 INFO /usr/bin/pg_autoctl do service postgres --pgdata ha -v
18:49:12 16573 INFO Started pg_autoctl node-init service with pid 16576
18:49:12 16576 INFO Registered node 2 (autofailover-2:5432) with name "node_2" in formation "default", group 0, state "wait_standby"
18:49:12 16576 INFO Writing keeper state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/ha/pg_autoctl.state"
18:49:12 16576 INFO Writing keeper init state file at "/var/lib/pgsql/.local/share/pg_autoctl/var/lib/pgsql/ha/pg_autoctl.init"
18:49:12 16576 INFO Successfully registered as "wait_standby" to the monitor.
18:49:12 16576 INFO FSM transition from "init" to "wait_standby": Start following a primary
18:49:12 16576 INFO Transition complete: current state is now "wait_standby"
18:49:12 16576 INFO New state for node 1 "node_1" (autofailover-1:5432): single ➜ wait_primary
18:49:12 16576 INFO New state for node 1 "node_1" (autofailover-1:5432): wait_primary ➜ wait_primary
18:49:12 16576 INFO Still waiting for the monitor to drive us to state "catchingup"
18:49:12 16576 WARN Please make sure that the primary node is currently running `pg_autoctl run` and contacting the monitor.
18:49:12 16576 INFO FSM transition from "wait_standby" to "catchingup": The primary is now ready to accept a standby
18:49:12 16576 INFO Initialising PostgreSQL as a hot standby
18:49:12 16576 INFO /bin/pg_basebackup -w -d application_name=pgautofailover_standby_2 host=autofailover-1 port=5432 user=pgautofailover_replicator sslmode=require --pgdata /var/lib/pgsql/backup/node_2 -U pgautofailover_replicator --verbose --progress --max-rate 100M --wal-method=stream --slot pgautofailover_standby_2
18:49:12 16576 INFO pg_basebackup: initiating base backup, waiting for checkpoint to complete
18:49:12 16576 INFO pg_basebackup: checkpoint completed
18:49:12 16576 INFO pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
18:49:12 16576 INFO pg_basebackup: starting background WAL receiver
18:49:12 16576 INFO 0/33531 kB (0%), 0/1 tablespace (...pgsql/backup/node_2/backup_label)
18:49:12 16576 INFO 33540/33540 kB (100%), 0/1 tablespace (.../backup/node_2/global/pg_control)
18:49:12 16576 INFO 33540/33540 kB (100%), 1/1 tablespace
18:49:12 16576 INFO pg_basebackup: write-ahead log end point: 0/2000138
18:49:12 16576 INFO pg_basebackup: waiting for background process to finish streaming ...
18:49:12 16576 INFO pg_basebackup: syncing data to disk ...
18:49:12 16576 INFO pg_basebackup: base backup completed
18:49:12 16576 INFO Creating the standby signal file at "/var/lib/pgsql/ha/standby.signal", and replication setup at "/var/lib/pgsql/ha/postgresql-auto-failover-standby.conf"
18:49:12 16576 INFO /bin/openssl req -new -x509 -days 365 -nodes -text -out /var/lib/pgsql/ha/server.crt -keyout /var/lib/pgsql/ha/server.key -subj "/CN=autofailover-2"
18:49:12 16584 INFO /bin/postgres -D /var/lib/pgsql/ha -p 5432 -h *
18:49:13 16576 INFO PostgreSQL started on port 5432
18:49:13 16576 INFO Fetched current list of 1 other nodes from the monitor to update HBA rules, including 1 changes.
18:49:13 16576 INFO Ensuring HBA rules for node 1 (autofailover-1:5432)
18:49:13 16576 INFO Transition complete: current state is now "catchingup"
18:49:13 16576 INFO keeper has been successfully initialized.
18:49:13 16573 WARN pg_autoctl service node-init exited with exit status 0
18:49:13 16575 INFO Postgres is now serving PGDATA "/var/lib/pgsql/ha" on port 5432 with pid 16584
18:49:13 16575 INFO Postgres controller service received signal SIGTERM, terminating
18:49:13 16575 INFO Stopping pg_autoctl postgres service
18:49:13 16575 INFO /bin/pg_ctl --pgdata /var/lib/pgsql/ha --wait stop --mode fast
18:49:13 16573 INFO Waiting for subprocesses to terminate.
18:49:13 16573 INFO Stop pg_autoctl
You can see that pg_auto_failover knows to create this node as a secondary and leverages pg_basebackup.
2. Add to systemd:
pg_autoctl -q show systemd --pgdata ~postgres/ha > pgautofailover.service
sudo mv pgautofailover.service /etc/systemd/system
sudo systemctl daemon-reload
sudo systemctl enable pgautofailover
sudo systemctl start pgautofailover
3. Check the state:
-bash-4.2$ pg_autoctl show state
Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
-------+-------+---------------------+-----------+-----------+---------------------+--------------------
node_1 | 1 | autofailover-1:5432 | 0/3015058 | yes | primary | primary
node_2 | 2 | autofailover-2:5432 | 0/3015058 | yes | secondary | secondary
The two-node cluster setup is complete. If you want to scale this cluster with additional nodes, just follow the same procedure used to add a secondary. There is no limit to the number of secondary nodes that can be added to a single group.
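For instance, a hypothetical third node named autofailover-3 would be created with the same command used for the secondary; the monitor assigns its role automatically:

```shell
# Sketch only: autofailover-3 is an assumed hostname. All other options
# mirror the secondary setup earlier in this walkthrough.
pg_autoctl create postgres --pgdata ha --auth trust --ssl-self-signed \
  --username ha-admin --dbname appdb --hostname autofailover-3 \
  --pgctl /bin/pg_ctl \
  --monitor 'postgres://autoctl_node@autofailover-m/pg_auto_failover?sslmode=require'
```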
Test to make sure the secondary is read-only
autofailover-2:
-bash-4.2$ psql
psql (12.6 (VMware Postgres 12.6.1))
Type "help" for help.
postgres=# create table kevin (id int);
ERROR: cannot execute CREATE TABLE in a read-only transaction
autofailover-1:
-bash-4.2$ psql
psql (12.6 (VMware Postgres 12.6.1))
Type "help" for help.
postgres=# create table kevin (id int);
CREATE TABLE
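An alternative to attempting a write is Postgres's built-in pg_is_in_recovery() function, which reports whether an instance is running as a hot standby:

```shell
# Prints "t" on a standby (read-only) and "f" on the primary.
# Run as the postgres user on each host; psql connects to the local instance.
psql -Atc 'SELECT pg_is_in_recovery();'
```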
Graceful failover test
-bash-4.2$ pg_autoctl perform failover
20:00:02 27286 INFO Listening monitor notifications about state changes in formation "default" and group 0
20:00:02 27286 INFO Following table displays times when notifications are received
Time | Name | Node | Host:Port | Current State | Assigned State
---------+--------+-------+---------------------+---------------------+--------------------
20:00:02 | node_1 | 1 | autofailover-1:5432 | primary | draining
20:00:02 | node_2 | 2 | autofailover-2:5432 | secondary | prepare_promotion
20:00:02 | node_2 | 2 | autofailover-2:5432 | prepare_promotion | prepare_promotion
20:00:02 | node_2 | 2 | autofailover-2:5432 | prepare_promotion | stop_replication
20:00:02 | node_1 | 1 | autofailover-1:5432 | primary | demote_timeout
20:00:02 | node_1 | 1 | autofailover-1:5432 | draining | demote_timeout
20:00:02 | node_1 | 1 | autofailover-1:5432 | demote_timeout | demote_timeout
20:00:03 | node_2 | 2 | autofailover-2:5432 | stop_replication | stop_replication
20:00:03 | node_2 | 2 | autofailover-2:5432 | stop_replication | wait_primary
20:00:03 | node_1 | 1 | autofailover-1:5432 | demote_timeout | demoted
20:00:03 | node_1 | 1 | autofailover-1:5432 | demoted | demoted
20:00:03 | node_2 | 2 | autofailover-2:5432 | wait_primary | wait_primary
20:00:04 | node_1 | 1 | autofailover-1:5432 | demoted | catchingup
20:00:05 | node_1 | 1 | autofailover-1:5432 | catchingup | catchingup
20:00:05 | node_1 | 1 | autofailover-1:5432 | catchingup | secondary
20:00:05 | node_2 | 2 | autofailover-2:5432 | wait_primary | primary
20:00:05 | node_1 | 1 | autofailover-1:5432 | secondary | secondary
20:00:06 | node_2 | 2 | autofailover-2:5432 | primary | primary
-bash-4.2$ pg_autoctl show state
Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
-------+-------+---------------------+-----------+-----------+---------------------+--------------------
node_1 | 1 | autofailover-1:5432 | 0/50000D8 | yes | secondary | secondary
node_2 | 2 | autofailover-2:5432 | 0/50000D8 | yes | primary | primary
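Because the primary role can move between hosts, applications should not hard-code a hostname. pg_autoctl can print the connection URIs it knows about, including a multi-host libpq URI that targets whichever node is currently primary (sketch; exact output depends on your setup):

```shell
# Run as the postgres user on a data node; shows the monitor URI and the
# formation URI suitable for application connection strings.
pg_autoctl show uri --pgdata ~postgres/ha
```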
Note: By default, VMware Postgres sets PGDATA to /var/lib/pgsql/data. If you use a different PGDATA when setting up Postgres, pg_autoctl will give you this error message when getting state information:
-bash-4.2$ pg_autoctl show state
18:36:43 18482 ERROR Failed to open file "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/data/pg_autoctl.cfg": No such file or directory
18:36:43 18482 FATAL Failed to parse configuration file "/var/lib/pgsql/.config/pg_autoctl/var/lib/pgsql/data/pg_autoctl.cfg"
Set PGDATA in /var/lib/pgsql/.bash_profile to the correct location.