Beginning with Postgres 11.5, the Patroni High Availability (HA) solution ships as part of the Postgres binary.
The documentation covers the prerequisites necessary to use Patroni; however, it refers to the official Patroni documentation for setup and configuration.
This article covers how to set up a 3-node Patroni cluster using etcd.
This configuration will provide an HA Postgres cluster in cases where a host becomes unavailable. Before we go through the setup, we should review a couple of concepts:
For those who have used Tanzu Greenplum 6.x, this is essentially the same as how Write Ahead Log (WAL) replication is done between the primary and mirror segments.
For Postgres and other databases, it was traditionally defined as master/slave replication. However, due to the negative connotation of those terms, it has been rebranded as primary-standby.
In the primary Postgres instance, once any change is made (a DML operation), the change is first written to an in-memory WAL buffer. The buffered records are then immediately written to a WAL segment file in the primary's pg_wal directory (pg_xlog in Postgres 9.x or lower) when the transaction commits or aborts.
After the WAL file is created, the primary node's walsender process will send the WAL data to the standby's walreceiver process. The standby's walreceiver process will write the WAL data into a WAL segment file in the standby's pg_wal directory and will inform the startup process. The startup process will then replay the WAL data, essentially like a recovery process, and the data will be in sync.
Primary
postgres 12391 12347 0 17:07 ? 00:00:00 postgres: patroni_test: walsender replicator 10.193.102.204(29906) streaming 0/4132360
postgres 14013 12347 0 19:05 ? 00:00:00 postgres: patroni_test: walsender replicator 10.193.102.205(33855) streaming 0/4132360
Standby
postgres 11916 11914 0 17:07 ? 00:00:00 postgres: patroni_test: startup recovering 0000000A0000000000000004
postgres 11928 11914 0 17:07 ? 00:00:09 postgres: patroni_test: walreceiver streaming 0/4132360
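Once a primary and standby are running (such as the cluster built later in this article), you can also confirm replication from SQL rather than the process list. This is an optional check, not part of the original steps; it assumes host 1 is currently the primary and host 2 a standby, and the sent_lsn/replay_lsn column names apply to Postgres 10 and later:

-bash-4.2$ psql -h 10.193.102.203 -U postgres -c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"
-bash-4.2$ psql -h 10.193.102.204 -U postgres -c "SELECT pg_is_in_recovery();"

The first query, run against the primary, lists one row per connected walsender; the second returns true when the instance is replaying WAL as a standby.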
Patroni is a tool that can be used to create, manage, maintain, and monitor highly available Postgres clusters using streaming replication. Patroni can handle the initial Postgres database initialization as well as planned switchovers and unplanned failovers.
In our example of a 3-node cluster, you will see how each of the 3 Postgres instances interacts with the others. In general, each Patroni instance will manage its own Postgres instance and collect health data for it. Based on this data, each Patroni instance decides whether any action needs to be taken to keep the cluster as a whole healthy. The challenge for Patroni is making sure that only a single Patroni instance is able to acquire and keep the "leader lock" (primary). This is where etcd comes in.
etcd is a Distributed Configuration Store (DCS) and is needed to provide a distributed decision-making mechanism that allows Patroni to determine which instance should be the leader and which should be replicas.
In our example, we will be creating a 3-node etcd cluster. ZooKeeper can also be used but will not be covered in this article. The number of etcd nodes does not depend on the number of Patroni nodes in the cluster. However, you should have more than one to prevent a single point of failure, and at least 3 nodes are encouraged so that a majority can still be reached during leader voting should a failure occur.
The following example is for CentOS 7.
1. As the root user, install Postgres (this will create the postgres user).
# yum install pivotal-postgres-11.6-1.el7.x86_64.rpm
Do not run initdb and start up Postgres after this step; this will be handled later by Patroni. There is no need to install Patroni separately, as it is already included with Postgres.
2. Install the etcd and Patroni prerequisites:
# yum install -y python3 python3-devel gcc
# yum install -y etcd
# yum install -y python-psycopg2
# cat << EOF > requirements.txt
urllib3>=1.19.1,!=1.21
boto
PyYAML
requests
six >= 1.7
kazoo>=1.3.1
python-etcd>=0.4.3,<0.5
python-consul>=0.7.0
click>=4.1
prettytable>=0.7
tzlocal
python-dateutil
psutil>=2.0.0
cdiff
kubernetes>=2.0.0,<=7.0.0,!=4.0.*,!=5.0.*
EOF
# pip3 install -r requirements.txt
Note: pip3 install -r requirements.txt may not work in later versions of Patroni. Instead, you will need to install the dependencies like this:
sudo pip3 install 'psycopg2>=2.5.4'
sudo pip3 install patroni[dependencies]
sudo pip3 install patroni[etcd]
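Before moving on, it can be worth confirming that Patroni and its Python dependencies resolved correctly. This is an optional sanity check rather than part of the original procedure, and it assumes the pip3 installs above succeeded and that the patroni binary is on your PATH:

# patroni --version
# python3 -c "import etcd, psycopg2; print('python-etcd and psycopg2 import OK')"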
3. Change the permission settings on the /tmp directory:
# chmod 777 /tmp
With the prerequisites now in place on all hosts, we can set up our 3-node etcd cluster.
Note: The rest of the setup will be done as the postgres user:
1. Create the etcd.yml file. We will use this file to start the etcd cluster.
$ sudo su - postgres
$ vi etcd.yml
name: '<hostname1>'
listen-peer-urls: 'http://10.193.102.203:2380'
listen-client-urls: 'http://10.193.102.203:2379, http://127.0.0.1:2379'
initial-advertise-peer-urls: 'http://10.193.102.203:2380'
advertise-client-urls: 'http://10.193.102.203:2379'
initial-cluster: '<hostname1>=http://10.193.102.203:2380,<hostname2>=http://10.193.102.204:2380,<hostname3>=http://10.193.102.205:2380'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster-1'
2. Save the file and start up etcd:
etcd --config-file /var/lib/pgsql/etcd.yml > etcd_logfile 2>&1 &
The background process should keep running. If it terminates, check etcd_logfile for any errors; there is likely an issue with the configuration, or perhaps ports 2379 and/or 2380 are already in use.
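If you want to confirm that an etcd member is up and listening before moving on to the next host, a couple of quick checks can help (these are optional and not part of the original steps; adjust the client URL for the host you are on):

$ curl http://127.0.0.1:2379/version
$ ss -tlnp | grep -E '2379|2380'

The first command returns the etcd server and cluster versions as JSON; the second shows whether the peer and client ports are listening.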
Do this for all hosts. The YAML files should look like the ones below:
name: '<hostname2>'
listen-peer-urls: 'http://10.193.102.204:2380'
listen-client-urls: 'http://10.193.102.204:2379, http://127.0.0.1:2379'
initial-advertise-peer-urls: 'http://10.193.102.204:2380'
advertise-client-urls: 'http://10.193.102.204:2379'
initial-cluster: '<hostname1>=http://10.193.102.203:2380,<hostname2>=http://10.193.102.204:2380,<hostname3>=http://10.193.102.205:2380'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster-1'
name: '<hostname3>'
listen-peer-urls: 'http://10.193.102.205:2380'
listen-client-urls: 'http://10.193.102.205:2379, http://127.0.0.1:2379'
initial-advertise-peer-urls: 'http://10.193.102.205:2380'
advertise-client-urls: 'http://10.193.102.205:2379'
initial-cluster: '<hostname1>=http://10.193.102.203:2380,<hostname2>=http://10.193.102.204:2380,<hostname3>=http://10.193.102.205:2380'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster-1'
You can verify that the setup is correct by running the following command:
-bash-4.2$ etcdctl cluster-health
member 3172be2b0510829 is healthy: got healthy result from http://10.193.102.203:2379
member bf2f62dbbe5028b6 is healthy: got healthy result from http://10.193.102.204:2379
member e03eb2acfb7726f9 is healthy: got healthy result from http://10.193.102.205:2379
cluster is healthy
You can check the current leader with the following command:
-bash-4.2$ etcdctl member list
3172be2b0510829: name=postgres1 peerURLs=http://10.193.102.203:2380 clientURLs=http://10.193.102.203:2379 isLeader=false
bf2f62dbbe5028b6: name=postgres2 peerURLs=http://10.193.102.204:2380 clientURLs=http://10.193.102.204:2379 isLeader=true
e03eb2acfb7726f9: name=postgres3 peerURLs=http://10.193.102.205:2380 clientURLs=http://10.193.102.205:2379 isLeader=false
If you need to add or remove an etcd member later, you can use the "etcdctl member" command:
-bash-4.2$ etcdctl member
NAME:
   etcdctl member - member add, remove and list subcommands

USAGE:
   etcdctl member command [command options] [arguments...]

COMMANDS:
   list    enumerate existing cluster members
   add     add a new member to the etcd cluster
   remove  remove an existing member from the etcd cluster
   update  update an existing member in the etcd cluster
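For example, adding a hypothetical fourth member (the host name and IP below are placeholders, not part of this article's cluster) would look like the following. Note that the new node would then need its own etcd.yml started with initial-cluster-state set to 'existing':

-bash-4.2$ etcdctl member add <hostname4> http://10.193.102.206:2380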
Note: The above commands for etcdctl are the default commands. More functionality is available if you set the environment variable:
export ETCDCTL_API=3
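With that variable set, the rough v3 equivalents of the health and membership checks shown above are as follows (shown only as an illustration; the output format differs from the v2 commands):

$ etcdctl --endpoints=http://10.193.102.203:2379 endpoint health
$ etcdctl --endpoints=http://10.193.102.203:2379 member list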
The method that we used for setting up etcd is called static bootstrapping. This requires knowing the IP address of every cluster member, and those addresses should be static. The other method is "discovery bootstrapping"; it is only relevant when the etcd nodes are part of a container platform such as Kubernetes, which has its own special network setup, and it will not be covered in this article.
Each host will also have its own Patroni YAML file. In this example, I will provide the minimal settings for getting up and running. If you want to be adventurous and try out different settings, you can check out all the options in the Patroni documentation: YAML configuration
Host 1 (10.193.102.203)
$ vi postgres0.yml

scope: patroni_test
name: postgresql0

restapi:
  listen: 10.193.102.203:8008
  connect_address: 10.193.102.203:8008

etcd:
  # Provide host to do the initial discovery of the cluster topology:
  hosts: 10.193.102.203:2379,10.193.102.204:2379,10.193.102.205:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        hot_standby: "on"
        wal_keep_segments: 20
        max_wal_senders: 8
        max_replication_slots: 8

  # some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
    - encoding: UTF8
    - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
    - host replication replicator 10.193.102.203/32 md5
    - host replication replicator 10.193.102.204/32 md5
    - host replication replicator 10.193.102.205/32 md5
    - host all all 0.0.0.0/0 md5

  # Some additional users which need to be created after initializing new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.193.102.203:5432
  connect_address: 10.193.102.203:5432
  data_dir: /var/lib/pgsql/data
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: postgres
    rewind:  # Has no effect on postgres 10 and lower
      username: rewind_user
      password: rewind_password
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
Start Patroni:
patroni postgres0.yml > patroni_member_1.log 2>&1 &
This will start a Patroni process and initialize the Postgres database on the host. Again, if the process terminates, check patroni_member_1.log to see if there are any errors. A successful startup should look similar to the following:
2020-03-17 23:32:58.648 UTC [10491] LOG:  database system is ready to accept connections
10.193.102.222:5432 - accepting connections
10.193.102.222:5432 - accepting connections
2020-03-17 23:32:59,608 INFO: establishing a new patroni connection to the postgres cluster
2020-03-17 23:32:59,631 INFO: running post_bootstrap
2020-03-17 23:32:59,651 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2020-03-17 23:32:59,665 INFO: initialized a new cluster
2020-03-17 23:33:09,645 INFO: Lock owner: postgresql0; I am postgresql0
2020-03-17 23:33:09,668 INFO: Lock owner: postgresql0; I am postgresql0
2020-03-17 23:33:09,684 INFO: no action. i am the leader with the lock
You should now be able to log in to your new database:
-bash-4.2$ psql -h segment1 -U postgres
Password for user postgres:
psql (11.6 (Pivotal Postgres 11.6.1))
Type "help" for help.

postgres=#
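You can also ask Patroni itself how it sees this node through the REST API configured in the restapi section. This is a quick optional check, shown here for host 1; the /patroni endpoint returns the member's current role, state, and timeline as JSON:

-bash-4.2$ curl -s http://10.193.102.203:8008/patroni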
Note: Do the same for the other two hosts, but make sure to update the following sections to match the IP address of the host you are on. The name should be changed as well:
name: postgresql1

restapi:
  listen: 10.193.102.203:8008
  connect_address: 10.193.102.203:8008

postgresql:
  listen: 10.193.102.203:5432
  connect_address: 10.193.102.203:5432
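For example, on host 2 (10.193.102.204) those sections would end up looking like the following; host 3 (10.193.102.205) follows the same pattern with the name postgresql2:

name: postgresql1

restapi:
  listen: 10.193.102.204:8008
  connect_address: 10.193.102.204:8008

postgresql:
  listen: 10.193.102.204:5432
  connect_address: 10.193.102.204:5432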
Once Patroni is started on all 3 hosts, you should be able to use patronictl to check on the cluster state:
-bash-4.2$ patronictl -c postgres0.yml list
+--------------+-------------+----------------+--------+---------+----+-----------+
|   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 | Leader | running | 12 |         0 |
| patroni_test | postgresql1 | 10.193.102.204 |        | running | 12 |         0 |
| patroni_test | postgresql2 | 10.193.102.205 |        | running | 12 |         0 |
+--------------+-------------+----------------+--------+---------+----+-----------+
The cluster is now all set up. To make sure everything works, you can try a planned switchover with the "patronictl -c postgres0.yml switchover" command. You'll find that you can also schedule these switchovers ahead of time.