How to set up a 3-node Patroni cluster using etcd for Postgres


Article ID: 296385


Products

VMware Tanzu PostgreSQL

Issue/Introduction

Beginning with Postgres 11.5, the Patroni High Availability (HA) solution ships as part of the Postgres binary distribution.

The documentation covers the prerequisites necessary to use Patroni; however, it refers to the official Patroni documentation for setup and configuration.

This article covers how to set up a 3-node Patroni cluster using etcd.

This configuration will provide an HA Postgres cluster in cases where a host becomes unavailable. Before we go through the setup, we should review a couple of concepts:


Streaming replication in Postgres

For those who have used Tanzu Greenplum 6.x, it is essentially the same as how Write Ahead Log (WAL) replication is done between the primary and mirror segments.

For Postgres and other databases, this was traditionally called master/slave replication. However, due to the negative connotation of those terms, it has been rebranded as primary/standby.

On the primary Postgres instance, any change (a DML operation) is first written to an in-memory WAL buffer. When a transaction commits or aborts, the buffered WAL records are immediately written to a WAL segment file in the primary's pg_wal directory (pg_xlog in Postgres 9.x or lower).

After the WAL data is written, the primary node's walsender process sends it to the standby's walreceiver process. The walreceiver process writes the WAL data into a WAL segment file in the standby's pg_wal directory and informs the startup process. The startup process then replays the WAL data, essentially like a recovery process, keeping the data in sync.


Primary

postgres 12391 12347  0 17:07 ?        00:00:00 postgres: patroni_test: walsender replicator 10.193.102.204(29906) streaming 0/4132360
postgres 14013 12347  0 19:05 ?        00:00:00 postgres: patroni_test: walsender replicator 10.193.102.205(33855) streaming 0/4132360


Standby

postgres 11916 11914  0 17:07 ?        00:00:00 postgres: patroni_test: startup   recovering 0000000A0000000000000004
postgres 11928 11914  0 17:07 ?        00:00:09 postgres: patroni_test: walreceiver   streaming 0/4132360
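
You can also observe this from SQL. On the primary, the pg_stat_replication view shows one row per connected walsender, including how far each standby has received and replayed WAL. A minimal check, once the cluster built below is running:

-bash-4.2$ psql -U postgres -c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"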


What is Patroni?

Patroni is a tool that can be used to create, manage, maintain, and monitor highly available Postgres clusters using streaming replication. Patroni handles the initial Postgres database initialization as well as planned switchovers and unplanned failovers.

In our example of a 3-node cluster, you will see how the 3 Postgres instances interact with each other. In general, each Patroni instance manages its own Postgres instance and collects health data for it. Based on this data, each Patroni instance decides whether any action is needed to keep the cluster as a whole healthy. The challenge is making sure that only a single Patroni instance can acquire and keep the "leader lock" (primary). This is where etcd comes in.


Why is etcd needed?

etcd is a Distributed Consensus Store (DCS) and is needed to provide a distributed decision-making mechanism to allow Patroni to determine which instance should be the leader and which should be replicas.

In our example, we will create a 3-node etcd cluster. ZooKeeper can also be used but is not covered in this article. The number of etcd nodes does not depend on the number of Patroni nodes in the cluster. However, you should have more than one to avoid a single point of failure, and at least 3 nodes are encouraged so that a majority can still be formed during leader voting should a failure occur.
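
Once the full cluster described below is running, you can see the leader lock that Patroni keeps in etcd. A quick sketch using the v2 etcdctl syntax, assuming Patroni's default namespace (/service) and the scope patroni_test used later in this article; it prints the name of the current leader member:

-bash-4.2$ etcdctl get /service/patroni_test/leader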

Environment

Product Version: 11.6

Resolution

The following example is for CentOS 7.


On all 3 hosts

1. As the root user, install Postgres (this will create the postgres user).

# yum install pivotal-postgres-11.6-1.el7.x86_64.rpm

Do not run initdb or start up Postgres after this step; this will be handled later by Patroni. There is no need to install Patroni separately as it is already included with Postgres.

2. Install the etcd and Patroni prerequisites:

# yum install -y python3 python3-devel gcc
# yum install -y etcd
# yum install -y python-psycopg2 
# cat << EOF > requirements.txt
urllib3>=1.19.1,!=1.21
boto
PyYAML
requests
six >= 1.7
kazoo>=1.3.1
python-etcd>=0.4.3,<0.5
python-consul>=0.7.0
click>=4.1
prettytable>=0.7
tzlocal
python-dateutil
psutil>=2.0.0
cdiff
kubernetes>=2.0.0,<=7.0.0,!=4.0.*,!=5.0.*
EOF

# pip3 install -r requirements.txt


Note: pip3 install -r requirements.txt may not work with later versions of Patroni. Instead, you will need to install the dependencies like this (the arguments are quoted so the shell does not treat >= as a redirection):

sudo pip3 install "psycopg2>=2.5.4"
sudo pip3 install "patroni[dependencies]"
sudo pip3 install "patroni[etcd]"

3. Change the permission settings on the /tmp directory:

# chmod 777 /tmp
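
Optionally, verify the installs before continuing (exact version strings will vary with your packages):

# etcd --version
# patroni --version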


With the prerequisites now in place on all hosts, we can set up our 3-node etcd cluster.


On Host 1 (10.193.102.203)

Note: The rest of the setup is done as the postgres user.

1. Create the etcd.yml file. We will use this to start the etcd cluster.

$ sudo su - postgres
$ vi etcd.yml
name: '<hostname1>'
listen-peer-urls: 'http://10.193.102.203:2380'
listen-client-urls: 'http://10.193.102.203:2379,http://127.0.0.1:2379'
initial-advertise-peer-urls: 'http://10.193.102.203:2380'
advertise-client-urls: 'http://10.193.102.203:2379'

initial-cluster: '<hostname1>=http://10.193.102.203:2380,<hostname2>=http://10.193.102.204:2380,<hostname3>=http://10.193.102.205:2380'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster-1'


2. Save the file and start etcd:

etcd --config-file /var/lib/pgsql/etcd.yml > etcd_logfile 2>&1 &

The background process should now be running. If it terminates, check etcd_logfile for errors; there is likely an issue with the configuration, or ports 2379 and/or 2380 are already in use.
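
For example, to check whether something is already listening on the etcd ports:

$ ss -tln | grep -E '2379|2380'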

Do this on all hosts. The YAML files should look like the ones below:

  • Host 2 (10.193.102.204):

name: '<hostname2>'
listen-peer-urls: 'http://10.193.102.204:2380'
listen-client-urls: 'http://10.193.102.204:2379,http://127.0.0.1:2379'
initial-advertise-peer-urls: 'http://10.193.102.204:2380'
advertise-client-urls: 'http://10.193.102.204:2379'

initial-cluster: '<hostname1>=http://10.193.102.203:2380,<hostname2>=http://10.193.102.204:2380,<hostname3>=http://10.193.102.205:2380'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster-1'
  • Host 3 (10.193.102.205):

name: '<hostname3>'
listen-peer-urls: 'http://10.193.102.205:2380'
listen-client-urls: 'http://10.193.102.205:2379,http://127.0.0.1:2379'
initial-advertise-peer-urls: 'http://10.193.102.205:2380'
advertise-client-urls: 'http://10.193.102.205:2379'

initial-cluster: '<hostname1>=http://10.193.102.203:2380,<hostname2>=http://10.193.102.204:2380,<hostname3>=http://10.193.102.205:2380'
initial-cluster-state: 'new'
initial-cluster-token: 'etcd-cluster-1'


You can verify that the setup is correct by running the following command:

-bash-4.2$ etcdctl cluster-health
member 3172be2b0510829 is healthy: got healthy result from http://10.193.102.203:2379
member bf2f62dbbe5028b6 is healthy: got healthy result from http://10.193.102.204:2379
member e03eb2acfb7726f9 is healthy: got healthy result from http://10.193.102.205:2379
cluster is healthy

You can check the current leader with the following command:

-bash-4.2$ etcdctl member list
3172be2b0510829: name=postgres1 peerURLs=http://10.193.102.203:2380 clientURLs=http://10.193.102.203:2379 isLeader=false
bf2f62dbbe5028b6: name=postgres2 peerURLs=http://10.193.102.204:2380 clientURLs=http://10.193.102.204:2379 isLeader=true
e03eb2acfb7726f9: name=postgres3 peerURLs=http://10.193.102.205:2380 clientURLs=http://10.193.102.205:2379 isLeader=false


If you need to add or remove an etcd member later, you can use the "etcdctl member" command:

-bash-4.2$ etcdctl member
NAME:
   etcdctl member - member add, remove and list subcommands

USAGE:
   etcdctl member command [command options] [arguments...]

COMMANDS:
     list    enumerate existing cluster members
     add     add a new member to the etcd cluster
     remove  remove an existing member from the etcd cluster
     update  update an existing member in the etcd cluster


Note: The etcdctl commands above use the tool's default (v2) API. More functionality is available if you set the environment variable:

export ETCDCTL_API=3
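
For example, the v3 API offers a per-endpoint health check (the v3 command set differs from the v2 examples above):

-bash-4.2$ ETCDCTL_API=3 etcdctl --endpoints=http://10.193.102.203:2379,http://10.193.102.204:2379,http://10.193.102.205:2379 endpoint health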

The method we used to set up etcd is called static bootstrapping. It requires knowing the IP addresses of all cluster members in advance, and those addresses should be static. The other method, "discovery bootstrapping", is only relevant when the etcd nodes run on a container platform such as Kubernetes, which has its own special network setup; it is not covered in this article.


Patroni setup

Each host will also have its own YAML file. In this example, I will provide the minimal settings for getting up and running. If you want to be adventurous and try out different settings, you can check out all the options in the Patroni documentation: YAML configuration

Host 1 (10.193.102.203)

$ vi postgres0.yml
scope: patroni_test
name: postgresql0

restapi:
  listen: 10.193.102.203:8008
  connect_address: 10.193.102.203:8008

etcd:
  # Provide hosts to do the initial discovery of the cluster topology:
  hosts: 10.193.102.203:2379,10.193.102.204:2379,10.193.102.205:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        hot_standby: "on"
        wal_keep_segments: 20
        max_wal_senders: 8
        max_replication_slots: 8

  # some desired options for 'initdb'
  initdb:  # Note: It needs to be a list (some options need values, others are switches)
  - encoding: UTF8
  - data-checksums

  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
  - host replication replicator 10.193.102.203/32 md5
  - host replication replicator 10.193.102.204/32 md5
  - host replication replicator 10.193.102.205/32 md5
  - host all all 0.0.0.0/0 md5

  # Some additional users that need to be created after initializing the new cluster
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.193.102.203:5432
  connect_address: 10.193.102.203:5432
  data_dir: /var/lib/pgsql/data
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: postgres
    rewind:  # Has no effect on postgres 10 and lower
      username: rewind_user
      password: rewind_password
  parameters:
    unix_socket_directories: '.'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false


Start Patroni:

patroni postgres0.yml > patroni_member_1.log 2>&1 &


This starts a Patroni process and initializes the Postgres database on the host. Again, if the process terminates, check patroni_member_1.log for errors. A successful startup should look similar to the following:

2020-03-17 23:32:58.648 UTC [10491] LOG:  database system is ready to accept connections
10.193.102.222:5432 - accepting connections
10.193.102.222:5432 - accepting connections
2020-03-17 23:32:59,608 INFO: establishing a new patroni connection to the postgres cluster
2020-03-17 23:32:59,631 INFO: running post_bootstrap
2020-03-17 23:32:59,651 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2020-03-17 23:32:59,665 INFO: initialized a new cluster
2020-03-17 23:33:09,645 INFO: Lock owner: postgresql0; I am postgresql0
2020-03-17 23:33:09,668 INFO: Lock owner: postgresql0; I am postgresql0
2020-03-17 23:33:09,684 INFO: no action.  i am the leader with the lock
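
You can also query the Patroni REST API (configured on port 8008 above) for the member's status; it returns a JSON document including the node's role and state (exact fields vary by Patroni version):

-bash-4.2$ curl -s http://10.193.102.203:8008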


You should now be able to log in to your new database:

-bash-4.2$ psql -h segment1 -U postgres
Password for user postgres: 
psql (11.6 (Pivotal Postgres 11.6.1))
Type "help" for help.

postgres=#
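
To confirm from SQL which node is currently the primary, pg_is_in_recovery() returns f on the primary and t on a standby:

-bash-4.2$ psql -h 10.193.102.203 -U postgres -c "SELECT pg_is_in_recovery();"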


Note: Do the same for the other two hosts, but make sure to update the following sections to match the IP address of the host you are on, and change the name as well. For example, on Host 2 (10.193.102.204):

name: postgresql1

restapi:
  listen: 10.193.102.204:8008
  connect_address: 10.193.102.204:8008

postgresql:
  listen: 10.193.102.204:5432
  connect_address: 10.193.102.204:5432


Once Patroni is started on all 3 hosts, you should be able to use patronictl to check the cluster state:

-bash-4.2$ patronictl -c postgres0.yml list
+--------------+-------------+----------------+--------+---------+----+-----------+
|   Cluster    |    Member   |      Host      |  Role  |  State  | TL | Lag in MB |
+--------------+-------------+----------------+--------+---------+----+-----------+
| patroni_test | postgresql0 | 10.193.102.203 | Leader | running | 12 |         0 |
| patroni_test | postgresql1 | 10.193.102.204 |        | running | 12 |         0 |
| patroni_test | postgresql2 | 10.193.102.205 |        | running | 12 |         0 |
+--------------+-------------+----------------+--------+---------+----+-----------+


The cluster is now all set up. You can try the following checks to make sure everything works:

  • Create a table and insert data on the primary, then check a standby to see whether the data is replicated there (see the sketch after this list).
  • Perform a manual switchover using the "patronictl -c postgres0.yml switchover" command (also sketched below). You'll find that you can also schedule these switchovers ahead of time.
  • Simulate an unplanned failover by rebooting any one of the hosts.
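
Below is a sketch of the first two checks, assuming the member names and addresses from the patronictl output above. Note that switchover flags vary between Patroni versions (newer releases use --leader instead of --master):

-bash-4.2$ psql -h 10.193.102.203 -U postgres -c "CREATE TABLE replication_test (id int)"
-bash-4.2$ psql -h 10.193.102.203 -U postgres -c "INSERT INTO replication_test VALUES (1)"
-bash-4.2$ psql -h 10.193.102.204 -U postgres -c "SELECT * FROM replication_test"
-bash-4.2$ patronictl -c postgres0.yml switchover --master postgresql0 --candidate postgresql1 --force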