This guide walks you through testing failover in a PostgreSQL High Availability (HA) configuration managed by repmgr. The procedure sets the primary to read-only, promotes a standby instance to primary, runs data loading tests on it, and then either reverts the promoted instance to a standby with pg_rewind or, if that fails, recreates the standby from scratch.
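Prerequisites:
- You have a primary PostgreSQL instance and a standby PostgreSQL instance already set up, with repmgr configured for HA.
- You have admin access to PostgreSQL (psql) on both the primary and standby instances.
- The PostgreSQL environment has wal_log_hints or data checksums enabled to support pg_rewind. pg_rewind checks this on the server being rewound (here, the promoted standby). Data checksums carry over to a standby cloned from the primary, but wal_log_hints is a per-server setting, so enable it on both instances.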
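The pg_rewind prerequisites can be checked on each instance with a few SHOW commands. This is a minimal sketch, assuming psql connects through the default libpq environment (PGHOST, PGUSER, and so on). The run helper and the APPLY variable are hypothetical conveniences: the script prints each command and executes it only when APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: check the pg_rewind prerequisites on one instance.
# Assumption: psql connects via the default libpq environment (PGHOST, PGUSER, ...).
set -eu

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# pg_rewind needs wal_log_hints = on OR data_checksums = on ...
run psql -X -At -c "SHOW wal_log_hints"
run psql -X -At -c "SHOW data_checksums"
# ... and full_page_writes = on, which is the default.
run psql -X -At -c "SHOW full_page_writes"
```

Run it against both instances, since the promoted standby is the one pg_rewind will check.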
1. Set the Primary Instance to Read-Only
During this process, the primary instance is set to read-only mode. This prevents new writes on the primary while the standby is promoted, so the primary's data does not diverge further and the standby can be reverted cleanly later.
1.1. On the primary instance, log into psql as an admin user:
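A minimal sketch of this step, assuming the read-only mode is implemented with the default_transaction_read_only setting (an assumption; the mechanism is not named here). That setting only changes the default: a session can still run SET default_transaction_read_only = off, so it guards against accidental writes rather than enforcing a hard block. The run helper and APPLY variable are hypothetical: the script prints each command and executes it only when APPLY=1.

```shell
#!/bin/sh
# Sketch: make the primary read-only.
# Assumption: read-only mode is implemented with default_transaction_read_only.
set -eu

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Persist the setting in postgresql.auto.conf. ALTER SYSTEM gets its own -c
# because it cannot run inside a (multi-statement) transaction block.
run psql -X -c "ALTER SYSTEM SET default_transaction_read_only = on"
# ALTER SYSTEM does not apply the change by itself; reload the configuration.
run psql -X -c "SELECT pg_reload_conf()"
# Expected to print "on" once the reload has been processed.
run psql -X -At -c "SHOW default_transaction_read_only"
```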
This command makes the primary instance read-only: new transactions on the primary can still read data, but any attempt to write is rejected.
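1.2. Make sure the configuration is reloaded (for example, with SELECT pg_reload_conf()) so that the change takes effect; a setting changed with ALTER SYSTEM is not applied until the configuration is reloaded.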
2. Promote the Standby Instance
To simulate a failover, the standby instance must be promoted to become the primary. This will allow you to perform your tests without shutting down the current primary.
2.1. On the standby instance host, execute the following command to promote the standby:
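A minimal sketch of the promotion with pg_ctl, followed by a check that the instance has left recovery. The data directory default is a hypothetical placeholder, and the run helper and APPLY variable are hypothetical: the script prints each command and executes it only when APPLY=1.

```shell
#!/bin/sh
# Sketch: promote the standby outside repmgr (run on the standby host).
set -eu

# Hypothetical placeholder; set PGDATA to the standby's real data directory.
PGDATA="${PGDATA:-/path/to/standby/data}"

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# By default pg_ctl waits for the promotion to complete.
run pg_ctl promote -D "$PGDATA"
# Expected "f": the instance is no longer in recovery, i.e. it now runs as a primary.
run psql -X -At -c "SELECT pg_is_in_recovery()"
```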
Caution: After this promotion, both the original primary and the promoted standby run as primary instances. This is expected, because the original primary is not shut down during testing.
Note: We cannot use the repmgr standby promote command here, because it refuses to promote a standby while the original primary is still running. Instead, pg_ctl promote promotes the standby instance directly.
3. Perform Data Loading Tests on the Promoted Standby
With the standby promoted to primary, you can now proceed with your data loading tests.
3.1. On the promoted standby instance, perform the necessary data loading and testing steps (e.g., inserting data, running queries, etc.).
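As one illustration of a data loading test, the sketch below creates a table, loads 100,000 rows and counts them. The table name failover_test and the TEST_DB variable are hypothetical, as are the run helper and APPLY variable (the script prints each command and executes it only when APPLY=1).

```shell
#!/bin/sh
# Sketch: a small data-loading test on the promoted standby.
# failover_test and TEST_DB are hypothetical names chosen for illustration.
set -eu

TEST_DB="${TEST_DB:-postgres}"

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

run psql -X -d "$TEST_DB" -c "CREATE TABLE failover_test (id int PRIMARY KEY, note text)"
run psql -X -d "$TEST_DB" -c "INSERT INTO failover_test SELECT g, 'row ' || g FROM generate_series(1, 100000) AS g"
run psql -X -d "$TEST_DB" -At -c "SELECT count(*) FROM failover_test"
```

With APPLY=1 and a fresh table, the final query returns 100000. These rows exist only on the promoted standby and are discarded if it is later reverted with pg_rewind.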
4. Revert the Promoted Standby
After completing the tests, you may want to revert the promoted standby to a standby role. This involves using pg_rewind to resynchronize its data with the primary, which discards the changes made on it during the tests.
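4a. Using pg_rewind (Recommended)
To revert the promoted standby, you'll use pg_rewind. However, this requires that either wal_log_hints or data checksums be enabled on the server being rewound, which here is the promoted standby. Without one of them, hint-bit changes are not WAL-logged, so pg_rewind cannot reliably identify every data block that changed after the promotion.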
4a.1. On the standby instance host, stop the promoted standby instance:
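A minimal sketch of the stop using pg_ctl. A fast shutdown rolls back open transactions and then shuts the server down cleanly, which pg_rewind requires of the target. The data directory default is a hypothetical placeholder; if the instance is managed by a service manager such as systemd, stop it through that instead. The run helper and APPLY variable are hypothetical: the script prints each command and executes it only when APPLY=1.

```shell
#!/bin/sh
# Sketch: cleanly stop the promoted standby before rewinding it.
set -eu

# Hypothetical placeholder; set PGDATA to the standby's real data directory.
PGDATA="${PGDATA:-/path/to/standby/data}"

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Fast mode: terminate sessions, roll back open transactions, then shut down cleanly.
run pg_ctl stop -D "$PGDATA" -m fast
```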
4a.2. Rejoin the standby to the primary and perform the rewind operation:
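A minimal sketch of this step with repmgr node rejoin and its --force-rewind option, which runs pg_rewind when needed and restarts the node as a standby. The config path, primary host name, and the repmgr user and database names are hypothetical placeholders. repmgr's own --dry-run checks the prerequisites first without changing anything. The run helper and APPLY variable are hypothetical: the script prints each command and executes it only when APPLY=1.

```shell
#!/bin/sh
# Sketch: rejoin the stopped node to the primary, rewinding it with pg_rewind.
# REPMGR_CONF, PRIMARY_HOST and the repmgr user/database are hypothetical placeholders.
set -eu

REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"
PRIMARY_HOST="${PRIMARY_HOST:-pg-primary}"
CONNINFO="host=$PRIMARY_HOST user=repmgr dbname=repmgr"

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Check the prerequisites without changing anything.
run repmgr -f "$REPMGR_CONF" node rejoin -d "$CONNINFO" --force-rewind --dry-run
# Rewind the data directory if needed and restart the node as a standby of the primary.
run repmgr -f "$REPMGR_CONF" node rejoin -d "$CONNINFO" --force-rewind --verbose
```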
This command will sync the standby with the primary instance.
5. Return the Primary Instance to Read-Write Mode
Once the promoted standby is reverted, you can return the primary instance to its normal writable state.
5.1. On the primary instance, log into psql and execute the following:
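A minimal sketch of this step, assuming step 1 used ALTER SYSTEM SET default_transaction_read_only = on (an assumption). ALTER SYSTEM is allowed even while transactions default to read-only. Under the same assumption, pg_rewind copies configuration files such as postgresql.auto.conf from the source server, so a standby rewound while the primary was read-only may have inherited the setting; running the same reset there is a harmless precaution. The run helper and APPLY variable are hypothetical: the script prints each command and executes it only when APPLY=1.

```shell
#!/bin/sh
# Sketch: undo the read-only setting on the primary.
# Assumption: step 1 used ALTER SYSTEM SET default_transaction_read_only = on.
set -eu

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Remove the entry from postgresql.auto.conf.
run psql -X -c "ALTER SYSTEM RESET default_transaction_read_only"
run psql -X -c "SELECT pg_reload_conf()"
# Expected "off" once the reload has been processed (unless postgresql.conf sets it).
run psql -X -At -c "SHOW default_transaction_read_only"
```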
This will make the primary instance writable again.
6. If pg_rewind Fails, Recreate the Standby
If pg_rewind fails because of missing WAL segments or other issues (such as neither data checksums nor wal_log_hints being enabled), you'll need to recreate the standby instance completely.
6.1. On the primary instance, log into psql and run the following to ensure the primary is write-enabled:
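Before cloning, it can help to confirm that the instance really is a writable primary. This sketch assumes read-only mode was set with default_transaction_read_only (an assumption); if the check still reports on, reset the setting and reload the configuration as in step 5.1. The run helper and APPLY variable are hypothetical: the script prints each command and executes it only when APPLY=1.

```shell
#!/bin/sh
# Sketch: confirm the primary is writable before cloning a new standby from it.
# Assumption: read-only mode was set with default_transaction_read_only.
set -eu

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Expected "f": the instance is not in recovery, i.e. it is a primary.
run psql -X -At -c "SELECT pg_is_in_recovery()"
# Expected "off"; "on" means the read-only default is still active.
run psql -X -At -c "SHOW default_transaction_read_only"
```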
6.2. On the standby instance host, stop the standby:
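After a failed rewind the node may already be stopped, and pg_ctl stop fails if the server is not running. This sketch therefore stops the server only if pg_ctl status (which exits 0 when the server is running and 3 when it is not) reports it as running. The data directory default is a hypothetical placeholder, and the run helper and APPLY variable are hypothetical: the script prints each command and executes it only when APPLY=1.

```shell
#!/bin/sh
# Sketch: stop the standby that is about to be replaced, if it is running.
set -eu

# Hypothetical placeholder; set PGDATA to the standby's real data directory.
PGDATA="${PGDATA:-/path/to/standby/data}"

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# pg_ctl status exits 0 only when the server is running.
if run pg_ctl status -D "$PGDATA"; then
  run pg_ctl stop -D "$PGDATA" -m fast
fi
```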
6.3. Clone the standby from the primary:
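A minimal sketch of the clone with repmgr standby clone, which copies the primary into the data_directory configured in repmgr.conf. --force lets repmgr overwrite the existing, stale data directory, and --dry-run checks connectivity and prerequisites first. The config path, primary host name, and the repmgr user and database names are hypothetical placeholders, as are the run helper and APPLY variable (the script prints each command and executes it only when APPLY=1).

```shell
#!/bin/sh
# Sketch: clone a fresh standby from the primary (run on the standby host).
# REPMGR_CONF, PRIMARY_HOST and the repmgr user/database are hypothetical placeholders.
set -eu

REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"
PRIMARY_HOST="${PRIMARY_HOST:-pg-primary}"

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Check connectivity and prerequisites without copying anything.
run repmgr -f "$REPMGR_CONF" -h "$PRIMARY_HOST" -U repmgr -d repmgr standby clone --dry-run
# Overwrite the existing data directory with a fresh copy of the primary.
run repmgr -f "$REPMGR_CONF" -h "$PRIMARY_HOST" -U repmgr -d repmgr standby clone --force
```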
6.4. Start the newly cloned standby:
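A minimal sketch of the start with pg_ctl, followed by a check that the new instance runs in recovery. If the server is managed by a service manager such as systemd, start it through that instead. The data directory default is a hypothetical placeholder, and the run helper and APPLY variable are hypothetical: the script prints each command and executes it only when APPLY=1.

```shell
#!/bin/sh
# Sketch: start the newly cloned standby and confirm it is in recovery.
set -eu

# Hypothetical placeholder; set PGDATA to the standby's real data directory.
PGDATA="${PGDATA:-/path/to/standby/data}"

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

run pg_ctl start -D "$PGDATA"
# Expected "t": the instance is in recovery, i.e. running as a standby.
run psql -X -At -c "SELECT pg_is_in_recovery()"
```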
6.5. Register the new standby with repmgr:
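A minimal sketch of the registration. Because the node was registered before, repmgr already holds a record for it, and --force replaces that existing record. The config path is a hypothetical placeholder, as are the run helper and APPLY variable (the script prints each command and executes it only when APPLY=1).

```shell
#!/bin/sh
# Sketch: re-register the rebuilt standby in the repmgr metadata.
# REPMGR_CONF is a hypothetical placeholder.
set -eu

REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# --force overwrites the node record left over from the original registration.
run repmgr -f "$REPMGR_CONF" standby register --force
```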
7. Verify the Cluster Status
Once all the above steps are completed, verify the status of the cluster to ensure everything is running normally.
7.1. On any node (primary or standby), run:
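A minimal sketch of the verification. repmgr cluster show lists each registered node with its role, status and upstream node; repmgr node check runs health checks on the local node. The config path is a hypothetical placeholder, as are the run helper and APPLY variable (the script prints each command and executes it only when APPLY=1).

```shell
#!/bin/sh
# Sketch: verify the cluster after the test.
# REPMGR_CONF is a hypothetical placeholder.
set -eu

REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

# Hypothetical helper: print each command; execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Cluster-wide view: roles, status and upstream of every registered node.
run repmgr -f "$REPMGR_CONF" cluster show
# Health checks for the node this script runs on.
run repmgr -f "$REPMGR_CONF" node check
```

After a successful run, expect the original primary listed with role primary and the standby with role standby, both running.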
This will show you the current state of the cluster and confirm that everything is functioning as expected.
VMware Postgres 15.6.0
By following these steps, you can simulate a failover in your PostgreSQL cluster, test your failover mechanisms, and revert to the original configuration if necessary. Whether you use pg_rewind to restore the original standby or recreate the standby from scratch, this procedure lets you confirm that your PostgreSQL high availability setup works as expected.
Always make sure to take proper backups before performing any failover or reconfiguration steps in a production environment.