As of pg_auto_failover 1.4.0, multiple secondary (replica) nodes are supported. However, when a cluster with multiple secondaries goes through repeated failovers, the most recently demoted node gets stuck in the draining state.
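For context, a formation like the one in this report can be built with the standard pg_autoctl commands. This is a minimal sketch following the quickstart pattern; the hostnames, data directories, and monitor URI are placeholders, not a verified setup:

# On the monitor host:
pg_autoctl create monitor --pgdata /var/lib/pgsql/monitor \
    --hostname autofailover-m --ssl-self-signed

# On each data node. The first node to register becomes the primary;
# every node registered afterwards joins as an additional secondary:
pg_autoctl create postgres \
    --pgdata /var/lib/pgsql/ha \
    --hostname autofailover-1 \
    --ssl-self-signed \
    --monitor 'postgres://autoctl_node@autofailover-m:5432/pg_auto_failover'

# Repeat on autofailover-2 and autofailover-3 (twice on autofailover-3,
# with --pgport 5433 and a separate --pgdata for the second node there).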
Let's assume that we have the following initial state:
Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
--------+-------+---------------------+------------+-----------+---------------------+--------------------
node_10 | 10 | autofailover-3:5433 | 0/3D000000 | yes | secondary | secondary
node_1 | 11 | autofailover-1:5432 | 0/3F000110 | yes | primary | primary
node_2 | 14 | autofailover-2:5432 | 0/3F000110 | yes | secondary | secondary
node_3 | 15 | autofailover-3:5432 | 0/3D000000 | yes | secondary | secondary
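State tables like this one are the output of pg_autoctl show state; assuming a default setup, the same view can be pulled from any node (the data directory below is a placeholder):

pg_autoctl show state --pgdata /var/lib/pgsql/ha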
If we bring down node_1 (Node #11), pg_auto_failover fails over correctly; in this case it failed over to node_10 (Node #10). Shutting down node_10 then results in node_3 (Node #13) becoming the primary.
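Spelled out as a reproduction, the shutdown sequence is roughly the following. The data directories are hypothetical, and stopping the systemd unit instead (systemctl stop pgautofailover, as in the logs below) behaves the same way:

# On autofailover-1: stop node_1, wait until node_10 is promoted
pg_autoctl stop --pgdata /var/lib/pgsql/ha

# On autofailover-3: stop node_10, wait until node_3 is promoted
pg_autoctl stop --pgdata /var/lib/pgsql/ha2

# On autofailover-3: stop node_3 -- this is where the state machine gets stuck
pg_autoctl stop --pgdata /var/lib/pgsql/ha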
However, once we also shut down node_3 (Node #13), the node gets stuck in the draining state:
Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
--------+-------+---------------------+------------+-----------+---------------------+--------------------
node_10 | 10 | autofailover-3:5433 | 0/36008BB8 | no | primary | demoted
node_1 | 11 | autofailover-1:5432 | 0/37000350 | no | primary | demoted
node_3 | 13 | autofailover-3:5432 | 0/37002BF0 | no | primary | draining
node_2 | 14 | autofailover-2:5432 | 0/37002DB8 | yes | wait_primary | wait_primary
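The stuck assignment can also be read directly from the monitor database. A sketch using psql; the pgautofailover.node table and these column names are an assumption from memory and may differ between versions:

psql 'postgres://autoctl_node@autofailover-m:5432/pg_auto_failover' \
    -c 'SELECT nodeid, nodename, reportedstate, goalstate
          FROM pgautofailover.node ORDER BY nodeid;'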
If you attempt to bring any of the other nodes back up, it fails. The logs show that the monitor is not referencing the correct node when it updates the states:
node_3 (Node #13) log
Apr 30 18:30:00 autofailover-3 sudo[28319]: postgres : TTY=pts/1 ; PWD=/var/lib/pgsql ; USER=root ; COMMAND=/bin/systemctl stop pgautofailover
Apr 30 18:30:00 autofailover-3 polkitd[641]: Registered Authentication Agent for unix-process:28320:859510 (system bus name :1.66 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
Apr 30 18:30:00 autofailover-3 systemd[1]: Stopping pg_auto_failover...
Apr 30 18:30:00 autofailover-3 pg_autoctl[25802]: 18:30:00 25802 INFO pg_autoctl received signal SIGTERM, terminating
Apr 30 18:30:00 autofailover-3 pg_autoctl[25802]: 18:30:00 25805 INFO Postgres controller service received signal SIGTERM, terminating
Apr 30 18:30:00 autofailover-3 pg_autoctl[25802]: 18:30:00 25805 INFO Stopping pg_autoctl postgres service
Apr 30 18:30:00 autofailover-3 pg_autoctl[25802]: 18:30:00 25805 INFO /bin/pg_ctl --pgdata /var/lib/pgsql/ha --wait stop --mode fast
Apr 30 18:30:01 autofailover-3 pg_autoctl[25802]: 18:30:01 25802 INFO Stop pg_autoctl
Apr 30 18:30:01 autofailover-3 systemd[1]: Stopped pg_auto_failover.
Apr 30 18:30:01 autofailover-3 polkitd[641]: Unregistered Authentication Agent for unix-process:28320:859510 (system bus name :1.66, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Apr 30 18:31:21 autofailover-3 sudo[28331]: postgres : TTY=pts/1 ; PWD=/var/lib/pgsql ; USER=root ; COMMAND=/bin/journalctl
Monitor log
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO Node 14 (autofailover-2:5432) is marked as healthy by the monitor
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO Node 13 (autofailover-3:5432) is marked as unhealthy by the monitor
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO Setting goal state of node 13 (autofailover-3:5432) to draining after it became unhealthy.
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO New state for node 13 "node_3" (autofailover-3:5432): primary ➜ draining
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO Setting goal state of node 14 (autofailover-2:5432) to report_lsn to find the failover candidate
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO New state for node 14 "node_2" (autofailover-2:5432): secondary ➜ report_lsn
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO Failover still in progress after 0 nodes reported their LSN and we are waiting for 1 nodes to report, activeNode is 14 (autofailover-2:5432) and reported state "secondary"
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO Failover still in progress after 0 nodes reported their LSN and we are waiting for 1 nodes to report, activeNode is 14 (autofailover-2:5432) and reported state "secondary"
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO Node 14 (autofailover-2:5432) reported new state "report_lsn" with LSN 0/37002BF0
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO New state for node 14 "node_2" (autofailover-2:5432): report_lsn ➜ report_lsn
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO The current most advanced reported LSN is 0/37002BF0, as reported by node 14 (autofailover-2:5432) and 0 other nodes
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO Setting goal state of node 14 (autofailover-2:5432) to prepare_promotion after node 11 (autofailover-1:5432) became unhealthy and 1 nodes reported their LSN position.
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO New state for node 14 "node_2" (autofailover-2:5432): report_lsn ➜ prepare_promotion
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO Node 14 (autofailover-2:5432) reported new state "prepare_promotion"
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO New state for node 14 "node_2" (autofailover-2:5432): prepare_promotion ➜ prepare_promotion
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO Setting goal state of node 11 (autofailover-1:5432) to demote_timeout and node 14 (autofailover-2:5432) to stop_replication after node 14 (autofailover-2:5432) converged to prepare_promotion.
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO New state for node 14 "node_2" (autofailover-2:5432): prepare_promotion ➜ stop_replication
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO New state for node 11 "node_1" (autofailover-1:5432): primary ➜ demote_timeout
Apr 30 18:30:23 autofailover-m pg_autoctl[16466]: 18:30:23 16470 INFO Node 14 (autofailover-2:5432) reported new state "stop_replication"
Apr 30 18:30:23 autofailover-m pg_autoctl[16466]: 18:30:23 16470 INFO New state for node 14 "node_2" (autofailover-2:5432): stop_replication ➜ stop_replication
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO Setting goal state of node 14 (autofailover-2:5432) to wait_primary and node 11 (autofailover-1:5432) to demoted after the demote timeout expired.
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO New state for node 14 "node_2" (autofailover-2:5432): stop_replication ➜ wait_primary
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO New state for node 11 "node_1" (autofailover-1:5432): primary ➜ demoted
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO Node 14 (autofailover-2:5432) reported new state "wait_primary"
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO New state for node 14 "node_2" (autofailover-2:5432): wait_primary ➜ wait_primary
The monitor is trying to demote Node #11 instead of Node #13: it sets node 11 to demote_timeout and then to demoted, even though node 11 was shut down two failovers ago and should not be involved in this failover at all.
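The misdirected transitions can be confirmed from the monitor's event history as well, for example (the monitor data directory is again a placeholder):

pg_autoctl show events --pgdata /var/lib/pgsql/monitor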
Note: This issue occurs in pg_auto_failover 1.5.1 and earlier.