pg_auto_failover stuck in draining state in multi-node setup in VMware Postgres

Article ID: 296398


Products

VMware Tanzu Greenplum

Issue/Introduction

As of version 1.4.0, pg_auto_failover supports multiple secondary (replica) nodes. However, when a cluster with multiple secondary nodes goes through successive failovers, the node being demoted can get stuck in the draining state.

Let's assume that we have the following initial state:
   Name |  Node |           Host:Port |        LSN | Reachable |       Current State |      Assigned State
--------+-------+---------------------+------------+-----------+---------------------+--------------------
node_10 |    10 | autofailover-3:5433 | 0/3D000000 |       yes |           secondary |           secondary
 node_1 |    11 | autofailover-1:5432 | 0/3F000110 |       yes |             primary |             primary
 node_2 |    14 | autofailover-2:5432 | 0/3F000110 |       yes |           secondary |           secondary
 node_3 |    15 | autofailover-3:5432 | 0/3D000000 |       yes |           secondary |           secondary

If we bring down node_1 (Node #11), pg_auto_failover correctly fails over; in this case, it promotes node_10 (Node #10). Shutting down node_10 then results in node_3 (Node #13) becoming the primary.

However, once we shut down node_3 (Node #13), node_3 gets stuck in the draining state:
   Name |  Node |           Host:Port |        LSN | Reachable |       Current State |      Assigned State
--------+-------+---------------------+------------+-----------+---------------------+--------------------
node_10 |    10 | autofailover-3:5433 | 0/36008BB8 |        no |             primary |             demoted
 node_1 |    11 | autofailover-1:5432 | 0/37000350 |        no |             primary |             demoted
 node_3 |    13 | autofailover-3:5432 | 0/37002BF0 |        no |             primary |            draining
 node_2 |    14 | autofailover-2:5432 | 0/37002DB8 |       yes |        wait_primary |        wait_primary
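A stuck node can be spotted mechanically: any row whose Current State and Assigned State disagree has not yet converged to its goal state. The sketch below (an illustration, not part of the product) filters such rows from saved `pg_autoctl show state` output; against a live cluster you would pipe the command's output in instead of the sample table.

```shell
# Minimal sketch: list nodes whose Current State differs from their
# Assigned State, i.e. nodes that have not converged. The sample input
# is the stuck-state table from this article.
state='   Name |  Node |           Host:Port |        LSN | Reachable |       Current State |      Assigned State
--------+-------+---------------------+------------+-----------+---------------------+--------------------
node_10 |    10 | autofailover-3:5433 | 0/36008BB8 |        no |             primary |             demoted
 node_1 |    11 | autofailover-1:5432 | 0/37000350 |        no |             primary |             demoted
 node_3 |    13 | autofailover-3:5432 | 0/37002BF0 |        no |             primary |            draining
 node_2 |    14 | autofailover-2:5432 | 0/37002DB8 |       yes |        wait_primary |        wait_primary'

unconverged=$(echo "$state" | awk -F'|' '
  $2 ~ /^[ ]*[0-9]+[ ]*$/ {          # data rows only (numeric Node column)
    gsub(/ /, "", $1); gsub(/ /, "", $6); gsub(/ /, "", $7)
    if ($6 != $7) printf "%s: %s -> %s\n", $1, $6, $7
  }')
echo "$unconverged"
```

On this sample the output lists node_10 and node_1 as `primary -> demoted` and node_3 as `primary -> draining`; it is the node_3 row that never clears, which is the symptom described here.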

If you attempt to bring any of the other nodes back up, they fail to rejoin. The logs show that the monitor is targeting the wrong node when it updates the states:


Node_3 (Node #13) Log

Apr 30 18:30:00 autofailover-3 sudo[28319]: postgres : TTY=pts/1 ; PWD=/var/lib/pgsql ; USER=root ; COMMAND=/bin/systemctl stop pgautofailover
Apr 30 18:30:00 autofailover-3 polkitd[641]: Registered Authentication Agent for unix-process:28320:859510 (system bus name :1.66 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
Apr 30 18:30:00 autofailover-3 systemd[1]: Stopping pg_auto_failover...
Apr 30 18:30:00 autofailover-3 pg_autoctl[25802]: 18:30:00 25802 INFO  pg_autoctl received signal SIGTERM, terminating
Apr 30 18:30:00 autofailover-3 pg_autoctl[25802]: 18:30:00 25805 INFO  Postgres controller service received signal SIGTERM, terminating
Apr 30 18:30:00 autofailover-3 pg_autoctl[25802]: 18:30:00 25805 INFO  Stopping pg_autoctl postgres service
Apr 30 18:30:00 autofailover-3 pg_autoctl[25802]: 18:30:00 25805 INFO  /bin/pg_ctl --pgdata /var/lib/pgsql/ha --wait stop --mode fast
Apr 30 18:30:01 autofailover-3 pg_autoctl[25802]: 18:30:01 25802 INFO  Stop pg_autoctl
Apr 30 18:30:01 autofailover-3 systemd[1]: Stopped pg_auto_failover.
Apr 30 18:30:01 autofailover-3 polkitd[641]: Unregistered Authentication Agent for unix-process:28320:859510 (system bus name :1.66, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Apr 30 18:31:21 autofailover-3 sudo[28331]: postgres : TTY=pts/1 ; PWD=/var/lib/pgsql ; USER=root ; COMMAND=/bin/journalctl


Monitor Log

Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO  Node 14 (autofailover-2:5432) is marked as healthy by the monitor
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO  Node 13 (autofailover-3:5432) is marked as unhealthy by the monitor
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO  Setting goal state of node 13 (autofailover-3:5432) to draining after it became unhealthy.
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO  New state for node 13 "node_3" (autofailover-3:5432): primary ➜ draining
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO  Setting goal state of node 14 (autofailover-2:5432) to report_lsn to find the failover candidate
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO  New state for node 14 "node_2" (autofailover-2:5432): secondary ➜ report_lsn
Apr 30 18:30:20 autofailover-m pg_autoctl[16466]: 18:30:20 16470 INFO  Failover still in progress after 0 nodes reported their LSN and we are waiting for 1 nodes to report, activeNode is 14 (autofailover-2:5432) and reported state "secondary"
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO  Failover still in progress after 0 nodes reported their LSN and we are waiting for 1 nodes to report, activeNode is 14 (autofailover-2:5432) and reported state "secondary"
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO  Node 14 (autofailover-2:5432) reported new state "report_lsn" with LSN 0/37002BF0
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO  New state for node 14 "node_2" (autofailover-2:5432): report_lsn ➜ report_lsn
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO  The current most advanced reported LSN is 0/37002BF0, as reported by node 14 (autofailover-2:5432) and 0 other nodes
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO  Setting goal state of node 14 (autofailover-2:5432) to prepare_promotion after node 11 (autofailover-1:5432) became unhealthy and 1 nodes reported their LSN position.
Apr 30 18:30:21 autofailover-m pg_autoctl[16466]: 18:30:21 16470 INFO  New state for node 14 "node_2" (autofailover-2:5432): report_lsn ➜ prepare_promotion
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO  Node 14 (autofailover-2:5432) reported new state "prepare_promotion"
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO  New state for node 14 "node_2" (autofailover-2:5432): prepare_promotion ➜ prepare_promotion
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO  Setting goal state of node 11 (autofailover-1:5432) to demote_timeout and node 14 (autofailover-2:5432) to stop_replication after node 14 (autofailover-2:5432) converged to prepare_promotion.
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO  New state for node 14 "node_2" (autofailover-2:5432): prepare_promotion ➜ stop_replication
Apr 30 18:30:22 autofailover-m pg_autoctl[16466]: 18:30:22 16470 INFO  New state for node 11 "node_1" (autofailover-1:5432): primary ➜ demote_timeout
Apr 30 18:30:23 autofailover-m pg_autoctl[16466]: 18:30:23 16470 INFO  Node 14 (autofailover-2:5432) reported new state "stop_replication"
Apr 30 18:30:23 autofailover-m pg_autoctl[16466]: 18:30:23 16470 INFO  New state for node 14 "node_2" (autofailover-2:5432): stop_replication ➜ stop_replication
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO  Setting goal state of node 14 (autofailover-2:5432) to wait_primary and node 11 (autofailover-1:5432) to demoted after the demote timeout expired.
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO  New state for node 14 "node_2" (autofailover-2:5432): stop_replication ➜ wait_primary
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO  New state for node 11 "node_1" (autofailover-1:5432): primary ➜ demoted
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO  Node 14 (autofailover-2:5432) reported new state "wait_primary"
Apr 30 18:31:02 autofailover-m pg_autoctl[16466]: 18:31:02 16470 INFO  New state for node 14 "node_2" (autofailover-2:5432): wait_primary ➜ wait_primary

The monitor is trying to demote Node #11 instead of Node #13. Node #11 should not be involved in the failover.

Note: This issue occurs in pg_auto_failover 1.5.1 and below.
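Since the fix first shipped in 1.5.2, a quick way to tell whether an installation is affected is to compare the installed version against 1.5.2 with a version-aware sort. This is a sketch only: the version string must be obtained separately (for example from `pg_autoctl version`, whose exact output format is not shown here), and it assumes GNU `sort -V`.

```shell
# Sketch: prints "affected" for pg_auto_failover 1.5.1 and below,
# "unaffected" for 1.5.2 and later. The version string itself must be
# obtained separately, e.g. from `pg_autoctl version`.
is_affected() {
  # GNU sort -V orders version strings numerically; if the smallest of
  # the pair is not 1.5.2, then $1 sorts strictly before 1.5.2.
  if [ "$(printf '%s\n' "$1" 1.5.2 | sort -V | head -n1)" != "1.5.2" ]; then
    echo affected
  else
    echo unaffected
  fi
}

is_affected 1.5.1    # affected
is_affected 1.6.0    # unaffected
```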

Environment

Product Version: 13.2

Resolution

This is a bug in the monitor that is fixed in pg_auto_failover 1.5.2, so please upgrade if you are experiencing this problem.

If you cannot upgrade, you can work around the issue by defining only one or two secondary nodes.

The complete bug fix can be found here: