Let's assume this scenario: the member [postgresql2] failed to start, and pg_ctl start does not work either:
-bash-4.2$ patronictl -c patroni_etcd_conf.d/postgres_member1.yaml list
+-----------------+-------------+-------------------+--------+--------------+----+-----------+
| Cluster         | Member      | Host              | Role   | State        | TL | Lag in MB |
+-----------------+-------------+-------------------+--------+--------------+----+-----------+
| patroni_cluster | postgresql1 | 172.28.8.251:7432 | Leader | running      | 31 |           |
| patroni_cluster | postgresql2 | 172.28.8.251:7433 |        | start failed |    | unknown   |
| patroni_cluster | postgresql3 | 172.28.8.251:7434 |        | running      | 31 | 0         |
+-----------------+-------------+-------------------+--------+--------------+----+-----------+
-bash-4.2$ pg_ctl start -D ./data2/
waiting for server to start....2020-10-06 21:58:33.353 CST [20916] LOG: listening on IPv4 address "172.28.8.251", port 7433
2020-10-06 21:58:33.355 CST [20916] LOG: listening on Unix socket "./.s.PGSQL.7433"
2020-10-06 21:58:33.365 CST [20916] LOG: redirecting log output to logging collector process
2020-10-06 21:58:33.365 CST [20916] HINT: Future log output will appear in directory "my_pg_log".
stopped waiting
pg_ctl: could not start server
Examine the log output.
From the Patroni log we can see that Patroni ran pg_rewind, but pg_rewind reported "no rewind required" and changed nothing, so the timeline divergence remained: [postgresql2] is still on timeline 28, while the other members are already on timeline 31.
2020-10-06 21:41:24,850 INFO: running pg_rewind from postgresql1
2020-10-06 21:41:24,869 INFO: running pg_rewind from dbname=postgres user=rewind_user host=172.28.8.251 port=7432
servers diverged at WAL location 0/701A3E8 on timeline 28
no rewind required
2020-10-06 21:41:24,887 WARNING: Postgresql is not running.
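pg_rewind derives the divergence point by comparing the timeline histories of the two servers, and the same information can be read by hand. Each timeline has a small history file in pg_wal (pg_xlog before PostgreSQL 10), named after the timeline ID in 8 hex digits, so timeline 31 uses 0000001F.history; each non-comment line holds three tab-separated fields: parent timeline, switch-point LSN, and a reason. Below is a minimal bash sketch assuming that format; the function name tl_switchpoint and the ./data1 data directory for [postgresql1] are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the LSN at which the history recorded in a
# timeline history file left the given parent timeline.
# Assumed line format: <parent TLI>\t<switch-point LSN>\t<reason>
tl_switchpoint() {
  local history_file=$1 parent_tli=$2
  awk -F'\t' -v tli="$parent_tli" '
    /^#/ || NF < 2 { next }                     # skip comments and blank lines
    $1 == tli      { print $2; found = 1; exit } # first matching parent TLI
    END            { exit !found }               # status 1 if TLI not listed
  ' "$history_file"
}

# Example on the leader (hypothetical path):
#   tl_switchpoint ./data1/pg_wal/0000001F.history 28
```

If timeline 31 descends from timeline 28 as the logs suggest, the printed LSN should match the 0/701A3E8 reported by pg_rewind.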
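From pg_log we can see why [postgresql2] cannot enter timeline 31: its minimum recovery point, 0/701A420 on timeline 28, lies past the point where the servers diverged (0/701A3E8, as reported by pg_rewind above), so timeline 31 does not contain it: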
2020-10-06 21:42:13.716 CST [17711] LOG: database system was shut down in recovery at 2020-10-06 21:41:24 CST
2020-10-06 21:42:13.717 CST [17711] LOG: entering standby mode
2020-10-06 21:42:13.717 CST [17711] FATAL: requested timeline 31 does not contain minimum recovery point 0/701A420 on timeline 28
2020-10-06 21:42:13.718 CST [17709] LOG: startup process (PID 17711) exited with exit code 1
2020-10-06 21:42:13.718 CST [17709] LOG: aborting startup due to startup process failure
2020-10-06 21:42:13.719 CST [17709] LOG: database system is shut down
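The failure comes down to comparing two LSNs. An LSN such as 0/701A420 is written as two 32-bit hex halves, high/low, so it can be turned into a single byte position with (high << 32) + low. A minimal bash sketch; the helper name lsn_to_int is hypothetical, and the two LSNs are copied from the logs above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a textual LSN "X/Y" (two 32-bit hex halves)
# into one 64-bit byte position so that LSNs can be compared numerically.
lsn_to_int() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( (16#$hi << 32) + 16#$lo ))
}

divergence=0/701A3E8    # pg_rewind: "servers diverged at WAL location ..."
min_recovery=0/701A420  # startup: "minimum recovery point ... on timeline 28"

d=$(lsn_to_int "$divergence")
m=$(lsn_to_int "$min_recovery")

if (( m > d )); then
  echo "minimum recovery point is $(( m - d )) bytes past the divergence point"
else
  echo "minimum recovery point is at or before the divergence point"
fi
```

Run as is, it reports that the minimum recovery point is 56 bytes past the divergence point: the data files of [postgresql2] already contain timeline 28 changes beyond the fork, which timeline 31 never contains. On the failed member itself, pg_controldata ./data2/ shows the same value on its "Minimum recovery ending location" line.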