GPDR restore failed due to the following errors.
The first error can be found in the file ~/gpAdminLogs/gpdr_<date>.log on the coordinator/master host:
Error occurred while running command "restore" on the cluster: Could not restore backup
The second error can be found in the file /usr/local/gpdr/logs/gpdb-segXX-restore.log on the segment host(s):
ERROR: [038]: unable to restore while PostgreSQL is running
HINT: presence of 'postmaster.pid' in '/data/master/gpsegXX' indicates PostgreSQL is running.
HINT: remove 'postmaster.pid' only if PostgreSQL is not running.
gpstate reports that the DR cluster is not running:
[CRITICAL]:-gpstate failed. (Reason='could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
') exiting...
If two "gpdr restore" commands run simultaneously, the second one to start will probably fail and cancel the first one while it is only partially complete.
It is important to avoid running two "gpdr restore" commands concurrently.
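One way an operator could guard against starting a second restore by accident is a simple lock taken before the restore is launched. The sketch below is illustrative only: LOCK_DIR and both function names are hypothetical and are not part of GPDR, and it only protects restores that are started through such a wrapper on the same host.

```shell
# Sketch only: LOCK_DIR and both function names are hypothetical,
# not part of GPDR.
LOCK_DIR="${LOCK_DIR:-/tmp/gpdr_restore.lock}"

# Succeeds only for the first caller: mkdir fails if the directory already
# exists, so a second attempt is refused while the first holds the lock.
acquire_restore_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

# Releases the lock once the restore has finished.
release_restore_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}

# Example wrapper (the gpdr restore arguments are omitted on purpose):
# if acquire_restore_lock; then
#     gpdr restore ...
#     release_restore_lock
# else
#     echo "another gpdr restore appears to be running" >&2
# fi
```

If the wrapper is killed before it releases the lock, the lock directory stays behind and has to be removed by hand once it is certain that no restore is running.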
source /usr/local/greenplum-db/greenplum_path.sh && pgbackrest --log-level-console warn --stanza gpdb-seg2 --config /usr/local/gpdr/configs/pgbackrest-seg2.conf repo-ls backup/gpdb-seg2/<DATE>/pg_data/global
This example is for seg2; the listing should include pg_control.gz. Run this command on any one of the segments after changing the segment number in the command above.
Make sure all Postgres processes are stopped on the coordinator and all segment hosts.
Make sure the sockets in the /tmp directory on each host are deleted.
Manually remove all postmaster.pid files in the coordinator and segment data directories.
Retry the gpdr restore.
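The checks in the steps above can be sketched as a few shell helpers to run on each host before retrying. This is a minimal sketch, not a GPDR tool: the function names are hypothetical. It assumes the postmaster processes are named "postgres", that the sockets follow the standard PostgreSQL naming .s.PGSQL.<port>, and that the first line of postmaster.pid holds the postmaster PID.

```shell
# Sketch only: helper names are hypothetical and not part of GPDR or Greenplum.

# Succeeds when no process named exactly "postgres" is running on this host.
postgres_stopped() {
    ! pgrep -x postgres >/dev/null 2>&1
}

# Prints leftover PostgreSQL socket and socket-lock files (.s.PGSQL.<port>*)
# found directly in the given directory (default: /tmp).
list_pg_sockets() {
    find "${1:-/tmp}" -maxdepth 1 -name '.s.PGSQL.*' 2>/dev/null
}

# Succeeds when the PID on the first line of the given postmaster.pid file
# does not belong to a live process. Fails if the file is missing or the
# PID cannot be read, so that case is left for manual inspection.
# Run as the data directory owner: kill -0 also fails for processes
# owned by another user.
pid_file_is_stale() {
    local pid
    [ -f "$1" ] || return 1
    pid=$(head -n 1 "$1" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;
    esac
    ! kill -0 "$pid" 2>/dev/null
}
```

For example, `postgres_stopped || echo "postgres is still running"` followed by `list_pg_sockets /tmp` shows what is left to clean up on a host, and `pid_file_is_stale <data directory>/postmaster.pid` gives a second check before a postmaster.pid file is removed.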
R&D are developing measures to avoid concurrent gpdr restores.