gpdr restore fails with "could not resume WAL replay: ERROR: could not access status of transaction 3"

Article ID: 383618

Updated On:

Products

VMware Tanzu Greenplum, VMware Tanzu Data Services

Issue/Introduction

The command gpdr restore --type continuous --restore-point 20241204-091001R fails. Running it with --debug shows the following:

[gpadmin@lx00763 Broadcom]$  gpdr restore --type continuous --restore-point 20241204-091001R --debug
20241204:17:27:19 gpdr:gpadmin:lx00763:3967298-[DEBUG]:-Running command: gpdr restore --type continuous --restore-point 20241204-091001R --debug
20241204:17:27:19 gpdr:gpadmin:lx00763:3967298-[INFO]:-Restoring database cluster
20241204:17:27:19 gpdr:gpadmin:lx00763:3967298-[DEBUG]:-Checking for pgbackrest conf files
20241204:17:27:20 gpdr:gpadmin:lx00763:3967298-[DEBUG]:-source /usr/local/greenplum-db-7.3.1/greenplum_path.sh &&  pgbackrest --log-level-console warn --stanza gpdb-seg-1 --config /usr/local/gpdr/configs/pgbackrest-seg-1.conf repo-ls gpdr/restore-points --recurse --filter "(/20241204-091001R)$"
20241204:17:27:20 gpdr:gpadmin:lx00763:3967298-[DEBUG]:-source /usr/local/greenplum-db-7.3.1/greenplum_path.sh &&  pgbackrest --log-level-console warn --stanza gpdb-seg-1 --config /usr/local/gpdr/configs/pgbackrest-seg-1.conf repo-ls gpdr/restore-points --recurse --filter "(20241204-091001R)$"
20241204:17:27:20 gpdr:gpadmin:lx00763:3967298-[DEBUG]:-source /usr/local/greenplum-db-7.3.1/greenplum_path.sh &&  pgbackrest --log-level-console warn --stanza gpdb-seg-1 --config /usr/local/gpdr/configs/pgbackrest-seg-1.conf repo-ls gpdr/restore-points --sort asc --recurse
20241204:17:27:20 gpdr:gpadmin:lx00763:3967298-[DEBUG]:-setting gp_pause_on_restore_point_replay to '20241204-091001R' on all segments
20241204:17:27:20 gpdr:gpadmin:lx00763:3967298-[DEBUG]:-Reload postgresql.conf using pg_ctl
20241204:17:27:21 gpdr:gpadmin:lx00763:3967298-[ERROR]:-error occurred while restoring database cluster: could not resume WAL replay: ERROR: could not access status of transaction 3  (seg37 slice1 10.198.27.102:6001 pid=3920985) (SQLSTATE 58P01)
Please refer to /home/gpadmin/gpAdminLogs/gpdr_20241204.log file for details.

Backtrace from the segment logs:

2024-12-04 20:08:04.122580 CET,"gpadmin","postgres",p4039978,th-1573766976,"10.10.10.10","44824",2024-12-04 20:08:04 CET,0,con9943,cmd3,seg0,,,,sx1,"LOG","00000","statement: SET application_name TO 'gpdr'"
0,,"postgres.c",1729,
2024-12-04 20:08:04.136739 CET,"gpadmin","postgres",p4039978,th-1573766976,"10.10.10.10","44824",2024-12-04 20:08:04 CET,0,con9943,cmd5,seg0,,,,sx1,"LOG","00000","statement: SET gp_hot_standby_snapshot_mod
nconsistent'",,,,,,,0,,"postgres.c",1729,
2024-12-04 20:08:04.145005 CET,"gpadmin","postgres",p4039978,th-1573766976,"10.10.10.10","44824",2024-12-04 20:08:04 CET,0,con9943,cmd8,seg0,slice1,,,sx1,"LOG","00000","statement: SELECT pg_wal_replay_resu
FROM gp_dist_random('gp_id')
UNION ALL
SELECT pg_wal_replay_resume();",,,,,,"SELECT pg_wal_replay_resume()
FROM gp_dist_random('gp_id')
UNION ALL
SELECT pg_wal_replay_resume();",0,,"postgres.c",1288,
2024-12-04 20:08:04.220129 CET,"gpadmin","postgres",p4039978,th-1573766976,"10.10.10.10","44824",2024-12-04 20:08:04 CET,0,con9943,cmd8,seg0,slice1,,,sx1,"ERROR","58P01","could not access status of transaction 3"
,"Could not open file ""pg_distributedlog/0000"": No such file or directory.",,,,,"SELECT pg_wal_replay_resume()
FROM gp_dist_random('gp_id')
UNION ALL
SELECT pg_wal_replay_resume();",0,,"slru.c",939,"Stack trace:
1    0xd01646 postgres errstart (elog.c:494)
2    0x6c7675 postgres <symbol not found> (slru.c:939)
3    0x806fc2 postgres SimpleLruReadPage (slru.c:460)
4    0x82bf10 postgres DistributedLog_AdvanceOldestXmin (distributedlog.c:251)
5    0xb6c10d postgres GetSnapshotData (procarray.c:2651)
6    0xd5f5c8 postgres GetTransactionSnapshot (snapmgr.c:439)
7    0xb97fed postgres PortalStart (pquery.c:638)
8    0xb9164d postgres <symbol not found> (discriminator 10)
9    0xb959fb postgres PostgresMain (postgres.c:5583)
10   0xafe118 postgres <symbol not found> (postmaster.c:4605)
11   0xaff01e postgres PostmasterMain (discriminator 5)
12   0x76eba3 postgres main (main.c:173)
13   0x7f35a2829590 libc.so.6 <symbol not found> + 0xa2829590
14   0x7f35a2829640 libc.so.6 __libc_start_main + 0x80
15   0x77aa25 postgres _start + 0x25

Environment

  • Greenplum_6.29.0
  • GPDB 6.29.x and GPDB 7.3.x
  • Greenplum_7.4.0

Cause

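Database sessions opened by gpdr with gp_hot_standby_snapshot_mode set to 'inconsistent' try to read the distributed log with an invalid transaction ID (reported as transaction 3) while taking a snapshot (DistributedLog_AdvanceOldestXmin, called from GetSnapshotData in the backtrace). The lookup then fails because no distributed log file exists for that XID ("Could not open file pg_distributedlog/0000"). This is a known issue.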
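
The error detail names the file pg_distributedlog/0000. As with other SLRU-backed logs in PostgreSQL, entries are packed into fixed-size pages, pages are grouped into segment files, and each segment file is named by its segment number in hexadecimal. The Python sketch below reproduces that arithmetic to illustrate why transaction 3 leads to segment 0000. It assumes an 8192-byte page, 32 pages per segment (the PostgreSQL SLRU_PAGES_PER_SEGMENT value) and a hypothetical 8-byte distributed log entry; the real entry size depends on the Greenplum version, and slru_location is an illustrative helper, not a Greenplum function.

```python
# Sketch: map a local transaction ID (XID) to the SLRU page and segment file
# that would hold its distributed log entry.
# Assumptions (not taken from Greenplum source): 8192-byte pages,
# 32 pages per segment file, and a hypothetical 8-byte entry.

BLCKSZ = 8192                 # assumed page (block) size in bytes
SLRU_PAGES_PER_SEGMENT = 32   # assumed number of pages per segment file
ENTRY_SIZE = 8                # hypothetical size of one distributed log entry

ENTRIES_PER_PAGE = BLCKSZ // ENTRY_SIZE


def slru_location(xid: int) -> tuple[int, int, str]:
    """Return (page number, segment number, segment file path) for an XID."""
    if not 0 <= xid < 2**32:
        raise ValueError("XID must be a 32-bit unsigned value")
    page = xid // ENTRIES_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    # SLRU segment files are named with the segment number as 4 hex digits.
    return page, segment, f"pg_distributedlog/{segment:04X}"


if __name__ == "__main__":
    for xid in (3, 1_000_000):
        page, segment, path = slru_location(xid)
        print(f"xid={xid}: page={page} segment={segment} file={path}")
```

Under these assumptions every XID below 32768 (1024 entries per page times 32 pages) maps to segment 0000, which is why a low XID such as 3 results in an attempt to open pg_distributedlog/0000.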

Resolution

A fix is in progress and is scheduled to be released in early 2025. Look for incident number 33682 in the release notes of future GPDB releases.