gpbackup causes a panic when backing up a read-replica cluster in a GPDR configuration

Article ID: 418576

Products

VMware Tanzu Data Suite, VMware Tanzu Greenplum, VMware Tanzu Greenplum / Gemfire

Issue/Introduction

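Running gpbackup against a DR Greenplum cluster in read-replica mode can cause a panic on the coordinator. gpbackup fails with "unexpected EOF" while writing pre-data metadata, and the coordinator log records a PANIC caused by SIGSEGV.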

gpbackup output:

20251113:10:13:32 gpbackup:gpadmin:cdw:2258924-[INFO]:-Backup Command: [gpbackup --dbname testdb --jobs 8 --compression-type zstd --compression-level 3 --leaf-partition-data]
20251113:10:13:32 gpbackup:gpadmin:cdw:2258924-[INFO]:-gpbackup version = 1.32.0
20251113:10:13:34 gpbackup:gpadmin:cdw:2258924-[INFO]:-Greenplum Database Version = 7.5.4 build commit:b3699b1d9609881095a06f4ac01443ed94670365
20251113:10:13:34 gpbackup:gpadmin:cdw:2258924-[INFO]:-Starting backup of database testdb
20251113:10:13:37 gpbackup:gpadmin:cdw:2258924-[INFO]:-Backup Timestamp = 20251113101332
20251113:10:13:37 gpbackup:gpadmin:cdw:2258924-[INFO]:-Backup Database = testdb
20251113:10:13:37 gpbackup:gpadmin:cdw:2258924-[INFO]:-Gathering table state information
20251113:10:13:38 gpbackup:gpadmin:cdw:2258924-[INFO]:-Acquiring ACCESS SHARE locks on tables
20251113:10:13:39 gpbackup:gpadmin:cdw:2258924-[INFO]:-Gathering additional table metadata
20251113:10:13:55 gpbackup:gpadmin:cdw:2258924-[INFO]:-Getting storage information
20251113:10:14:06 gpbackup:gpadmin:cdw:2258924-[INFO]:-Metadata will be written to /data/coordinator/gpseg-1/backups/20251113/20251113101332/gpbackup_20251113101332_metadata.sql
20251113:10:14:06 gpbackup:gpadmin:cdw:2258924-[INFO]:-Writing global database metadata
20251113:10:14:06 gpbackup:gpadmin:cdw:2258924-[INFO]:-Global database metadata backup complete
20251113:10:14:06 gpbackup:gpadmin:cdw:2258924-[INFO]:-Writing pre-data metadata
20251113:10:14:30 gpbackup:gpadmin:cdw:2258924-[CRITICAL]:-unexpected EOF
github.gwd.broadcom.net/TNZ/gp-common-go-libs/gplog.FatalOnError
        /tmp/build/30d2ce6c/gpbackup_src/vendor/github.gwd.broadcom.net/TNZ/gp-common-go-libs/gplog/gplog.go:479
github.gwd.broadcom.net/TNZ/gp-backup/backup.GetConstraints
        /tmp/build/30d2ce6c/gpbackup_src/backup/queries_shared.go:235
github.gwd.broadcom.net/TNZ/gp-backup/backup.retrieveConstraints
        /tmp/build/30d2ce6c/gpbackup_src/backup/wrappers.go:278
github.gwd.broadcom.net/TNZ/gp-backup/backup.backupPredata
        /tmp/build/30d2ce6c/gpbackup_src/backup/backup.go:293
github.gwd.broadcom.net/TNZ/gp-backup/backup.DoBackup
        /tmp/build/30d2ce6c/gpbackup_src/backup/backup.go:204
main.main.func1
        /tmp/build/30d2ce6c/gpbackup_src/gpbackup.go:24
github.com/spf13/cobra.(*Command).execute
        /tmp/build/30d2ce6c/gpbackup_src/vendor/github.com/spf13/cobra/command.go:920
github.com/spf13/cobra.(*Command).ExecuteC
        /tmp/build/30d2ce6c/gpbackup_src/vendor/github.com/spf13/cobra/command.go:1044
github.com/spf13/cobra.(*Command).Execute
        /tmp/build/30d2ce6c/gpbackup_src/vendor/github.com/spf13/cobra/command.go:968
main.main
        /tmp/build/30d2ce6c/gpbackup_src/gpbackup.go:28
runtime.main
        /usr/local/go/src/runtime/proc.go:283
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1700
20251113:10:14:31 gpbackup:gpadmin:cdw:2258924-[INFO]:-Found neither /usr/local/greenplum-db-7.5.4/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
20251113:10:14:31 gpbackup:gpadmin:cdw:2258924-[INFO]:-Email containing gpbackup report /data/coordinator/gpseg-1/backups/20251113/20251113101332/gpbackup_20251113101332_report will not be sent
20251113:10:14:31 gpbackup:gpadmin:cdw:2258924-[INFO]:-Beginning cleanup
20251113:10:14:31 gpbackup:gpadmin:cdw:2258924-[INFO]:-Cleanup complete

Backtrace example:

2025-11-13 10:14:30.346070 CET,"gpadmin","testdb",p2258934,th-1176898432,"127.0.0.1","62670",2025-11-13 10:13:32 CET,0,con379,cmd10928,seg-1,,,,sx3,"LOG","00000","execute <unnamed>: 
                SELECT con.oid,
                        quote_ident(n.nspname) AS schema,
                        quote_ident(conname) AS name,
                        contype,
                        con.conislocal,
                        pg_get_constraintdef(con.oid, TRUE) AS def,
                        quote_ident(n.nspname) || '.' || quote_ident(c.relname) AS owningobject,
                        'f' AS isdomainconstraint,
                        CASE
                                WHEN pt.partrelid IS NULL THEN 'f'
                                ELSE 't'
                        END AS ispartitionparent
                FROM pg_constraint con
                        LEFT JOIN pg_class c ON con.conrelid = c.oid
                        LEFT JOIN pg_partitioned_table pt ON con.conrelid = pt.partrelid
                        JOIN pg_namespace n ON n.oid = con.connamespace
                WHERE n.nspname NOT LIKE 'pg_temp_%' AND n.nspname NOT LIKE 'pg_toast%' AND n.nspname NOT IN ('gp_toolkit', 'information_schema', 'pg_aoseg', 'pg_bitmapindex', 'pg_catalog') 
                        AND c.oid NOT IN (select objid from pg_depend where deptype = 'e')
                        AND c.relname IS NOT NULL
                        AND contype != 't'
                        AND (c.relispartition IS FALSE OR conislocal IS TRUE)
                        AND coninhcount = 0
                GROUP BY con.oid, conname, contype, c.relname, n.nspname, con.conislocal, pt.partrelid
UNION
        SELECT con.oid,
                quote_ident(n.nspname) AS schema,
                quote_ident(conname) AS name,
                contype,
                con.conislocal,
                pg_get_constraintdef(con.oid, TRUE) AS def,
                quote_ident(n.nspname) || '.' || quote_ident(t.typname) AS owningobject,
                't' AS isdomainconstraint,
                'f' AS ispartitionparent
        FROM pg_constraint con
                LEFT JOIN pg_type t ON con.contypid = t.oid
                JOIN pg_namespace n ON n.oid = con.connamespace
        WHERE n.nspname NOT LIKE 'pg_temp_%' AND n.nspname NOT LIKE 'pg_toast%' AND n.nspname NOT IN ('gp_toolkit', 'information_schema', 'pg_aoseg', 'pg_bitmapindex', 'pg_catalog') 
                AND con.oid NOT IN (select objid from pg_depend where deptype = 'e')
                AND t.typname IS NOT NULL
        GROUP BY con.oid, conname, contype, n.nspname, con.conislocal, t.typname
        ORDER BY name",,,,,,,0,,"postgres.c",2818,
2025-11-13 10:14:30.641865 CET,,,p2258934,th0,,,2025-11-13 10:13:32 CET,0,con379,cmd10928,seg-1,,,,,"PANIC","XX000","Unexpected internal error: Master process received signal SIGSEGV",,,,,,,0,,,,"1    0x7f90ba23ebf0 libc.so.6 <symbol not found> + 0xba23ebf0
2    0xca1dbd postgres <symbol not found> (ruleutils.c:1962)
3    0xca2716 postgres pg_get_constraintdef_ext (ruleutils.c:1905)
4    0x9b0116 postgres <symbol not found> (execExprInterp.c:664)
5    0x9cc91a postgres <symbol not found> (executor.h:403)
6    0x9d1b9c postgres <symbol not found> (nodeAgg.c:2689)
7    0x9bd287 postgres <symbol not found> (execProcnode.c:646)
8    0x9bf3e1 postgres ExecScan (execScan.c:137)
9    0x9bd287 postgres <symbol not found> (execProcnode.c:646)
10   0x9cadde postgres <symbol not found> (nodeAppend.c:297)
11   0x9bd287 postgres <symbol not found> (execProcnode.c:646)
12   0x9cb794 postgres <symbol not found> (executor.h:271)
13   0x9cdec2 postgres <symbol not found> (nodeAgg.c:2717)
14   0x9d1c89 postgres <symbol not found> (nodeAgg.c:2329)
15   0x9bd287 postgres <symbol not found> (execProcnode.c:646)
16   0x9ebe20 postgres <symbol not found> (nodeSort.c:164)
17   0x9bd287 postgres <symbol not found> (execProcnode.c:646)
18   0x9b44ad postgres <symbol not found> (execMain.c:2680)
19   0x9b4db7 postgres standard_ExecutorRun (execMain.c:978)
20   0x9b4fb5 postgres ExecutorRun (execMain.c:793)
21   0xbba61b postgres <symbol not found> (pquery.c:1156)
22   0xbbc479 postgres PortalRun (pquery.c:999)
"

Cause

In 7.5.4, the server uses `CachedRpSnapshot` directly without registering it, so its registration count stays at 0. When a transaction calls RegisterSnapshot and then UnregisterSnapshot on it, the registration count returns to 0 and the snapshot is freed with "pfree". The `CachedRpSnapshot` pointer itself does not change, so the next transaction still uses the already-freed snapshot and panics when it tries to "pfree" the same memory again.
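The sequence described above can be illustrated with a toy model. This is only a sketch of the register/unregister/free pattern, not Greenplum source code: the `Snapshot` class, the `regd_count` and `freed` fields, and every function below are hypothetical stand-ins, and a Python exception takes the place of the SIGSEGV.

```python
class DoubleFreeError(RuntimeError):
    """Stands in for the crash the server hits on a second pfree."""


class Snapshot:
    """Hypothetical stand-in for a snapshot structure."""

    def __init__(self):
        self.regd_count = 0  # number of active registrations
        self.freed = False   # True once the memory has been released


def pfree(snap):
    # Freeing memory that was already freed is the failure being modeled.
    if snap.freed:
        raise DoubleFreeError("snapshot memory was already freed")
    snap.freed = True


def register_snapshot(snap):
    snap.regd_count += 1
    return snap


def unregister_snapshot(snap):
    snap.regd_count -= 1
    if snap.regd_count == 0:
        pfree(snap)  # last registration gone, so release the memory


def run_transaction(cached):
    # Each transaction registers the cached snapshot, uses it, then unregisters.
    snap = register_snapshot(cached)
    unregister_snapshot(snap)


# The cached snapshot is used directly and never registered by its owner,
# so its count starts at 0.
cached_rp_snapshot = Snapshot()

run_transaction(cached_rp_snapshot)  # count goes 0 -> 1 -> 0, snapshot is freed
try:
    run_transaction(cached_rp_snapshot)  # same pointer, memory already freed
    outcome = "ok"
except DoubleFreeError as exc:
    outcome = f"panic: {exc}"

print(outcome)  # panic: snapshot memory was already freed
```

In this model the first transaction completes normally and only the second one fails, which matches the description: the pointer stays the same after the free, so the failure appears on a later use rather than at the point where the snapshot was first released.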

Resolution

The issue is resolved in Greenplum DB 7.6.0 and above.

Upgrade to 7.6.0 or above to get the fix.
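To check whether a cluster already runs a release containing the fix, the version that gpbackup prints in its log can be compared against 7.6.0. The helper below is a minimal sketch, assuming the log line has the same "Greenplum Database Version = X.Y.Z" form as in the output above; `parse_version` and `has_fix` are illustrative names, not part of gpbackup.

```python
import re

# First release that contains the fix, per the Resolution section.
FIXED_IN = (7, 6, 0)

# Matches the version line as it appears in the gpbackup output above.
VERSION_LINE = re.compile(r"Greenplum Database Version = (\d+)\.(\d+)\.(\d+)")


def parse_version(log_line):
    """Return (major, minor, patch) from a gpbackup log line, or None."""
    match = VERSION_LINE.search(log_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_fix(version):
    """True if the version is 7.6.0 or above."""
    return version >= FIXED_IN


line = (
    "20251113:10:13:34 gpbackup:gpadmin:cdw:2258924-[INFO]:-Greenplum Database "
    "Version = 7.5.4 build commit:b3699b1d9609881095a06f4ac01443ed94670365"
)
version = parse_version(line)
print(version, has_fix(version))  # (7, 5, 4) False
```

Versions are compared as integer tuples rather than strings so that, for example, 7.10.0 correctly sorts after 7.6.0.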