ERROR","XX000","read beyond eof in table of GPDB 6.x
search cancel

ERROR","XX000","read beyond eof in table of GPDB 6.x

book

Article ID: 297060

calendar_today

Updated On:

Products

VMware Tanzu Greenplum

Issue/Introduction

The following errors have been found out in the circumstance there are regular schedules to automatically run vacuum analyze by crontab in order to prevent increasing database age.
psql:d.ctm_t_p_i_ae.sql:13: ERROR: read beyond eof in table "ctm_t_p_i_1_p_p2" file "base/16384/113881387.1", read position 0 (small offset 0), actual read length 0 (large read length 65536) (cdbbufferedread.c:211) (seg53 1.X.X.X:40005 pid=31664) (cdbbufferedread.c:211)
Time: 96793.802 ms

It looks like this issue could be relevant with #12197 in GPDB 6.17.1 release note[1] according to the description that data corruption issue which could occur with concurrent DML queries on AO/AOCO tables when GPDB was not aware of committed transaction that started after snapshot initiation. However currently GPDB 23.1 which is higher than 6.17.1.

In pg_log, below call trace appeared additionally.

2023-10-20 11:30:46.919607 KST,"XXuser","cdb",p31664,th-607065984,"1.X.X.X","24822",2023-10-20 11:04:29 KST,0,con3378784,cmd11,seg53,,,,sx1,"ERROR","XX000","read beyond eof in table ""ctm_t_p_i_1_p_p2"" file ""base/16384/113881387.1"", read position 0 (small offset 0), actual read length 0 (large read length 65536) (cdbbufferedread.c:211)",,,,,,"select pg_catalog.gp_acquire_sample_rows(279530133, 187, 'f');",0,,"cdbbufferedread.c",211,"Stack trace:
1 0xbfe9ac postgres errstart (elog.c:557)
2 0xcb2990 postgres <symbol not found> (cdbbufferedread.c:202)
3 0x7704e3 postgres appendonly_getnext (appendonlyam.c:340)
4 0x8240fe postgres acquire_sample_rows (analyze.c:1769)
5 0x8b2c21 postgres gp_acquire_sample_rows (analyzefuncs.c:218)
6 0x8ccb07 postgres <symbol not found> (execQual.c:2013)
7 0x8cda21 postgres <symbol not found> (execQual.c:2644)
8 0x8d341d postgres ExecProject (execQual.c:6395)
9 0x8f3d5d postgres ExecResult (nodeResult.c:218)
10 0x8c5b58 postgres ExecProcNode (execProcnode.c:970)
11 0x8bcb69 postgres <symbol not found> (tuptable.h:159)
12 0x8bd88e postgres standard_ExecutorRun (execMain.c:3168)
13 0x8bda65 postgres ExecutorRun (execMain.c:992)
14 0xa998f7 postgres <symbol not found> (pquery.c:1159)
15 0xa9b921 postgres PortalRun (pquery.c:1006)
16 0xa95920 postgres <symbol not found> (postgres.c:1855)
17 0xa98ddd postgres PostgresMain (postgres.c:5477)
18 0x6b2a73 postgres <symbol not found> (postmaster.c:4484)
19 0xa1ce06 postgres PostmasterMain (postmaster.c:1520)
20 0x6b69f1 postgres main (main.c:205)
21 0x7f3dd86b63d5 libc.so.6 __libc_start_main + 0xf5
22 0x6c289c postgres <symbol not found> + 0x6c289c

What's the cause and official resolution or who could it be avoided as workaround if there are no greenplum 6.x version to upgrade to fix this issue?

[ GPDB 6.x release note ]
[1] https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/relnotes-release-notes.html#release-6.17


Environment

Product Version: 6.23
OS: RHEL or CentOS 7.x

Resolution

As this issue was already reported in gpdb upstream[1] the cause has been known due to the lack of lock on the child partition allows the concurrent VACUUM drop phase to open it with AccessExclusiveLock, and then procceed to truncate the old awaiting drop segment file and update pg_aoseg_rel. Had the child partition been locked by SELECT, the VACUUM drop phase would have been skipped. To sum up GPDB 6.x does not gram locks for child tables which is causing this problem while GPDB 7.x does and it's not going to be fixed in next version of GPDB 6.x because it's complex logics to change original source code in old version of PostgreSQL. For more details about investigation step, please refer the gpdb upstream[1].

Instead you could choose options ether upgrading GPDB 7.x or applying the below workaround to avoid this issue.

[ Workaround ]
Explicitly enclose the concurrent SELECT statement in a BEGIN;END block, for example:

begin;
lock table foo in access share mode;
select count(*) from foo;
end;

[ GPDB Upstream ]
https://github.com/greenplum-db/gpdb/issues/16545