gpexpand fails because of catalog error - gp_relation_node (2) has more entries than pg_aocsseg (0)

Article ID: 296559

Products

VMware Tanzu Greenplum

Issue/Introduction

When running the first phase of gpexpand on Greenplum 5.x (for example, gpexpand -i input_file -D database_name) to initialize the new segments and create the expansion schema, it can fail with a catalog error that forces a rollback of the expansion:
20210612:12:44:56:019170 gpexpand:hostname:gpadmin-[ERROR]:-gpexpand failed. exiting...
Traceback (most recent call last):
  File "/usr/local/greenplum-db-5.28.7/bin/gpexpand", line 3159, in main
    _gp_expand.prepare_schema()
  File "/usr/local/greenplum-db-5.28.7/bin/gpexpand", line 1968, in prepare_schema
    self._populate_partitioned_tables(dbname)
  File "/usr/local/greenplum-db-5.28.7/bin/gpexpand", line 2099, in _populate_partitioned_tables
    curs = dbconn.execSQL(table_conn, sql)
  File "/usr/local/greenplum-db-5.28.7/lib/python/gppylib/db/dbconn.py", line 248, in execSQL
    cursor.execute(sql)
  File "/usr/local/greenplum-db-5.28.7/lib/python/pygresql/pgdb.py", line 259, in execute
    self.executemany(operation, (params,))
  File "/usr/local/greenplum-db-5.28.7/lib/python/pygresql/pgdb.py", line 289, in executemany
    raise DatabaseError("error '%s' in '%s'" % (msg, sql))
DatabaseError: error 'ERROR:  gp_relation_node (2) has more entries than pg_aocsseg (0) for relation tablename (aocssegfiles.c:1920)  (seg13 slice1 10.24.164.215:6005 pid=20037) (cdbdisp.c:254)
CONTEXT:  SQL statement "select sum(pg_relation_size('schemaname.tablename'))::int8 from gp_dist_random('gp_id');"
' in '
This error is not detectable via gpcheckcat; an offline gpcheckcat run reported no issues.
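The failing step can be reproduced outside of gpexpand by running the same per-segment size query shown in the CONTEXT line above (a sketch; replace schemaname.tablename with the table named in the error). On a damaged table this raises the same "gp_relation_node has more entries than pg_aocsseg" error:
-- Run from the master against the affected database; the subquery is
-- dispatched to every primary segment by gp_dist_random().
select sum(pg_relation_size('schemaname.tablename'))::int8
from gp_dist_random('gp_id');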

Note: The catalog error itself can be recovered by following internal KB 8791.

Environment

Product Version: 5.28

Resolution

Consider the following options before rerunning the expansion, as there can be further discrepancies between gp_relation_node and pg_aocsseg beyond the table reported in the error:

- R&D has developed a script to detect the damaged AO tables: https://github.com/asimrp/detect_damaged_ao_tables/blob/main/damaged_ao_tables.sql
Example output of this script also shows the segment data directory and the aoseg relfilenode:
ao_tablename  | gpsegid | ao_oid | aoseg_oid | aoseg_count | pt_count | aoseg_relfilenode |                                  datadir                                   
---------------+---------+--------+-----------+-------------+----------+-------------------+----------------------------------------------------------------------------
 co_test       |       0 |  16539 |     16544 |           0 |        3 |             16472 | /Users/apraveen/workspace/gpdb5/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
 co_t1         |       0 |  16394 |     16399 |           0 |        3 |             16393 | /Users/apraveen/workspace/gpdb5/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
 ao_t1         |       0 |  16384 |     16389 |           0 |        3 |             16387 | /Users/apraveen/workspace/gpdb5/gpAux/gpdemo/datadirs/dbfast1/demoDataDir0
 foo_p_1_prt_8 |       2 |  16698 |     16701 |           0 |        2 |             16542 | /Users/apraveen/workspace/gpdb5/gpAux/gpdemo/datadirs/dbfast3/demoDataDir2
(4 rows)
Note: The script's output is relevant only if gpcheckcat has reported no issues, and the use of this script should be limited to cases where a SELECT on an AO table fails with the "gp_relation_node has more/less entries ..." error. A manual cross-check for a flagged table is sketched below.
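For a table flagged by the script, the ao_oid/aoseg_oid pairing in the output can be confirmed from the pg_appendonly catalog (a sketch; replace schemaname.tablename with the flagged table):
-- pg_appendonly maps an AO/CO table (relid) to its segment-entry
-- relation (segrelid), i.e. the pg_aocsseg relation the error refers to.
select relid::regclass as ao_table,
       relid           as ao_oid,
       segrelid        as aoseg_oid
from pg_appendonly
where relid = 'schemaname.tablename'::regclass;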

- If an immediate re-run is required, consider running the first phase of gpexpand with the "-S" option, which disables detailed progress reporting. If this option is not specified, detailed progress reporting computes the size of each existing table using pg_relation_size(), and that step fails for the damaged table. The command can be something like:
gpexpand -S -i input_file -D database_name
The second phase, which redistributes the data, should be able to redistribute the non-damaged tables and will leave the damaged ones failing with the same "gp_relation_node has more entries than ..." error. After the damaged tables are fixed manually, rerunning gpexpand will continue, and once no damaged tables remain, the data redistribution (second phase) should finish successfully. (The per-table redistribution status can be checked from the expansion schema, as sketched after the commands below.) The command can be:
gpexpand
or
gpexpand -d duration
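To see which tables are still pending after a redistribution run, the expansion schema created by gpexpand can be queried (a sketch, assuming the standard gpexpand.status_detail layout in the database passed with -D):
-- Tables that have not finished redistribution; damaged tables remain
-- listed here until they are repaired and gpexpand is rerun.
select fq_name, status
from gpexpand.status_detail
where status <> 'COMPLETED';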

You are not likely to encounter the same issue in Greenplum 6, as the persistent table (PT) infrastructure, including gp_relation_node, has been removed in that release.