How to find and fix append-optimized, column-oriented tables with inconsistent metadata due to ALTER COLUMN [TYPE|SET ENCODING] operation in Greenplum 7.1 to 7.5.4.

Products

VMware Tanzu Greenplum VMware Tanzu Greenplum / Gemfire

Issue/Introduction

Append-optimized, column-oriented tables store columns individually. Each column file contains certain metadata which assist index scan etc. If the metadata between different columns become inconsistent, that could lead to error or wrong result, depending on whether the table has index or not:

If the table has index(es), index scan could stitch columns data from different rows to form a row, leading to wrong query results.
If the table does not have an index, ANALYZE on the table could encounter an error “Unexpected result was returned when getting AO block info”.

For table without index, the analyze error message as below:

gpadmin=# analyze t2;
ERROR:  Unexpected result was returned when getting AO block info for table 't2', column 0, targrow 104. (aocsam.c:1404)  (seg3 slice1 xxx.xxx.xxx.xxx:6752 pid=3750) (aocsam.c:1404)

Environment

gpdb 7.1.0 - 7.5.4

Cause

Greenplum 7.1 optimized ALTER COLUMN [TYPE | SET ENCODING] operation for append-optimized, column-oriented tables by avoiding full table rewrite, instead rewriting just the column being altered. However, there is a bug where the altered column could have inconsistent block headers with other columns in the table. This issue has been fixed in 7.5.5, so any column-oriented table altered in newer versions would not be impacted. But any column-oriented tables that were altered between 7.1.0 and 7.5.4 would need to be detected and fixed.

Resolution

Rewriting the impacted table will re-align the metadata in all columns. Use “ALTER TABLE … SET WITH (REORGANIZE)” command to do that. Note that, if the table is large, rewriting can take some time.

Use the attached script to find all append-optimized, column-oriented tables that have done ALTER COLUMN [TYPE|SET ENCODING] operation since table creation or the last table rewrite time. This initial filtering helps determine the next action to take. The tables are categorized into two groups:

Category A (higher priority): Tables with index(es) that are potentially impacted by the issue.
Category B (lower priority): Tables without an index that are potentially impacted by the issue.

We strongly recommend you fix all tables. If the number of tables is large, prioritize fixing Category A tables, and move on to fix Category B later.

The detailed fix steps follows below:

Step 1: use the attached script to find all potentially impacted tables and to generate repair commands. Run the script will generate four files under the same working directory:

“category_a.csv” : tables in Category A.
“category_b.csv” : tables in Category B.
“reorg_cmds_category_a.sql”: SQL script that fixes tables in Category A.
“reorg_cmds_category_b.sql”: SQL script that fixes tables in Category B.

Step 2: if the lists of Category A or B tables are not large, simply run the corresponding repair script “reorg_cmds_category_a.sql” and “reorg_cmds_category_b.sql”. The script will connect to each database automatically and do the REORGANIZE.
If the list is too large, move on to Step 3,

Note: if many leaf partition tables that belong to the same partition hierarchy need to be fixed, consider only reorganize the root partition if the partition hierarchy is all column-oriented tables. Refer to the root partition name displayed in the csv files.

psql postgres -f reorg_cmds_category_a.sql
psql postgres -f reorg_cmds_category_b.sql

Step 3:
- - If Category A is too large and is not fixed in Step 2, use the following gpcheckcat command to find the exact tables that are impacted. The result could be a subset of the tables in “category_a.csv”. This test utilizes the block directory which only exists with tables in Category A. Use the gpcheckcat auto-generated repair script to fix the tables. The repair script is located under the “gpcheckcat.repair.*” directory in the working directory.
  - If Category B is too large and is not fixed in Step 2, split up the “reorg_cmds_category_b.sql” to perform the commands in multiple phases over time. In the meantime, you may still experience intermittent ANALYZE errors, and you will not be able to create index on these tables until they are reorganized.

gpcheckcat -A -R 'co_blkdir_rownums'

The detection and fix process is depicted in this flow chart:

The Script:

Directly add execution permission and run this script will generate check result and fix script at the current path:

#!/bin/bash

set -u
if [[ -z "${PGPORT:-}" ]]; then
    echo "Please set PGPORT"
    exit 1
fi

# output files
FIND_OUT1="category_a.csv"
FIND_OUT2="category_b.csv"
GEN_OUT1="reorg_cmds_category_a.sql"
GEN_OUT2="reorg_cmds_category_b.sql"

PSQL="psql -X -v ON_ERROR_STOP=1 -At"

# get available databases
DBS="$($PSQL -d postgres -c "SELECT datname FROM pg_database
                              WHERE datallowconn
                              ORDER BY 1;")"

[[ -n "$DBS" ]] || { echo "No databases to process."; exit 0; }

# start new output files
echo "dbname|tablename|rootpartition" > "$FIND_OUT1"
echo "dbname|tablename|rootpartition" > "$FIND_OUT2"
echo "" > "$GEN_OUT1"
echo "" > "$GEN_OUT2"

# the core query to find all potentially impacted tables
SUBQUERY="
SELECT ns.nspname, cls.relname,
    MIN(apo.blkdirrelid) > 0 AS hasblkdir,
    pg_partition_root(cls.oid) AS rootpartition
FROM pg_attribute_encoding enc
  JOIN pg_attribute a
    ON a.attrelid = enc.attrelid AND a.attnum = enc.attnum
  JOIN pg_class cls
    ON enc.attrelid = cls.oid
JOIN pg_namespace ns
  ON cls.relnamespace = ns.oid
JOIN pg_appendonly apo
  ON enc.attrelid = apo.relid
JOIN pg_am am
  ON cls.relam = am.oid
WHERE am.amname = 'ao_column'
  AND cls.relkind = 'r'
  AND a.attisdropped = 'f'
  AND a.atthasmissing = 'f'
  AND enc.lastrownums is null
GROUP BY ns.nspname,cls.oid,cls.relname
HAVING COUNT(DISTINCT enc.xmin::text) > 1
"

for db in $DBS; do
echo "[*] checking database $db..."
# find tables with block directory (category A)
psql -X -d "$db" -v ON_ERROR_STOP=1 -c \
  "COPY (
    SELECT current_database() as dbname,
      quote_ident(q.nspname) || '.' || quote_ident(q.relname) AS tablename,
	  rootpartition::text
	FROM ($SUBQUERY)q
	WHERE hasblkdir
    )
    TO STDOUT WITH (FORMAT CSV, DELIMITER '|', ENCODING 'UTF8')" >> $FIND_OUT1

# find tables without block directory (category B)
psql -X -d "$db" -v ON_ERROR_STOP=1 -c \
  "COPY (
    SELECT current_database() as dbname,
      quote_ident(q.nspname) || '.' || quote_ident(q.relname) AS tablename,
	  rootpartition::text
	FROM ($SUBQUERY)q
	WHERE hasblkdir = 'f'
    )
	TO STDOUT WITH (FORMAT CSV, DELIMITER '|', ENCODING 'UTF8')" >> $FIND_OUT2

# generate REORGANIZE script for tables in category A
echo "\c $db" >> $GEN_OUT1
psql -X -d "$db" -v ON_ERROR_STOP=1 -At -c \
    "WITH q AS ($SUBQUERY)
     SELECT 'ALTER TABLE ' || quote_ident(q.nspname) || '.' || quote_ident(q.relname)|| ' SET WITH (REORGANIZE);'
     FROM q
       LEFT JOIN pg_class c ON q.rootpartition = c.oid
       LEFT JOIN pg_namespace n ON c.relnamespace = n.oid
     WHERE hasblkdir;" >> $GEN_OUT1

# generate REORGANIZE script for tables in category B
echo "\c $db" >> $GEN_OUT2
psql -X -d "$db" -v ON_ERROR_STOP=1 -At -c \
    "WITH q AS ($SUBQUERY)
     SELECT 'ALTER TABLE ' || quote_ident(q.nspname) || '.' || quote_ident(q.relname)|| ' SET WITH (REORGANIZE);'
     FROM q
       LEFT JOIN pg_class c ON q.rootpartition = c.oid
       LEFT JOIN pg_namespace n ON c.relnamespace = n.oid
     WHERE hasblkdir = 'f';" >> $GEN_OUT2
done

echo "---------------------------------------------"
echo "Done. "
echo "Tables with index appended to: $FIND_OUT1"
echo "Tables without index appended to: $FIND_OUT2"
echo "Fix commands for tables with index appended to: $GEN_OUT1"
echo "Fix commands for tables without index appended to: $GEN_OUT2"
echo "---------------------------------------------"