Postgres backup fails due to pg_toast error
search cancel

Postgres backup fails due to pg_toast error

book

Article ID: 385475

calendar_today

Updated On:

Products

VMware SDDC Manager

Issue/Introduction

Backup operation fails for vPostgres in SDDC.

Pg_dump fails with:

pg_dump: dumping contents of table "public.vault_secret"
pg_dump: error: Dumping the contents of table "vault_secret" failed: PQgetResult() failed.
pg_dump: detail: Error message from server: ERROR:  unexpected chunk size -4 (expected 380) in chunk 253 of 254 for toast value 478011 in pg_toast_17700
pg_dump: detail: Command was: COPY public.vault_secret (id, secret_text, secondary_key) TO stdout;

Environment

SDDC Manager 5.1.x

Resolution

The steps are written for SDDC Manager but will work for finding a corrupt record in vPostgres where there is a non-numeric primary key or identifier for a table.  In this example we are using the vault_secret table which contains an alphanumeric id and secondary_key. 

Please ensure there is a good backup or snapshot of the appliance prior to making any changes to the database.

To find and remove the record:

        1. Log into the appliance as root.
        2. Log into the database

                     psql -h localhost -U postgres -d operationsmanager

        3. Find the corrupt record or records.

                a. Get a count of the records, in this situation we're looking at the vault_secret table.

                        select count(*) from vault_secret;

                b. Taking half the number from the count we will start from the middle.  In this example the count will be 2000 so we will check the records starting from 1000:

                        select * from vault_secret offset 1000;

                c. If there is a corrupt from records beyond 1000 it'll throw an error "unexpected chunk size".
                d. If it doesn't throw an error the record is before 1000 and we'll half this number to find this record.

                        select * from vault_secret offset 500;

                e. Run through steps a - d until you find the corrupt record.
                f. Verify the record is corrupt by selecting the record, in this case we use the secondary_key to select and in this example we will use 355 as the bad record.

                        select secondary_key from vault_secret offset 355;
            select * from vault_secret where secondary_key=<key from step above>;

                        * This will throw the error "unexpected chunk size".

                g. Remove the record with a delete and from statement.

                        delete the record "select * from vault_secret where secondary_key=<secondary_key>;

                h. Verify there are no other corrupt records and reindex the table.

                        select * from vault_secret;
            reindex table vault_secret;