Resolving checksum mismatch errors in VMware vFabric Postgres (vPostgres) 9.0 / 9.1.x

Article ID: 341470

Products

VMware

Issue/Introduction


Symptoms:
In the postgresql.log file, you see checksum entries similar to:

2012-03-28 00:01:40.172 UTC cmapp cm PANIC: checksum mismatch: disk has 0xe7bb5225, should be 0x8ab1838b filename base/16386/16994, BlockNum 918, block specifier 1663/16386/16994/0/918
2012-03-28 00:01:40.172 UTC cmapp cm STATEMENT: select child_object_type_id, measure_id, parent_object_id from sv_metric_parent_object_data where run_id = 2786


Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
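The block specifier at the end of the PANIC line is the value needed later in the Resolution steps. As a minimal sketch, assuming the log lines follow the format shown above, the specifier can be pulled out of the log with standard tools. The function name extract_block_specifiers is hypothetical and is not part of vPostgres.

```shell
# Hypothetical helper (not shipped with vPostgres): print each unique
# block specifier found in "checksum mismatch" lines of a log file.
# Assumes the log line format shown in the Symptoms excerpt.
extract_block_specifiers() {
    logfile="$1"
    grep 'checksum mismatch' "$logfile" \
        | sed -n 's|.*block specifier \([0-9/]*\).*|\1|p' \
        | sort -u
}

# Usage (the log path is an example only):
#   extract_block_specifiers /path/to/postgresql.log
```

Each line of output is one specifier in the form tablespace/database/relation/fork/blockNum.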



Environment

VMware vFabric Postgres Standard Edition 9.1.x
VMware vFabric Postgres Standard Edition 9

Cause

This issue occurs when:
  • A write, which typically happens in page-size units, completes only partially and the failure is not reported as an error.
  • The underlying storage has failures that corrupt bits on disk, causing the checksum mismatch.

Resolution

To resolve this issue, fix the checksum mismatch:
  1. Log in as the postgres user, or switch to it by running su postgres.

  2. Run this command to shut down vPostgres:

    pg_ctl stop -m smart -D <databasepath>

    where <databasepath> is the actual path of the database, which by default is the location of the postgresql.conf file.

  3. Update the bad block by running this command, substituting values specific to your environment:

    postgres --single -D <databasepath> -c fix_block_checksum="1663/16386/16994/0/918"

    Notes:
    • The specification for fix_block_checksum is:

      tablespace/database/relation/fork/blockNum

      Values for this can be found in the log on a read-checksum error.

    • The database path is normally where the postgresql.conf and postmaster.opts files reside. For example:

      For vCenter Operations Manager – /data/pgsql/data
      For vCSA – /storage/db/vpostgres

  4. Run this command to start the database:

    pg_ctl start -D <databasepath>

    Note: The database may fail to start, in which case you see an error similar to:

    ERROR: invalid page header in block 57300 of relation pg_tblspc/16385/PG_9.0_201106101/16386/16873

    For more information, see vFabric Postgres fails with the error: invalid page header in block (2039918).
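Because the value passed to fix_block_checksum in step 3 must follow the tablespace/database/relation/fork/blockNum format, it can help to check the specifier before running the command. The following is a minimal sketch of such a check. The function name describe_block_specifier is hypothetical, and the check only validates the shape of the string, not whether the block exists.

```shell
# Hypothetical helper (not shipped with vPostgres): verify that a block
# specifier consists of five slash-separated non-negative integers and
# print the fields with the labels used in this article.
describe_block_specifier() {
    spec="$1"
    if ! printf '%s\n' "$spec" | grep -Eq '^[0-9]+(/[0-9]+){4}$'; then
        echo "invalid block specifier: $spec" >&2
        return 1
    fi
    # Split the validated string on "/" into its five fields.
    printf '%s\n' "$spec" | {
        IFS=/ read -r tablespace database relation fork blocknum
        printf 'tablespace=%s database=%s relation=%s fork=%s blockNum=%s\n' \
            "$tablespace" "$database" "$relation" "$fork" "$blocknum"
    }
}

# Usage:
#   describe_block_specifier "1663/16386/16994/0/918"
```

A non-zero return status indicates that the string does not match the expected five-field format and should not be passed to fix_block_checksum.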


Additional Information

For the vCenter Server Appliance, you may not be able to su to the postgres user until a login shell is enabled for it.
To enable su to the postgres user, edit the /etc/passwd file and change the postgres user's login shell from /bin/false to /bin/bash.
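Before editing the file, it may be useful to confirm which login shell the postgres user currently has. The following is a minimal sketch that reads the seventh colon-separated field of a passwd-format file. The function name login_shell_of is hypothetical.

```shell
# Hypothetical helper: print the login shell (7th field) recorded for a
# user in a passwd-format file. The file defaults to /etc/passwd.
login_shell_of() {
    user="$1"
    passwd_file="${2:-/etc/passwd}"
    awk -F: -v u="$user" '$1 == u { print $7 }' "$passwd_file"
}

# Usage:
#   login_shell_of postgres
```

If the output is /bin/false, su to the postgres user does not give you a shell until the entry is changed to /bin/bash.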

To edit the /etc/passwd file:

  1. Stop the vmware-vpostgres service by running the command:

    service vmware-vpostgres stop

  2. Edit the /etc/passwd file and change the postgres user's login shell from /bin/false to /bin/bash. Then switch to the postgres user and run the fix command:

    su postgres
    /opt/vmware/vpostgres/9.0/bin/postgres --single -D /storage/db/vpostgres -c fix_block_checksum="<block specifier>"

    Note: Insert the block specifier from the log within the double quotes.

  3. Start the services by running the command:

    service vmware-vpostgres start

    service vmware-vpxd start

Impact/Risks:
This procedure runs postgres in single-user mode and fixes the specified block by recalculating the checksum.

Warning: The corrupt data in the block may be lost.