Greenplum: How to Prevent and Identify GPPERFMON Catalog Corruption

Article ID: 409550

Products

VMware Tanzu Greenplum

Issue/Introduction

This article explains how to identify symptoms of GPPERFMON catalog corruption, diagnose corruption using catalog health tools, and prevent recurrence through regular maintenance. The recovery steps should be tailored to the specific type and extent of corruption in your Greenplum cluster.

Identifying GPPERFMON Catalog Corruption

Symptoms of Corruption

  • Query failures on gpperfmon tables, catalog inconsistency errors, or missing relations when connecting to the gpperfmon database.
  • Unusual errors during metric polling or catalog lookups when monitoring system activity.
  • Catalog bloat symptoms, such as sudden growth of metric tables or slow responses from them.

Sample Error:

# pg_dump -t gpmetrics.gpcc_table_info gpperfmon > /tmp/out

pg_dump: schema with OID XXXXXXXXX does not exist

Detection Methods

  • Routine queries: Run simple queries against gpperfmon tables and confirm that they return the expected data without catalog errors.
  • Log inspection: Check master and segment system logs for catalog-related error messages referencing gpperfmon.
  • Catalog health checks:
    • Run gpcheckcat gpperfmon periodically to verify catalog consistency across the master and all segments.
    • Review pg_class, pg_namespace, and pg_attribute for orphaned or missing relations.
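
The detection methods above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a supported procedure: the function names and the grep error patterns are illustrative, and only gpcheckcat, psql, and the catalog tables pg_class, pg_namespace, and pg_attribute come from the article or standard Greenplum tooling. The queries inspect the catalog of the instance psql connects to (normally the master); gpcheckcat remains the tool that compares the master with the segments.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and grep patterns are assumptions, not part of Greenplum.

# Print log lines that mention gpperfmon AND look like catalog errors.
# $1: path to a server log file (the location varies by Greenplum version).
scan_catalog_errors() {
  grep -i 'gpperfmon' "$1" |
    grep -E -i 'cache lookup failed|could not open relation|does not exist'
}

# Emit SQL that lists catalog rows pointing at missing parents:
# relations whose schema is gone, and attributes whose relation is gone.
orphan_check_sql() {
  cat <<'SQL'
SELECT c.oid AS rel_oid, c.relname, c.relnamespace AS missing_schema_oid
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.oid IS NULL;

SELECT DISTINCT a.attrelid AS missing_rel_oid
FROM pg_attribute a
LEFT JOIN pg_class c ON c.oid = a.attrelid
WHERE c.oid IS NULL;
SQL
}

# Run the orphan queries, then the full consistency check.
# Requires a live cluster and a sourced Greenplum environment.
check_gpperfmon_catalog() {
  orphan_check_sql | psql -d gpperfmon -v ON_ERROR_STOP=1 || return 1
  gpcheckcat gpperfmon
}
```

Nothing runs until a function is called. A typical session might call scan_catalog_errors on a recent server log file and then check_gpperfmon_catalog. Any row returned by the orphan queries is a reason to open a support case, not to edit the catalog by hand.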

Environment

All supported Greenplum versions.

Resolution

Preventive Best Practices

  • Scheduled VACUUM/ANALYZE: Automate catalog vacuum and analyze jobs for all databases, including gpperfmon.
  • Catalog consistency checks: Run gpcheckcat before major upgrades or changes.
  • Backup strategy: Maintain periodic backups of the metrics and catalog databases.
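
The maintenance practices above might be automated along these lines. This is a sketch under assumptions: the function names, the dated file naming scheme, and the backup directory argument are hypothetical, and the script assumes psql and pg_dump are on the PATH with a sourced Greenplum environment. It generates one VACUUM ANALYZE statement per pg_catalog table and pipes them back into psql, then writes a pg_dump of gpperfmon to a dated file.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the backup naming scheme are assumptions.

# Emit a query that generates one VACUUM ANALYZE statement per
# pg_catalog table in the connected database.
catalog_vacuum_generator_sql() {
  cat <<'SQL'
SELECT 'VACUUM ANALYZE pg_catalog.' || quote_ident(c.relname) || ';'
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog' AND c.relkind = 'r';
SQL
}

# Vacuum and analyze the system catalog of one database.
# $1: database name (defaults to gpperfmon).
maintain_catalog() {
  local db="${1:-gpperfmon}"
  # First psql prints the generated statements; second psql executes them.
  catalog_vacuum_generator_sql | psql -At -d "$db" | psql -a -d "$db"
}

# Build a dated backup file name. $1: directory, $2: database name.
backup_path() {
  printf '%s/%s_%s.sql\n' "$1" "$2" "$(date +%Y%m%d)"
}

# Dump gpperfmon to a dated file. $1: target directory (hypothetical).
backup_gpperfmon() {
  pg_dump gpperfmon > "$(backup_path "$1" gpperfmon)"
}
```

Both maintain_catalog and backup_gpperfmon could be called from a scheduled job. If the pg_dump step fails with an error like the sample shown earlier, treat that failure as a corruption symptom in its own right.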

Recovery Strategy

  • If corruption is detected, consult Broadcom's VMware Tanzu Greenplum Support before attempting manual repairs. If gpperfmon is non-essential, restore it from backup or recreate it to avoid putting the main cluster at risk.
