vPostgres service fails to start on VCSA with PANIC: replication checkpoint has wrong magic XXXX instead of YYYY

Article ID: 338172

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:

  • vPostgres service on vCenter Server Appliance (VCSA) fails to start
  • /var/log/vmware/vpostgres/postgresql-xx.log contains entries similar to the following:
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   LOG:  database system was interrupted; last known up at xxxx-xx-xx xx:xx:xx UTC
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   PANIC:  too many replication slots active before shutdown
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   HINT:  Increase max_replication_slots and try again.
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   LOG:  startup process (PID 40986) was terminated by signal 6: Aborted
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   LOG:  aborting startup due to startup process failure
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   LOG:  database system is shut down
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   LOG:  database system was interrupted; last known up at xxxx-xx-xx xx:xx:xx UTC
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   PANIC:  replication checkpoint has wrong magic 4177909209 instead of 307747550
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   LOG:  startup process (PID 32173) was terminated by signal 6: Aborted
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   LOG:  aborting startup due to startup process failure
xxxx.xx.xx xx:xxxx.xxx UTC xxxxxxxx.xxxx 0   LOG:  database system is shut down


Note: The preceding log excerpts are only examples. The date, time, and environment-specific values may differ in your environment.
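
To check whether an appliance shows the checkpoint PANIC from the second excerpt, the log directory can be searched for the message text. The following is a minimal sketch and not part of the original procedure: the function name find_checkpoint_panic is hypothetical, and the postgresql-*.log glob assumes the log files follow the postgresql-xx.log naming shown above.

```shell
# Hypothetical helper: print any "wrong magic" PANIC lines found in the
# vPostgres logs. The directory defaults to the path named in this article.
find_checkpoint_panic() {
    log_dir="${1:-/var/log/vmware/vpostgres}"
    # -h suppresses file names so only the matching log lines are printed.
    # grep exits non-zero when no line matches or no log file exists.
    grep -h "PANIC:.*replication checkpoint has wrong magic" \
        "$log_dir"/postgresql-*.log
}

# Usage on the appliance:
#   find_checkpoint_panic
```

If the function prints nothing and returns a non-zero status, the logs do not contain this PANIC, and the workaround in this article may not apply.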

Environment

VMware vCenter Server Appliance

Cause

This issue occurs after an ungraceful shutdown of vCenter Server, which can corrupt the vPostgres file system. As a result, old WAL (write-ahead log) files are not deleted from pg_xlog, which leads to a checkpoint mismatch in pg_logical.

Resolution

Note: Take a backup of the vCenter Server before making any changes.

To work around the issue, rename the replorigin_checkpoint file:

  • Log in to the VCSA via SSH
  • Rename the replorigin_checkpoint file
    • cd /storage/db/vpostgres/pg_logical/
    • mv replorigin_checkpoint replorigin_checkpoint.old
  • Start the vPostgres service
    • service-control --start vmware-vpostgres
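
The steps above can be sketched as a small shell helper that refuses to act if the file is missing or if a replorigin_checkpoint.old from an earlier attempt already exists. This is an illustrative sketch, not a VMware-supplied script: the function name rename_replorigin_checkpoint is hypothetical, while the directory, file name, and service name are taken from the steps above.

```shell
# Hypothetical helper: rename replorigin_checkpoint to replorigin_checkpoint.old
# in the given directory (defaults to the pg_logical path from this article).
rename_replorigin_checkpoint() {
    dir="${1:-/storage/db/vpostgres/pg_logical}"
    src="$dir/replorigin_checkpoint"
    dst="$src.old"

    if [ ! -f "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    # Do not overwrite a copy kept from a previous attempt.
    if [ -e "$dst" ]; then
        echo "already exists, not overwriting: $dst" >&2
        return 1
    fi
    mv "$src" "$dst"
}

# Usage on the appliance (after taking a backup):
#   rename_replorigin_checkpoint && service-control --start vmware-vpostgres
#   service-control --status vmware-vpostgres
```

Chaining with && means the service start is attempted only if the rename succeeded, and the status command gives a quick check that vPostgres is running afterwards.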