vCenter PostgreSQL Archiver Service Won't Start - Dead Postgres Replication Slot

Article ID: 388731

Updated On: 03-04-2025

Products

VMware vCenter Server

Issue/Introduction

During operation of the vCenter PostgreSQL database, the pg_archiver service fails with the following error:

ERROR pg_archiver could not seek compressed file "/storage/archive/vpostgres/000000010000004A00000061.gz": Invalid argument

The error is logged in /var/log/vmware/vpostgres/pg_archiver.log.stderr, and the failure may affect database replication.

A related error also appears in /var/log/vmware/vpostgres/postgresql.log (for example, when following the file with tail -f):

[...]

2025-02-20 10:07:06.725 UTC 65b778ca.25a22 0 archiver archiver [local] 154146 3DETAIL:  User "archiver" has no password assigned.
        Connection matched pg_hba.conf line 72: "local        all             all                             scram-sha-256"
2025-02-20 10:07:06.725 UTC 65b778ca.25a22 0 archiver archiver [local] 154146 4LOG:  could not send data to client: Broken pipe
2025-02-20 10:07:06.747 UTC 65b778ca.25a28 0 [unknown] [unknown] [local] 154152 1LOG:  connection received: host=[local]
2025-02-20 10:07:06.749 UTC 65b778ca.25a28 0 [unknown] archiver [local] 154152 2LOG:  connection authenticated: identity="vpostgres" method=peer (/storage/db/vpostgres/pg_hba.conf:9)
2025-02-20 10:07:06.749 UTC 65b778ca.25a28 0 [unknown] archiver [local] 154152 3LOG:  replication connection authorized: user=archiver application_name=pg_archiver
2025-02-20 10:07:06.752 UTC 65b778ca.25a28 0 [unknown] archiver [local] 154152 4ERROR:  replication slot "vpg_archiver" already exists

[...]
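To check a log for the two error signatures quoted above in one pass, a small shell helper along these lines may be useful. This is a minimal sketch: the function name count_archiver_errors is hypothetical, and the log paths in the usage comments are the ones given in this article.

```shell
# Hypothetical helper: count the lines in a log file that match either of
# the two error signatures described in this article.
count_archiver_errors() {
    local logfile="$1"
    # -c prints the number of matching lines; -E enables the | alternation.
    grep -c -E 'could not seek compressed file|replication slot "vpg_archiver" already exists' "$logfile"
}

# Example usage on the appliance (paths taken from this article):
#   count_archiver_errors /var/log/vmware/vpostgres/pg_archiver.log.stderr
#   count_archiver_errors /var/log/vmware/vpostgres/postgresql.log
```

A non-zero count in either file suggests the symptoms match this article; a count of 0 means neither signature was found in that file.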

Environment

 

  • vCenter Server
  • PostgreSQL Database (vCenter's internal database)

 

Cause

The pg_archiver process cannot seek within an archived WAL segment file because of a problem with the compressed file or its location. Specifically, it fails on the compressed file 000000010000004A00000061.gz in the /storage/archive/vpostgres/ directory. The postgresql.log excerpt also shows that the replication connection from pg_archiver then fails because the replication slot vpg_archiver already exists.

Possible underlying causes include:

  • Corrupted or inaccessible archived WAL (Write-Ahead Log) files.
  • Insufficient permissions or a missing file.
  • Issues with the disk or filesystem configuration.

Resolution

Checking the PG_Replication_Slot

Query the pg_replication_slots view. From the vCenter Server shell, run:

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select * from pg_replication_slots;"

  slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 vpg_archiver |        | physical  |        |          | f         | f      |            |      |              | 3/AD000000  |                     | reserved   |               | f
(1 row)

In PostgreSQL, the active column shows whether a replication slot is currently in use. A value of f means that no connection is using the slot; a value of t means that the slot is active. In the output above, vpg_archiver is inactive (f) but still present, which matches the replication slot "vpg_archiver" already exists error in postgresql.log.
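The same check can be done without reading the full table by asking psql for only the active value. The following is a rough sketch, not a supported tool: describe_slot_state is a hypothetical helper name, while -A (unaligned output) and -t (tuples only) are standard psql options.

```shell
# Hypothetical helper: turn the raw value of the "active" column into a
# short message. Input is "t", "f", or empty when no row was returned.
describe_slot_state() {
    case "$1" in
        t)  echo "active: the slot is currently in use" ;;
        f)  echo "inactive: the slot exists but nothing is connected to it" ;;
        "") echo "missing: no such replication slot" ;;
        *)  echo "unexpected value: $1" ;;
    esac
}

# Example usage on the appliance. -At makes psql print just the value,
# without column headers or the row-count footer.
#   state=$(/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -At \
#       -c "select active from pg_replication_slots where slot_name = 'vpg_archiver';")
#   describe_slot_state "$state"
```

For the situation described in this article, the expected result is the inactive message.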

Removing the PG_Replication_Slot

Remove the vpg_archiver replication slot with the pg_drop_replication_slot function.

From the vCenter Server shell, run:

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select pg_drop_replication_slot('vpg_archiver');"

 pg_drop_replication_slot
--------------------------

(1 row)
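As a more cautious variant of the drop command above, the statement can be restricted to a slot that is confirmed inactive. This is only a sketch under the assumption that the slot name is vpg_archiver, as in this article; guarded_drop_sql is a hypothetical helper that just prints the SQL text.

```shell
# Hypothetical helper: print a drop statement that only touches the named
# slot when it is inactive. If the slot is active or absent, the query
# returns zero rows instead of raising an error.
guarded_drop_sql() {
    printf "select pg_drop_replication_slot(slot_name) from pg_replication_slots where slot_name = '%s' and not active;\n" "$1"
}

# Example usage on the appliance:
#   /opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB \
#       -c "$(guarded_drop_sql vpg_archiver)"
```

With this form, a result of (1 row) indicates that the slot was dropped and (0 rows) indicates that nothing matched.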

Query pg_replication_slots once more to confirm that the slot has been removed:

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select * from pg_replication_slots;"

 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
(0 rows)

As the final step before restarting the vmware-postgres-archiver service, delete any existing segment files in the /storage/archive/vpostgres/ directory. Check the archive filesystem first:

root@vcsa [ ~ ] df -h /storage/archive/vpostgres/

Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/archive_vg-archive   49G  4.8G   42G  11% /storage/archive

Remove the files:

rm /storage/archive/vpostgres/*
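Before running the rm command, it can help to review what is in the directory. The sketch below lists the regular files there; list_archive_files is a hypothetical helper name, and the default path is the one used in this article. Note that find also reports hidden files, which the rm glob above would not match.

```shell
# Hypothetical helper: list regular files directly inside the archive
# directory so they can be reviewed before anything is deleted.
list_archive_files() {
    local dir="${1:-/storage/archive/vpostgres}"
    # -maxdepth 1 stays in this directory; -type f skips subdirectories.
    find "$dir" -maxdepth 1 -type f | sort
}

# Example usage on the appliance:
#   list_archive_files | wc -l     # number of files present
#   list_archive_files | head      # spot-check the file names
```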

 

Start the vmware-postgres-archiver service:

service-control --start vmware-postgres-archiver

Operation not cancellable. Please wait for it to finish...
Performing start operation on service vmware-postgres-archiver...
Successfully started service vmware-postgres-archiver

Additional Information

Make the above changes with the assistance of Broadcom Support on a call, and ensure that proper backups and snapshots are taken beforehand. If there are linked vCenters, take offline snapshots. For guidance, refer to the KB article on offline snapshots in vCenter.