Accessing VCSA using vSphere Client returns "No healthy upstream" due to PostgreSQL Archiver service crashing

Article ID: 405070

Products

VMware vCenter Server 8.0
VMware vCenter Server

Issue/Introduction

  • Accessing vCenter Server Appliance (VCSA) via vSphere Client returns "No healthy upstream"
  • Performing a restart of services on VCSA fails while starting vmware-postgres-archiver (VMware Postgres Archiver)
  • /var/log/vmware/vmon/vmon.log:

YYYY-MM-DDTHH:MM:SS Wa(03) host-<> <vmware-postgres-archiver> Service pre-start command's stderr: 2025-07-21T13:14:18.191Z DEBUG      pg_archiver creating replication slot "vpg_archiver"
YYYY-MM-DDTHH:MM:SS Wa(03)+ host-<> YYYY-MM-DDTHH:MM:SS ERROR    pg_archiver could not send replication command "CREATE_REPLICATION_SLOT "vpg_archiver" PHYSICAL": ERROR:  replication slot "vpg_archiver" already exists
YYYY-MM-DDTHH:MM:SS Wa(03) host-<> <vc-ws1a-broker> Service pre-start command's stderr: umount:
YYYY-MM-DDTHH:MM:SS Wa(03) host-<> <vc-ws1a-broker> Service pre-start command's stderr: /storage/containers/vc-ws1a-broker/<ID>/rootfs: not mounted.
YYYY-MM-DDTHH:MM:SS Wa(03) host-<> <vtsdb> Service pre-start command's stderr: Issuing signal KILL on all PostgreSQL processes owned by OS user vtsdbuser

  • /var/log/vmware/vpostgres/pg_archiver.log.stderr:

Starting service process with pid: <PID>.
YYYY-MM-DDTHH:MM:SS ERROR  pg_archiver could not open directory "/storage/archive/vpostgres": No such file or directory

  • Checking the archive partition on VCSA returns no output

# df -h | grep -i archive
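The symptom checks above can be combined into a small script. This is a minimal sketch, not an official diagnostic: the helper names has_mount and has_archive_dir_error are hypothetical, and the paths are the ones quoted in this article.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given path appears as a mount point
# in `df -P` style output read from stdin (header line is skipped).
has_mount() {
  awk -v target="$1" 'NR > 1 && $NF == target { found = 1 } END { exit !found }'
}

# Hypothetical helper: succeeds if the log text on stdin contains the
# pg_archiver "could not open directory" error quoted in this article.
has_archive_dir_error() {
  grep -qF 'pg_archiver could not open directory "/storage/archive/vpostgres"'
}

if df -P | has_mount /storage/archive; then
  echo "/storage/archive is mounted"
else
  echo "/storage/archive is NOT mounted"
fi

log=/var/log/vmware/vpostgres/pg_archiver.log.stderr
if [ -r "$log" ] && has_archive_dir_error < "$log"; then
  echo "archiver log reports the missing archive directory"
fi
```

If the script reports that the partition is not mounted and the archiver log contains the error, the appliance matches the symptoms described in this article.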

Cause

This issue occurs because the "/storage/archive" partition is missing on the VCSA.

Resolution

To resolve the issue, repair the VCSA filesystem using the steps below.

Note: Before proceeding, take a snapshot of the affected virtual appliance.

  • Open the GRUB menu of the VCSA and boot the appliance with the fsck.repair=yes kernel parameter. This forces a filesystem check that automatically repairs disk issues. The appliance may silently reboot several times while issues are fixed.

  1. In the GRUB editor, locate the line that begins with linux.
  2. At the end of that line, add fsck.repair=yes
  3. Press F10 to continue booting the appliance with the modified parameters.

Example:
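As a rough illustration only, the edited line would end as shown here. The text in angle brackets is a hypothetical placeholder for the kernel image path and parameters already present on the line, which differ between builds and should be left unchanged.

```
linux <existing kernel image and parameters> fsck.repair=yes
```

The parameter is separated from the preceding text by a single space, and the change applies to this boot only.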

  • (Optional) After the reboot, the service might still fail to start if the replication slots have not been cleared.

Removing the PG_Replication_Slot

  • Log in to the vCenter Server Appliance using SSH.
  • To query pg_replication_slots, run the following command:

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select * from pg_replication_slots;"

Output Example: 

  slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
 vpg_archiver |        | physical  |        |          | f         | f      |            |      |              | 3/AD000000  |                     | reserved   |               | f
(1 row)

Note: In PostgreSQL, the "active" column shows whether a replication slot is currently in use. A value of "f" means the slot is inactive; a value of "t" means the slot is in use.
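The "active" check can also be scripted before dropping the slot. This is a minimal sketch under assumptions: the function name describe_slot_state is hypothetical, and the query relies on the standard psql options -A (unaligned output) and -t (tuples only) so that only the raw column value is returned. The live query runs only if the psql binary from this article is present.

```shell
#!/bin/sh
PSQL=/opt/vmware/vpostgres/current/bin/psql

# Hypothetical helper: map the raw value of the "active" column to a verdict.
describe_slot_state() {
  case "$1" in
    f)  echo "inactive" ;;   # slot exists but is not in use
    t)  echo "active" ;;     # slot is currently in use
    "") echo "absent" ;;     # query returned no row for this slot
    *)  echo "unknown" ;;
  esac
}

# Query only the vpg_archiver slot; skipped when psql is not available.
if [ -x "$PSQL" ]; then
  state=$("$PSQL" -U postgres -d VCDB -At \
    -c "select active from pg_replication_slots where slot_name = 'vpg_archiver';")
  echo "vpg_archiver slot is: $(describe_slot_state "$state")"
fi
```

A result of "inactive" corresponds to the "f" value shown in the example output above.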

Remove the PG_Replication_Slot

  • To remove the vpg_archiver replication slot with pg_drop_replication_slot, run the following command:

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select pg_drop_replication_slot('vpg_archiver');"

 pg_drop_replication_slot
--------------------------

(1 row)

  • To confirm that the slot has been removed, query pg_replication_slots again:

/opt/vmware/vpostgres/current/bin/psql -U postgres -d VCDB -c "select * from pg_replication_slots;"

 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------
(0 rows)
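The validation can be automated by reading the row-count footer that psql prints after a result table. This is a sketch only: footer_row_count is a hypothetical helper name, and the live query runs only if the psql binary from this article is present.

```shell
#!/bin/sh
PSQL=/opt/vmware/vpostgres/current/bin/psql

# Hypothetical helper: print N from the "(N rows)" or "(1 row)" footer
# of psql's default aligned output read from stdin.
footer_row_count() {
  sed -n 's/^(\([0-9][0-9]*\) rows\{0,1\})$/\1/p'
}

if [ -x "$PSQL" ]; then
  remaining=$("$PSQL" -U postgres -d VCDB \
    -c "select * from pg_replication_slots where slot_name = 'vpg_archiver';" \
    | footer_row_count)
  if [ "$remaining" = "0" ]; then
    echo "vpg_archiver slot removed"
  else
    echo "vpg_archiver slot still present"
  fi
fi
```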

  • Start the VMware Postgres Archiver service (vmware-postgres-archiver):

service-control --start vmware-postgres-archiver
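To confirm that the service came up, its status can be checked afterwards. This is a sketch that assumes service-control --status prints section headers such as "Running:" and "Stopped:" followed by service names. That output layout is an assumption and should be verified on the appliance. The helper name is_listed_running is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named service is listed under a
# "Running:" header in status text read from stdin (assumed layout).
is_listed_running() {
  awk -v svc="$1" '
    /^[A-Za-z]+:/ { running = ($1 == "Running:") }   # track current section
    running { for (i = 1; i <= NF; i++) if ($i == svc) found = 1 }
    END { exit !found }
  '
}

# Only runs on a system where service-control is available.
if command -v service-control >/dev/null 2>&1; then
  if service-control --status vmware-postgres-archiver \
      | is_listed_running vmware-postgres-archiver; then
    echo "vmware-postgres-archiver is running"
  else
    echo "vmware-postgres-archiver is not running; check vmon.log"
  fi
fi
```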