HCX Manager postgresdb service fails to start after upgrade


Article ID: 380971



Products

VMware HCX, VMware Cloud on AWS

Issue/Introduction

  • The HCX Manager upgrade completes successfully, but services fail to start.
  • The issue occurs only when HCX Manager is upgraded after it was scaled up following the HCX scalability guide.

     

  • Validate that the upgrade is successful

    • Log in to the HCX Manager as the admin user via PuTTY/SSH and check the following files.
      /common/logs/upgrade/upgrade.log
2024-10-29T07:12:01 Upgrade successful, restarting the HCX Manager VM .......................................................... [   OK ]

/common/logs/upgrade/upgrade-status.properties

upgrade.status=COMPLETE
upgrade.message=Upgrade successful, restarting the HCX Manager VM
  • Checking the service status shows that postgresdb is in the activating state and is not running.

    systemctl --type=service | grep "zoo\|kaf\|web\|app\|postgres"

    app-engine.service                   loaded failed     failed        App-Engine
    appliance-management.service         loaded failed     failed        Appliance Management
    kafka.service                        loaded active     running       Kafka
    postgresdb.service                   loaded activating start   start PostgresDB
    zookeeper.service                    loaded active     running       Zookeeper
  • Checking the postgresdb service status shows that it remains in the activating (start) state.

systemctl status postgresdb

postgresdb.service - PostgresDB
     Loaded: loaded (/etc/systemd/system/postgresdb.service; enabled; vendor preset: enabled)
     Active: activating (start) since Tue 2024-10-29 07:15:45 UTC; 6min ago
Cntrl PID: 5732 (postgresdb-star)
      Tasks: 3 (limit: 2385)
     Memory: 764.0K
     CGroup: /system.slice/postgresdb.service
             ├─ 5732 /bin/bash /etc/systemd/postgresdb-start
             ├─ 5733 sh -x /opt/vmware/init/postgres_init.sh
             └─20713 sleep 1


Oct 29 07:15:35 hcx.####.### postgresdb-start[20709]: ++ ps -fu postgres
Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: + '[' -z 'UID        PID  PPID  C STIME TTY          TIME CMD
Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: postgres  5732     1  0 06:09 ?        00:00:00 /bin/bash /etc/systemd/postgresdb-start
Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: postgres  5733  5732  0 06:09 ?        00:00:00 sh -x /opt/vmware/init/postgres_init.sh
Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: postgres 20709  5733  0 06:16 ?        00:00:00 ps -fu postgres' ']'
Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: + '[' 404 -ge 900 ']'
Oct 29 07:15:35 hcx.####.### postgresdb-start[20711]: ++ /usr/pgsql/13/bin/pg_isready -h localhost
Oct 29 07:15:35 hcx.####.### postgresdb-start[20712]: ++ grep accepting
Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: + '[' -z '' ']'
Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: + sleep 1
  • Check the log file /var/log/messages for "Permission denied" errors.
2024-10-29T07:15:35.326+00:00 hcx.####.### postgresdb-start[1029]: Tue Oct 29 07:15:09 UTC 2024 Starting the PostgresDB...!
2024-10-29T07:15:35.363+00:00 hcx.####.### postgresdb-start[1038]: initdb: error: could not access directory "/common/postgres-db": Permission denied
2024-10-29T07:15:35.364+00:00 hcx.####.### postgresdb-start[1026]: + [[ -d /common/postgres-db ]]
2024-10-29T07:15:35.364+00:00 hcx.####.### postgresdb-start[1026]: + [[ -f /common/postgres-db/postgresql.conf ]]
2024-10-29T07:15:35.364+00:00 hcx.####.### postgresdb-start[1026]: + /usr/pgsql/13/bin/pg_ctl start -D /common/postgres-db
2024-10-29T07:15:35.366+00:00 hcx.####.### postgresdb-start[1046]: pg_ctl: could not open PID file "/common/postgres-db/postmaster.pid": Permission denied
  • Validate the permissions and ownership of /common/postgres-db and /common_ext/postgres-db and ensure there is a symlink as shown below.

    root@hcx [ /common ]# ls -ltrh
    lrwxrwxrwx   1 postgres  postgres    23 Oct 29 05:37 postgres-db -> /common_ext/postgres-db
  • The owner and group of the postgres-db directory have changed, as shown below.

    root@hcx [ /common_ext ]# ls -ltrh
    drwx------ 19     1001 appmgmt  4.0K Oct 29 07:06 postgres-db
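
The log checks in the steps above can be scripted. The following is a minimal sketch, not a supported tool: the function names upgrade_complete and postgres_permission_errors are hypothetical, and the default paths are the ones quoted in this article. It assumes a bash shell with grep available.

```shell
#!/bin/bash
# Sketch only: the function names are illustrative, not part of HCX.

# Succeeds when the properties file reports upgrade.status=COMPLETE.
# The default path is the one quoted in this article.
upgrade_complete() {
    local props="${1:-/common/logs/upgrade/upgrade-status.properties}"
    grep -q '^upgrade\.status=COMPLETE' "$props" 2>/dev/null
}

# Prints every postgresdb-start log line that reports "Permission denied".
# The default log file is the one quoted in this article.
postgres_permission_errors() {
    local log="${1:-/var/log/messages}"
    grep 'postgresdb-start' "$log" 2>/dev/null | grep 'Permission denied'
}
```

On the appliance, running upgrade_complete followed by postgres_permission_errors with no arguments should reproduce the two observations above: the upgrade is reported as complete, yet postgresdb-start logs permission errors for /common/postgres-db.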

Environment

VMware HCX

Cause

  • After the HCX Manager upgrade, the owner and group of the directory /common_ext/postgres-db change from postgres:postgres to 1001:appmgmt.
  • This prevents the postgresdb service from starting because it cannot access that directory.
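
To confirm this cause on an affected appliance, the directory ownership can be compared against the expected postgres:postgres. The sketch below is illustrative only: owned_by is a hypothetical helper, and it assumes GNU coreutils stat (the -c and -L options). It only reads the ownership and changes nothing.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when PATH is owned by the expected "owner:group".
# -L follows symlinks, so /common/postgres-db resolves to its target directory.
# Returns 2 when the path cannot be read at all.
owned_by() {
    local path="$1" expected="$2" actual
    actual=$(stat -L -c '%U:%G' "$path" 2>/dev/null) || return 2
    echo "$path is owned by $actual (expected $expected)"
    [ "$actual" = "$expected" ]
}
```

For example, `owned_by /common_ext/postgres-db postgres:postgres` would be expected to fail on an appliance in the state described here. An owner without a matching user account, such as UID 1001, is not reported as a user name by stat, so the comparison fails in that case as well.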

Resolution

Additional Information