HCX Manager postgresdb service fails to start after upgrade

Article ID: 380971

Updated On:

Products

VMware HCX, VMware Cloud on AWS

Issue/Introduction

  • The HCX Manager upgrade completes successfully, but services fail to start.
  • The issue is seen only when HCX Manager is upgraded after it has been scaled up following the HCX scalability guide.
  • Validate that the upgrade was successful.
    /common/logs/upgrade/upgrade.log

    2024-10-29T07:12:01 Upgrade successful, restarting the HCX Manager VM .......................................................... [   OK ]

    /common/logs/upgrade/upgrade-status.properties

    upgrade.status=COMPLETE
    upgrade.message=Upgrade successful, restarting the HCX Manager VM
  • Checking the service status shows that postgresdb is in the activating state and not running, while app-engine and appliance-management have failed.

    systemctl --type=service | grep "zoo\|kaf\|web\|app\|postgres"

    app-engine.service                   loaded failed     failed        App-Engine
    appliance-management.service         loaded failed     failed        Appliance Management
    kafka.service                        loaded active     running       Kafka
    postgresdb.service                   loaded activating start   start PostgresDB
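    zookeeper.service                    loaded active     running       Zookeeper

  • The postgresdb service status confirms that it is stuck in activating: the start script (/opt/vmware/init/postgres_init.sh) keeps polling pg_isready and sleeping because the database is not accepting connections.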

    systemctl status postgresdb

    postgresdb.service - PostgresDB
         Loaded: loaded (/etc/systemd/system/postgresdb.service; enabled; vendor preset: enabled)
         Active: activating (start) since Tue 2024-10-29 07:15:45 UTC; 6min ago
      Cntrl PID: 5732 (postgresdb-star)
          Tasks: 3 (limit: 2385)
         Memory: 764.0K
         CGroup: /system.slice/postgresdb.service
                 ├─ 5732 /bin/bash /etc/systemd/postgresdb-start
                 ├─ 5733 sh -x /opt/vmware/init/postgres_init.sh
                 └─20713 sleep 1
    
    
    Oct 29 07:15:35 hcx.####.### postgresdb-start[20709]: ++ ps -fu postgres
    Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: + '[' -z 'UID        PID  PPID  C STIME TTY          TIME CMD
    Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: postgres  5732     1  0 06:09 ?        00:00:00 /bin/bash /etc/systemd/postgresdb-start
    Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: postgres  5733  5732  0 06:09 ?        00:00:00 sh -x /opt/vmware/init/postgres_init.sh
    Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: postgres 20709  5733  0 06:16 ?        00:00:00 ps -fu postgres' ']'
    Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: + '[' 404 -ge 900 ']'
    Oct 29 07:15:35 hcx.####.### postgresdb-start[20711]: ++ /usr/pgsql/13/bin/pg_isready -h localhost
    Oct 29 07:15:35 hcx.####.### postgresdb-start[20712]: ++ grep accepting
    Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: + '[' -z '' ']'
    Oct 29 07:15:35 hcx.####.### postgresdb-start[5733]: + sleep 1


    /var/log/messages

    2024-10-29T07:15:35.326+00:00 hcx.####.### postgresdb-start[1029]: Tue Oct 29 07:15:09 UTC 2024 Starting the PostgresDB...!
    2024-10-29T07:15:35.363+00:00 hcx.####.### postgresdb-start[1038]: initdb: error: could not access directory "/common/postgres-db": Permission denied
    2024-10-29T07:15:35.364+00:00 hcx.####.### postgresdb-start[1026]: + [[ -d /common/postgres-db ]]
    2024-10-29T07:15:35.364+00:00 hcx.####.### postgresdb-start[1026]: + [[ -f /common/postgres-db/postgresql.conf ]]
    2024-10-29T07:15:35.364+00:00 hcx.####.### postgresdb-start[1026]: + /usr/pgsql/13/bin/pg_ctl start -D /common/postgres-db
    2024-10-29T07:15:35.366+00:00 hcx.####.### postgresdb-start[1046]: pg_ctl: could not open PID file "/common/postgres-db/postmaster.pid": Permission denied

  • Verify the permissions and ownership of /common/postgres-db and /common_ext/postgres-db. /common/postgres-db is a symlink to /common_ext/postgres-db.

    root@hcx [ /common ]# ls -ltrh
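    lrwxrwxrwx   1 postgres  postgres    23 Oct 29 05:37 postgres-db -> /common_ext/postgres-db

  • Note the ownership of the symlink target: /common_ext/postgres-db is owned by UID 1001 (shown numerically because it has no user name) and group appmgmt, with mode drwx------.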

    root@hcx [ /common_ext ]# ls -ltrh
    drwx------ 19     1001 appmgmt  4.0K Oct 29 07:06 postgres-db
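The upgrade and service checks in this section can be combined into a quick triage script. This is a minimal sketch, not a VMware-provided tool: it assumes root shell access on the HCX Manager, the file paths and unit name shown above, and a status file containing the upgrade.status=COMPLETE line as shown. The helper names upgrade_complete and is_stuck_activating are hypothetical.

```shell
#!/usr/bin/env bash
# Triage sketch for the symptoms above. Run as root on the HCX Manager.
# Paths and the unit name come from this article; the helper function
# names are hypothetical and not part of HCX.

# Return 0 if the upgrade status file reports upgrade.status=COMPLETE.
# -s suppresses errors if the file is missing (the check then fails).
upgrade_complete() {
  local props="${1:-/common/logs/upgrade/upgrade-status.properties}"
  grep -sqE '^[[:space:]]*upgrade\.status=COMPLETE[[:space:]]*$' "$props"
}

# Return 0 if a systemd state string shows a start that has not finished.
is_stuck_activating() {
  [[ "$1" == "activating" ]]
}

main() {
  if upgrade_complete; then
    echo "Upgrade status: COMPLETE"
  else
    echo "Upgrade status: not COMPLETE; review /common/logs/upgrade/upgrade.log"
  fi

  local state
  # 'systemctl is-active' prints the unit state and exits non-zero unless
  # the unit is active; '|| true' ignores that status, the text is kept.
  state="$(systemctl is-active postgresdb.service || true)"
  echo "postgresdb.service state: ${state:-unknown}"

  if is_stuck_activating "$state"; then
    echo "postgresdb is still starting; recent permission errors:"
    grep -sE 'postgresdb-start.*Permission denied' /var/log/messages | tail -n 5
  fi
}

# Run main only when the file is executed, not when it is sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  main
fi
```

On an appliance in the state described here, the script should report a COMPLETE upgrade, an activating postgresdb.service, and the same "Permission denied" lines shown in /var/log/messages.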

Environment

HCX

Cause

After the HCX Manager upgrade, the owner and group of /common_ext/postgres-db change from postgres:postgres to 1001:appmgmt.
Because the directory mode is drwx------, the postgres user can no longer access it: initdb and pg_ctl fail with "Permission denied", and the postgresdb service never finishes starting.
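To confirm this cause, the directory ownership can be compared against the expected postgres:postgres. The following is a minimal diagnostic sketch, assuming GNU coreutils stat and root access; owner_of and check_pg_dir_owner are hypothetical helper names. It only reports a mismatch and does not change anything on the appliance.

```shell
#!/usr/bin/env bash
# Sketch: check whether the PostgreSQL data directory still has the
# expected owner and group. Helper names are hypothetical.

# Print "user:group" for a path. -L follows symlinks, so
# /common/postgres-db reports the owner of /common_ext/postgres-db.
# GNU stat prints UNKNOWN for a UID or GID that has no name.
owner_of() {
  stat -L -c '%U:%G' "$1"
}

# Return 0 if the directory matches the expected owner, 1 if it does
# not, and 2 if the path cannot be inspected.
check_pg_dir_owner() {
  local dir="${1:-/common/postgres-db}"
  local want="${2:-postgres:postgres}"
  local have
  have="$(owner_of "$dir")" || return 2
  if [[ "$have" == "$want" ]]; then
    echo "OK: ${dir} is owned by ${have}"
    return 0
  fi
  # Also show numeric IDs, because an unnamed owner such as 1001 prints as UNKNOWN.
  echo "MISMATCH: ${dir} is owned by ${have} ($(stat -L -c '%u:%g' "$dir")), expected ${want}"
  return 1
}

# When executed directly, the exit status is the check result.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  check_pg_dir_owner "$@"
fi
```

Run without arguments on an affected appliance, the script should print a MISMATCH line and exit with status 1.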

Resolution

If you believe you have encountered this issue, please open a support case with Broadcom Support and refer to this KB article.
For more information, see Creating and managing Broadcom support cases

Additional Information