Pgpool commands are not working in one node of VIDM cluster

Article ID: 374202

Updated On:

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

  • PgService and vPostgres are running on the vIDM nodes.
  • However, on one of the nodes the pgpool commands from the article Troubleshooting VMware Identity Manager postgres cluster deployed through vRealize Suite Lifecycle Manager prompt for a password and fail with the error: ERROR: connection to host "localhost" failed with error "Connection timeout"
  • The command su root -c "echo -e 'password'|/usr/local/bin/pcp_watchdog_info -p 9898 -h localhost -U pgpool" indicates that the status of the node is LOST, SHUTDOWN, or DEAD.
  • The poolnodes command shows the standby node status as quarantine.
  • "No route to host" errors are seen within horizon.log and workspace.log when horizon-workspace attempts to connect to the local PSQL instance but cannot resolve the delegateIP.

Environment

  • VMware Identity Manager 3.3.x

Cause

This issue can happen due to network outages.

Resolution

  1. Validate that the Postgres Cluster service (pgService) and vPostgres are running on the unhealthy node.
    /etc/init.d/pgService status
    /etc/init.d/vpostgres status
  2. If required, start vPostgres and/or the Postgres Cluster service, then retry the pgpool commands from the article Troubleshooting VMware Identity Manager postgres cluster deployed through vRealize Suite Lifecycle Manager.
    /etc/init.d/pgService start
    /etc/init.d/vpostgres start
  3. Try to gracefully stop and start the PgService and vPostgres, removing the stale pgpool status file in between.
    /etc/init.d/pgService stop
    /etc/init.d/vpostgres stop
    rm -rf /tmp/pgpool_status
    /etc/init.d/vpostgres start
    /etc/init.d/pgService start
  4. If pgService is not stopping on the node, or the node is shown as DEAD / SHUTDOWN, run the below commands to force stop it (a port check example is shown after this list):
    pkill -e -9 pgpool
    rm /tmp/.s.PGSQL.*
    rm /tmp/.s.PGPOOL.*
    rm /tmp/pgpool_status
    rm /var/run/pgpool/pgpool.pid
    rm -rf /var/run/pgpool/socket
    fuser -k 9999/tcp
    
    /etc/init.d/pgService status
    /etc/init.d/pgService start
  5. If pgService was stuck while starting, forcefully restart pgService on both of the DEAD / SHUTDOWN nodes:
    /etc/init.d/pgService stop
    
    fuser -k 9999/tcp >/dev/null 2>&1
    rm /tmp/.s.PGSQL.* >/dev/null 2>&1
    rm /tmp/.s.PGPOOL.* >/dev/null 2>&1
    rm /tmp/pgpool_status >/dev/null 2>&1
    rm /var/run/pgpool/pgpool.pid >/dev/null 2>&1
    rm -rf /var/run/pgpool/socket >/dev/null 2>&1
    
    /etc/init.d/pgService status
    /etc/init.d/pgService start
  6. If the node goes into the QUARANTINE state, run the below command on the primary node. <node_id> is the node being corrected; it can be obtained from the 'node_id' column of the poolnodes command output (a worked example and a verification check follow this list).
    su root -c "cat /usr/local/etc/pgpool.pwd|/usr/local/bin/pcp_recovery_node -v -h delegateIP -p 9898 -U pgpool -n <node_id>"
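
Port check referenced in step 4: before restarting pgService after a forced stop, it can help to confirm that nothing is still listening on the pgpool port 9999. This is an optional check using the same fuser utility as above; no output means the port is free.
    fuser -v 9999/tcp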
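
Worked example for step 6: assuming the poolnodes output reports the quarantined standby with node_id 1 (a hypothetical value; substitute the node_id reported in your environment), the recovery command becomes:
    su root -c "cat /usr/local/etc/pgpool.pwd|/usr/local/bin/pcp_recovery_node -v -h delegateIP -p 9898 -U pgpool -n 1"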
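
After the recovery completes, the watchdog status can be re-checked from any node. Reading the pgpool password from /usr/local/etc/pgpool.pwd (the file used in step 6) avoids the interactive password prompt; no node should remain in the LOST, SHUTDOWN, or DEAD state, and the poolnodes command should no longer show the standby as quarantine.
    su root -c "cat /usr/local/etc/pgpool.pwd|/usr/local/bin/pcp_watchdog_info -p 9898 -h localhost -U pgpool"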