pcp_recovery on vIDM nodes fails with the error message: execution of command failed at "1st stage"

Article ID: 417854

Products

VCF Operations/Automation (formerly VMware Aria Suite)

Issue/Introduction

In the vASL UI, under Lifecycle Operations > Environment, the vIDM cluster (Global environment) is reported as critical.
One or more nodes are reported as down when viewing the pgpool cluster status using: su root -c "echo -e 'password'|/opt/vmware/vpostgres/current/bin/psql -h localhost -p 9999 -U pgpool postgres -c \"show pool_nodes\""
The Remediate operation fails to perform pcp recovery on the node.
Running pcp_recovery manually fails with the error:
/usr/local/bin/pcp_recovery_node -h delegateIP -p 9898 -U pgpool -n 0
Password:
ERROR: executing recovery, execution of command failed at "1st stage"
DETAIL: command: "recovery_1st_stage"
The /db/data/serverlog file contains entries like the following, indicating that the pgpool_recovery and pgpool_regclass extensions have been removed:
- On the primary server:
  tail -f /db/data/serverlog
  - ERROR: requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
    ERROR: requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
    ERROR: requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
    ERROR: requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
    ERROR: requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
    ERROR: requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
    requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
    ERROR: function pgpool_recovery(unknown, unknown, unknown, unknown, integer) does not exist at character 8
    HINT: No function matches the given name and argument types. You might need to add explicit type casts.
    STATEMENT: SELECT pgpool_recovery('recovery_1st_stage', '<delegate_ip>', '/db/data', '5432', 0)
    ERROR: function pgpool_recovery(unknown, unknown, unknown, unknown, integer) does not exist at character 8
    HINT: No function matches the given name and argument types. You might need to add explicit type casts.
    STATEMENT: SELECT pgpool_recovery('recovery_1st_stage', '<delegate_ip>', '/db/data', '5432', 0)
    ------
  - The same can be validated with: tail -f /db/data/serverlog
    
    HINT: No function matches the given name and argument types. You might need to add explicit type casts.
    STATEMENT: SELECT pgpool_recovery ('recovery_1st_stage', '<delegate_ip>', '/db/data', '5432', 0)
    ERROR: relation "pgpool_recovery" does not exist at character 15
    STATEMENT: select * from pgpool_recovery;
    ERROR: function pgpool_recovery(unknown, unknown, unknown, unknown, integer) does not exist at character 8
    HINT: No function matches the given name and argument types. You might need to add explicit type casts.
    HINT: No function matches the given name and argument types. You might need to add explicit type casts.
- On the standby node:
  tail -f /db/data/serverlog
  - LOG: started streaming WAL from primary at 146/48000000 on timeline 6
    FATAL: could not receive data from WAL stream: ERROR: requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
    LOG: started streaming WAL from primary at 146/48000000 on timeline 6
    FATAL: could not receive data from WAL stream: ERROR: requested WAL segment xxxxxxxxxxxxxxxxxxxxxxxx has already been removed
    LOG: received fast shutdown request
    LOG: aborting any active transactions
    LOG: shutting down
    LOG: database system is shut down
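
The two log signatures above can be checked for without reading the whole file. The following is a minimal sketch, not part of the original procedure: scan_serverlog is a hypothetical helper name, and the patterns are taken from the log excerpts in this article.

```shell
#!/bin/sh
# Hypothetical helper: count the two error signatures described in this
# article in a PostgreSQL server log (for example /db/data/serverlog).
scan_serverlog() {
    log_file="$1"
    # Lines showing that the pgpool_recovery function is missing.
    missing_fn=$(grep -c 'function pgpool_recovery.*does not exist' "$log_file" || true)
    # Lines showing that a WAL segment requested by a standby is gone.
    missing_wal=$(grep -c 'requested WAL segment .* has already been removed' "$log_file" || true)
    echo "missing_function=$missing_fn missing_wal=$missing_wal"
}

# Example invocation on a vIDM node:
#   scan_serverlog /db/data/serverlog
```

A non-zero missing_function count on the primary matches the symptom described in this article. A non-zero missing_wal count alone may have other causes.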

Environment

VMware Identity Manager 3.3.7

Cause

The cluster can no longer stabilize and complete a pcp recovery when auto-recovery is run from Aria Suite Lifecycle, because the pgpool_recovery and pgpool_regclass extensions have been removed.

Resolution

Resolution:

The recommended approach is to revert the vIDM cluster to a healthy snapshot and then initiate a Remediate or a Power On for the global environment from vASL.

Workaround:

If no snapshot of the cluster in a healthy state is available, manually create the database extensions.
------------------------------------------
Steps to undo the prepare-vidm-patch.sh script
------------------------------------------
- Execute the below command on all nodes:
  - /etc/init.d/pgService start
- Execute the below command only on primary:
  - /opt/vmware/vpostgres/current/bin/psql -h localhost -U postgres -d template1 -c "CREATE EXTENSION IF NOT EXISTS pgpool_recovery WITH SCHEMA pg_catalog;"
  - /opt/vmware/vpostgres/current/bin/psql -h localhost -U postgres -d template1 -c "CREATE EXTENSION IF NOT EXISTS pgpool_regclass WITH SCHEMA pg_catalog;"
  - /etc/init.d/NetworkService start
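
Before moving on, it may be worth confirming that both extensions now exist in template1. The sketch below is an assumption, not part of the original steps: list_extensions and missing_extensions are hypothetical helper names, the psql path and connection options are copied from the commands above, and pg_extension is the standard PostgreSQL catalog of installed extensions.

```shell
#!/bin/sh
# psql path taken from this article; override PSQL if it differs.
PSQL="${PSQL:-/opt/vmware/vpostgres/current/bin/psql}"

# Print the installed extensions of interest, one name per line.
# -A: unaligned output, -t: rows only (no header or footer).
list_extensions() {
    "$PSQL" -h localhost -U postgres -d template1 -At \
        -c "SELECT extname FROM pg_extension WHERE extname IN ('pgpool_recovery', 'pgpool_regclass');"
}

# Read extension names on stdin and print every expected name that is absent.
missing_extensions() {
    found=$(cat)
    for ext in pgpool_recovery pgpool_regclass; do
        printf '%s\n' "$found" | grep -qx "$ext" || echo "$ext"
    done
}

# Example invocation on the primary node:
#   list_extensions | missing_extensions
```

Empty output from the pipeline suggests that both extensions are present. Any name printed is still missing and the corresponding CREATE EXTENSION command can be rerun.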
Manually run pcp_recovery for the standby nodes:
- Stop vpostgres service on all the standby nodes:
  - /etc/init.d/vpostgres stop
- Run below command on the primary node:
  - /usr/local/bin/pcp_recovery_node -h delegateIP -p 9898 -U pgpool -n node_id
    Command parameter help
    -h : The host on which the command is run. delegateIP is a keyword: use it as is and do not replace it with an IP address.
    -p : The port on which the PCP process accepts connections, which is 9898.
    -U : The pgpool user, which is pgpool. Use as is.
    -n : The ID of the node to recover. Obtain it from the node_id column of the show pool_nodes output.
    The command prompts for a password. Enter "password" if the password in /usr/local/etc/pgpool.pwd fails to connect.
    
    Expected response
    pcp_recovery_node -- Command Successful
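
To find the node_id values to pass to -n, the show pool_nodes output can be filtered for rows whose status is down. This is a minimal sketch under stated assumptions: down_node_ids is a hypothetical helper name, and the parser assumes psql's default aligned table layout with columns named node_id and status, which it locates by header name rather than by position.

```shell
#!/bin/sh
# Hypothetical helper: read `show pool_nodes` output on stdin and print
# the node_id of every row whose status column is "down".
down_node_ids() {
    awk -F'|' '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Until the header row is found, look for the node_id and status columns.
        id_col == 0 {
            for (i = 1; i <= NF; i++) {
                name = trim($i)
                if (name == "node_id") id_col = i
                if (name == "status") status_col = i
            }
            next
        }
        # Data rows only: the separator line and row-count footer have one field.
        NF > 1 && status_col > 0 && trim($status_col) == "down" { print trim($id_col) }
    '
}

# Example: pipe the output of the show pool_nodes command from the
# Issue/Introduction section into the helper:
#   <show pool_nodes command> | down_node_ids
```

Each ID printed is a candidate for the pcp_recovery_node command above, run once per node.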
Trigger an Inventory Sync from vASL to vIDM and validate that the request completes successfully. (The health status syncs on the next run unless a 'Trigger Cluster Health' request is triggered manually.)
Log into the vIDM portal and validate cluster health.
