We have a Portal instance using the internal PostgreSQL database in the pre-production environment. The slave database appears to be in an inconsistent state and is trying, unsuccessfully, to recover.
The logs show the following lines:
portal_portaldb-slave.1.4gcu4gc1bhup@cswg032 | LOG: redo starts at 3/8481F168
portal_portaldb-slave.1.4gcu4gc1bhup@cswg032 | LOG: consistent recovery state reached at 3/84840D18
portal_portaldb-slave.1.4gcu4gc1bhup@cswg032 | LOG: record with incorrect prev-link 0/1 at 3/84840D18
portal_portaldb-slave.1.4gcu4gc1bhup@cswg032 | LOG: database system is ready to accept read only connections
portal_portaldb-slave.1.4gcu4gc1bhup@cswg032 | LOG: started streaming WAL from primary at 3/84000000 on timeline 1
portal_portaldb-slave.1.4gcu4gc1bhup@cswg032 | FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000300000084 has already been removed
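The last log line is the one that matters: the primary has already removed the WAL segment the slave needs, so streaming cannot resume. A minimal sketch for spotting that signature in log output follows. The helper name `wal_segment_missing` is hypothetical, and the service name `portal_portaldb-slave` is an assumption taken from the log prefix above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads log text on stdin and returns 0 if it contains
# the "requested WAL segment ... has already been removed" error.
wal_segment_missing() {
  grep -q 'requested WAL segment .* has already been removed'
}

# Example usage (service name assumed from the log prefix):
#   docker service logs --tail 200 portal_portaldb-slave 2>&1 \
#     | wal_segment_missing && echo "slave needs to be rebuilt"
```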
What is the proper procedure to rebuild or recover the slave database and restart replication?
Release : 4.x
Component : API PORTAL
You can recover the portal_portaldb-slave container by executing the following steps:
Make sure you have a valid database backup.
Remove the Docker stack:
docker stack rm portal
List the Docker persistent volumes:
docker volume ls
Remove the slave persistent volume "portal_database-postgres-slave-volume":
docker volume rm portal_database-postgres-slave-volume
Restart the Portal by running:
portal.sh
Verify that all services are starting:
docker service ls
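The steps above can be sketched as a single script. This is an illustration rather than a supported tool: the `run` and `rebuild_slave` helpers, the `DRY_RUN` switch, and the `./portal.sh` path are assumptions, while the stack and volume names are the ones given above. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
set -eu

# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
STACK="portal"
SLAVE_VOLUME="portal_database-postgres-slave-volume"
# Assumed location of portal.sh; adjust to your installation directory.
PORTAL_SCRIPT="${PORTAL_SCRIPT:-./portal.sh}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rebuild_slave() {
  run docker stack rm "$STACK"          # remove the stack
  run docker volume ls                  # list persistent volumes
  run docker volume rm "$SLAVE_VOLUME"  # drop only the slave volume
  run "$PORTAL_SCRIPT"                  # redeploy the Portal
  run docker service ls                 # check that services are starting
}

rebuild_slave
```

Make sure a valid database backup exists before running with DRY_RUN=0. If the volume removal fails because the volume is still in use, the stack's containers have probably not finished stopping yet; wait a few moments and run the remaining steps again.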