Symptoms:
The Aria Automation portal is down.
Running 'kubectl get nodes -n prelude' shows all nodes in a Ready state.
Running '/opt/health/run.sh' on each node returns no errors on any node.
Running 'kubectl get pods -n prelude -o wide' shows a postgres pod that is not ready:
postgres-# 0/1 Running #
Running 'vracli status' returns an error for the postgres node:
{
"Node name": "postgres-#",
"vra_status": "error",
"vra_error": "error: unable to upgrade connection: container not found (\"control\")"
}
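To review the postgres log, run 'kubectl get pods -n prelude -o wide --selector=app=postgres' to identify the node running the failing postgres pod. On that node, run:
cd /services-logs/prelude/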
cd postgres-# (the number will be specific to the appliance)
less postgres.log
The log contains entries similar to the following:
#-#-# #:#:#.# UTC [#] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
#-#-# #:#:#.# UTC [#] LOG: entering standby mode
#-#-# #:#:#.# UTC [#] FATAL: requested timeline # does not contain minimum recovery point #/### on timeline #
#-#-# #:#:#.# UTC [#] LOG: startup process (PID #) exited with exit code #
#-#-# #:#:#.# UTC [#] LOG: aborting startup due to startup process failure
#-#-# #:#:#.# UTC [#] LOG: database system is shut down
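The pod and log checks above can be scripted when several appliances need to be inspected. The following is a minimal POSIX shell sketch, assuming you pipe in 'kubectl get pods' output produced with the standard '--no-headers' flag. The function names unready_postgres_pods and has_timeline_error are hypothetical helpers, not part of Aria Automation.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Aria Automation) for spotting the
# symptoms described above.

# Reads 'kubectl get pods ... --no-headers' output on stdin and prints the
# name of every postgres pod whose READY column is not fully ready (e.g. 0/1).
unready_postgres_pods() {
  awk '$1 ~ /^postgres-/ {
         split($2, ready, "/")
         if (ready[1] != ready[2]) print $1
       }'
}

# Exits 0 if the given postgres.log contains the timeline error from the
# FATAL line above, non-zero otherwise.
has_timeline_error() {
  grep -q "does not contain minimum recovery point" "$1"
}

# Example usage (run on the appliance; replace # with the pod number):
#   kubectl get pods -n prelude -o wide --selector=app=postgres --no-headers | unready_postgres_pods
#   has_timeline_error /services-logs/prelude/postgres-#/postgres.log && echo "timeline error found"
```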
Environment:
Aria Automation 8.18.x
Cause:
After a crash, the WAL checkpoint timeline on the secondary database is ahead of the primary database: the timeline on the secondary database node is higher than the minimum recovery point timeline found on the primary database node.
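To see the timeline mismatch this cause refers to, the two timeline numbers in the FATAL line from postgres.log can be extracted for comparison. This is a minimal sketch, assuming the message uses the standard PostgreSQL wording 'requested timeline N does not contain minimum recovery point X/Y on timeline M'; timeline_numbers is a hypothetical helper that only reports the numbers.

```shell
#!/bin/sh
# Hypothetical helper: prints the two timeline numbers from a PostgreSQL
# "requested timeline ... does not contain minimum recovery point ..." line
# as "requested=<N> recovery_point_timeline=<M>". Non-matching lines are ignored.
timeline_numbers() {
  sed -n 's|.*requested timeline \([0-9][0-9]*\) does not contain minimum recovery point [0-9A-Fa-f]*/[0-9A-Fa-f]* on timeline \([0-9][0-9]*\).*|requested=\1 recovery_point_timeline=\2|p'
}

# Example usage (replace # with the pod number):
#   grep FATAL /services-logs/prelude/postgres-#/postgres.log | timeline_numbers
```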
Resolution:
1. On the node running the failed postgres pod, change to the database directory:
cd /data/db/
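2. Remove the contents of the live and flags directories:
rm -r /live/*
rm -r /flags/*
3. Delete the "postgres" pods:
kubectl delete pods -n prelude --selector=app=postgres
4. Monitor the pods until all of them show 'postgres-# 1/1 Running':
kubectl get pods -n prelude -o wide --selector=app=postgres
5. SSH to the node running the 'postgres-0' pod and open a shell in the pod:
kubectl exec -it -n prelude postgres-0 -- bash
6. Switch to the user "postgres":
su - postgres
7. Display the repmgr cluster status:
repmgr -f /etc/repmgr.conf cluster show
Healthy output is similar to the following, with one primary, the standbys running, and the same Timeline on every node:
 ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string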
-----+-----------------------------------------------+---------+-----------+-----------------------------------------------+----------+----------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------
100 | postgres-0.postgres.prelude.svc.cluster.local | primary | * running | | default | 100 | 6 | host=postgres-0.postgres.prelude.svc.cluster.local dbname=repmgr-db user=repmgr-db passfile=/run/repmgr-db.cred connect_timeout=10 keepalives=1
101 | postgres-1.postgres.prelude.svc.cluster.local | standby | running | postgres-0.postgres.prelude.svc.cluster.local | default | 99 | 6 | host=postgres-1.postgres.prelude.svc.cluster.local dbname=repmgr-db user=repmgr-db passfile=/run/repmgr-db.cred connect_timeout=10 keepalives=1
102 | postgres-2.postgres.prelude.svc.cluster.local | standby | running | postgres-0.postgres.prelude.svc.cluster.local | default | 98 | 6 | host=postgres-2.postgres.prelude.svc.cluster.local dbname=repmgr-db user=repmgr-db passfile=/run/repmgr-db.cred connect_timeout=10 keepalives=1
8. Type 'exit' and press Enter to leave the postgres user session, then type 'exit' and press Enter again to leave the pod. You should be back on the appliance shell.
9. Run '/opt/scripts/deploy.sh' from the primary node.
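The pod monitoring and the 'repmgr -f /etc/repmgr.conf cluster show' check above can also be scripted. This is a minimal sketch, assuming the table layout shown above (columns separated by '|', Role in the third column, Status in the fourth, Timeline in the eighth). check_repmgr_cluster is a hypothetical helper. 'kubectl wait' is a standard kubectl command, offered here as an alternative to re-running 'kubectl get pods' by hand.

```shell
#!/bin/sh
# Hypothetical helper: reads 'repmgr -f /etc/repmgr.conf cluster show' output
# on stdin and exits 0 only if there is exactly one primary, every node's
# Status is "running" (the primary is shown as "* running"), and all nodes
# report the same Timeline.
check_repmgr_cluster() {
  awk -F'|' '
    # Data rows start with a numeric node ID; header and separator lines are skipped.
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
      role = $3; status = $4; tl = $8
      gsub(/[ \t]/, "", role); gsub(/[ \t]/, "", status); gsub(/[ \t]/, "", tl)
      rows++
      if (role == "primary") primaries++
      if (status != "running" && status != "*running") bad++
      if (rows == 1) first_tl = tl
      else if (tl != first_tl) mismatch++
    }
    END { exit !(rows > 0 && primaries == 1 && bad == 0 && mismatch == 0) }'
}

# Example usage. Wait for readiness instead of polling by hand (run after the
# deleted pods have been recreated; the 900s timeout is an arbitrary choice):
#   kubectl wait --for=condition=Ready pod --selector=app=postgres -n prelude --timeout=900s
# Inside the postgres-0 pod, as the postgres user, after defining the function there:
#   repmgr -f /etc/repmgr.conf cluster show | check_repmgr_cluster && echo "cluster looks healthy"
```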