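This article documents a known issue in which postgres replica pods enter CrashLoopBackOff because their max_connections setting is lower than the value on the primary, and describes the workaround. The postgres pods show the following status: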
# kubectl get pods -n prelude -l name=postgres -o wide
NAME         READY   STATUS             RESTARTS   AGE   IP            NODE           NOMINATED NODE   READINESS GATES
postgres-0   1/1     Running            0          33m   10.xx.xx.xx   vRaFQDN.com    <none>           <none>
postgres-1   0/1     CrashLoopBackOff   11         33m   10.xx.x.xx    vRa1FQDN.com   <none>           <none>
postgres-2   0/1     CrashLoopBackOff   11         33m   10.xx.x.xx    vRa2FQDN.com   <none>           <none>
The /service-logs/prelude/file-logs/postgres.log file contains entries similar to the following:
database system was interrupted while in recovery
2024-07-16 02:22:43.969 UTC [136] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2024-07-16 02:22:44.045 UTC [136] LOG: entering standby mode
2024-07-16 02:22:44.049 UTC [136] FATAL: recovery aborted because of insufficient parameter settings
2024-07-16 02:22:44.049 UTC [136] DETAIL: max_connections = 4100 is a lower setting than on the primary server, where its value was 4450.
2024-07-16 02:22:44.049 UTC [136] HINT: You can restart the server after making the necessary configuration changes.
2024-07-16 02:22:44.050 UTC [134] LOG: startup process (PID 136) exited with exit code 1
2024-07-16 02:22:44.050 UTC [134] LOG: aborting startup due to startup process failure
2024-07-16 02:22:44.066 UTC [134] LOG: database system is shut down
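To confirm that a replica is failing for this specific reason, the log can be searched for the max_connections message shown above. The following is a minimal sketch, not part of the product: the function name max_conn_mismatch is hypothetical, and the pattern assumes the DETAIL line is worded exactly as in the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product).
# Prints the most recent replica/primary max_connections pair found in a
# postgres log file, based on the DETAIL line wording shown in this article.
# Usage: max_conn_mismatch /service-logs/prelude/file-logs/postgres.log
max_conn_mismatch() {
    sed -n -E 's/.*max_connections = ([0-9]+) is a lower setting than on the primary server, where its value was ([0-9]+).*/replica=\1 primary=\2/p' "$1" \
        | tail -n 1
}
```

If the function prints nothing, the log does not contain the message and the pods may be failing for a different reason.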
This issue applies to VMware Aria Automation 8.x.
This can be caused by a race condition between the postgres pods or by a split-brain condition resulting from network isolation.
First, determine whether the split brain is caused by network isolation by following this article: https://knowledge.broadcom.com/external/article/317721.
If that article does not resolve the issue, proceed with the steps below:
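Delete each postgres pod that is in CrashLoopBackOff so that it is recreated:
kubectl delete pod -n prelude <Postgres_Pod_Name1>; kubectl delete pod -n prelude <Postgres_Pod_Name2>;
For example, with the pod status shown above:
kubectl delete pod -n prelude postgres-1; kubectl delete pod -n prelude postgres-2;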
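The pod names to delete can be read from the kubectl output instead of being picked out by eye. The following is a minimal sketch under stated assumptions: the function name crashing_pods is hypothetical, and it expects the default table output of kubectl get pods, where STATUS is the third column.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product).
# Reads `kubectl get pods` table output on stdin and prints the name of every
# pod whose STATUS column (third field) is CrashLoopBackOff.
crashing_pods() {
    awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Example, using the namespace and label from this article:
#   kubectl get pods -n prelude -l name=postgres | crashing_pods
```

Running the same pipeline again after the pods have been deleted and recreated should print nothing once all postgres pods are healthy.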