We are experiencing an issue where the DX Platform ACC pods are unable to initialize:
kubectl get pods --namespace dxi -o wide| grep -v Running
ng-acc-configserver-cp-deployment-#######-### 0/1 Init:CrashLoopBackOff 9 (99s ago) 24m
ng-acc-configserver-db-deployment-#######-### 0/1 CrashLoopBackOff 9 (2m56s ago) 24m
ng-acc-configserver-deployment-#######-### 0/1 Init:CrashLoopBackOff 9 (108s ago) 24m
kubectl logs -ndxi ng-acc-configserver-db-deployment-#######-###
Defaulted container "ng-acc-configserver-db-container" out of: ng-acc-configserver-db-container, init-fs (init)
PostgreSQL Database directory appears to contain a database; Skipping initialization
waiting for server to start....2023-11-06 14:19:05.943 UTC [16]: [1] LOG: pgaudit extension initialized
2023-11-06 14:19:05.944 UTC [16]: [2] LOG: starting PostgreSQL 12.12 on x86_64-alpine-linux-gnu, compiled by gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219, 64-bit
2023-11-06 14:19:05.944 UTC [16]: [3] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
2023-11-06 14:19:06.064 UTC [17]: [1] LOG: database system was shut down at 2023-11-05 04:50:29 UTC
2023-11-06 14:19:06.069 UTC [17]: [2] LOG: invalid resource manager ID in primary checkpoint record
2023-11-06 14:19:06.069 UTC [17]: [3] PANIC: could not locate a valid checkpoint record
.2023-11-06 14:19:07.582 UTC [16]: [4] LOG: startup process (PID 17) was terminated by signal 6: Aborted
2023-11-06 14:19:07.582 UTC [16]: [5] LOG: aborting startup due to startup process failure
2023-11-06 14:19:07.649 UTC [16]: [6] LOG: database system is shut down
stopped waiting
pg_ctl: could not start server
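The failure signature in the log above is the PANIC line about the checkpoint record. As a minimal sketch, the check can be scripted with a small filter that reads log text on standard input. The function name has_checkpoint_panic is hypothetical and is not part of DX Platform or PostgreSQL.

```shell
# Hypothetical helper (not part of the product): exits 0 when the log text
# on stdin contains the checkpoint PANIC shown above, non-zero otherwise.
has_checkpoint_panic() {
  grep -q 'PANIC:.*could not locate a valid checkpoint record'
}

# Example usage (assumes kubectl access; namespace and pod name are placeholders):
# kubectl -n <your-namespace> logs <db pod name> | has_checkpoint_panic \
#   && echo "checkpoint record is invalid, WAL reset is likely needed"
```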
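Examining the log output shows that the PostgreSQL startup process cannot locate a valid checkpoint record in the write-ahead log, so the database cannot start.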
DX Platform 23.x
2) You have to modify the following 4 items in the yml file:
3) Save it and deploy it using k8s cli or UI:
kubectl -n <namespace> apply -f <yml file>
4) Fix the Postgres DB:
a) scale down ng-acc-configserver-db deployment
kubectl -n <your-namespace> scale deployment ng-acc-configserver-db-deployment --replicas=0
b) scale up the NEW ng-acc-configserver-db-cli deployment: this creates a new pod that has the Postgres tooling and mounts the Postgres directories
kubectl -n <your-namespace> scale deployment ng-acc-configserver-db-cli --replicas=1
- open a terminal in the ng-acc-configserver-db-cli deployment's pod:
kubectl -n <your-namespace> exec -it <your ng-acc-configserver-db-cli pod id> -- /bin/bash
- run pg_resetwal command
pg_resetwal /var/lib/postgresql/data/
the command reports any issues it finds and the fixes it applies; on success, the expected output is: "Write-ahead log reset"
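Because pg_resetwal rewrites the WAL and control information, it can help to preview the change first. The following is a sketch only, meant to be run inside the ng-acc-configserver-db-cli pod. The wrapper name reset_wal is hypothetical. It relies on the standard -n (dry-run) option of pg_resetwal, which prints the values that would be written without modifying anything, and then checks for the expected success message.

```shell
# Hypothetical wrapper (not part of the product): preview, reset, then verify.
# Argument: path to the PostgreSQL data directory.
reset_wal() {
  datadir="$1"

  # -n is the dry-run option: show what would be written, change nothing.
  pg_resetwal -n "$datadir" || return 1

  # Perform the real reset and capture its output for verification.
  if out="$(pg_resetwal "$datadir" 2>&1)"; then
    printf '%s\n' "$out"
    # Succeed only if the expected confirmation line is present.
    printf '%s\n' "$out" | grep -q 'Write-ahead log reset'
  else
    printf '%s\n' "$out" >&2
    return 1
  fi
}

# Example usage with the data directory from the step above:
# reset_wal /var/lib/postgresql/data/
```

If the real run refuses to proceed, review its message before taking any further action; this sketch deliberately does not add any force option.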
5) scale down the deployment ng-acc-configserver-db-cli
kubectl -n <your-namespace> scale deployment ng-acc-configserver-db-cli --replicas=0
6) scale up ng-acc-configserver-db deployment
kubectl -n <your-namespace> scale deployment ng-acc-configserver-db-deployment --replicas=1
7) check the ng-acc-configserver-db pod log and verify DB is up now:
the expected message is "database system is ready to accept connections"
kubectl -n <your-namespace> logs ng-acc-configserver-db-deployment-####-###
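The database may need a short time to start after the scale-up, so the log check can be repeated until the message appears. A minimal polling sketch follows; the function name wait_for_db_ready, the default of 30 attempts, and the 5 second delay are illustrative assumptions.

```shell
# Hypothetical helper (not part of the product): polls the pod log until
# PostgreSQL reports that it accepts connections.
# Arguments: namespace, pod name (or deployment/<name>),
#            max attempts (default 30), delay in seconds (default 5).
wait_for_db_ready() {
  ns="$1"; pod="$2"; attempts="${3:-30}"; delay="${4:-5}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl -n "$ns" logs "$pod" 2>/dev/null |
        grep -q 'database system is ready to accept connections'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1  # message never appeared within the allowed attempts
}

# Example usage (namespace is a placeholder):
# wait_for_db_ready <your-namespace> deployment/ng-acc-configserver-db-deployment \
#   && echo "DB is up"
```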
8) scale up the rest of the ACC pods
kubectl -n <namespace> scale deployment ng-acc-configserver-cp-deployment --replicas=1
kubectl -n <namespace> scale deployment ng-acc-configserver-deployment --replicas=1
kubectl -n <namespace> scale deployment ng-acc-repository-deployment --replicas=1
9) Verify ACC is working as expected:
- check that all ACC pods are up and running.
- log in to APM, check that Agent packages are available
- open Command Center, check that historical information is available.
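The first verification item can be scripted by filtering the pod listing. This is a sketch under stated assumptions: the function name not_ready_pods is hypothetical, and it expects the default column layout of kubectl get pods (NAME, READY, STATUS, ...) on standard input.

```shell
# Hypothetical helper (not part of the product): reads `kubectl get pods`
# output on stdin and prints the names of pods that are not in the Running
# state or whose READY count is incomplete (for example 0/1).
not_ready_pods() {
  awk '$1 != "NAME" && NF >= 3 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Example usage (namespace is a placeholder); no output means all pods are ready:
# kubectl get pods --namespace <your-namespace> | not_ready_pods
```

Note that pods of finished jobs (STATUS Completed) would also be listed by this filter, so any such entries need to be judged separately.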