PostgreSQL invalid checkpoint preventing DX OI from starting

Article ID: 408873

Products

DX Operational Intelligence

Issue/Introduction

The DX platform environment does not come up because the PostgreSQL pod fails to start.

The PostgreSQL pod logs show the following messages:

LOG:  starting PostgreSQL 16.2 on x86_64-alpine-linux-gnu, ......, 64-bit
LOG:  listening on IPv4 address "0.0.0.0", port 5432
LOG:  listening on IPv6 address "::", port 5432
LOG:  listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
LOG:  database system was interrupted; last known up at 2025-xx-xx xx:xx:xx UTC
LOG:  invalid checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 43) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
LOG:  database system is shut down
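
To confirm that a pod is failing with this specific signature, its log can be scanned for the PANIC line shown above. The following is a minimal sketch, assuming a POSIX shell with grep available; the function name has_invalid_checkpoint is hypothetical and is not part of any DX OI tooling.

```shell
# Hypothetical helper: reads PostgreSQL log text on stdin and returns
# exit status 0 only if the checkpoint PANIC from this article is present.
has_invalid_checkpoint() {
  grep -q 'PANIC: *could not locate a valid checkpoint record'
}

# Possible usage (pod name suffix left as a placeholder, as in the article):
#   kubectl -n dxi logs dxi-postgres-<suffix> | has_invalid_checkpoint \
#     && echo "invalid checkpoint detected"
```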

Resolution

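If no good backup is available to restore, follow these steps as a workaround. Note that resetting the WAL discards changes that were not yet replayed, so the database may contain inconsistent data afterwards.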

 

How to reset the WAL (write-ahead log) in dxi-postgres:

1) scale down dxi-postgres pod:  kubectl scale deploy -n dxi dxi-postgres --replicas=0


2) edit the dxi-postgres deployment (for example with kubectl edit deploy -n dxi dxi-postgres), changing spec.template.spec.containers to the form below so that the container does not start postgres but only runs an infinite tail.  Save the changes:

spec:
  template:
    spec:
      containers:
      - command: ["tail"]
        args: ["-f", "/dev/null"]
        env:
          ...


Also, set the livenessProbe initialDelaySeconds to a very large value (e.g. '60000') so that the pod is not restarted while postgres is not running.  The original value should be '10'.
 

3) scale up dxi-postgres pod:  kubectl scale deploy -n dxi dxi-postgres --replicas=1

 

4) exec into the dxi-postgres pod: kubectl exec -it -n dxi dxi-postgres-<suffix> -- /bin/bash

 

5) reset the WAL: pg_resetwal -f --pgdata /opt/data/pgdata

 

6) exit from the pod.

 

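7) edit the dxi-postgres deployment again and remove the command and args entries added in step 2, so that the first entry under containers is '- env' again.  If the livenessProbe delay was raised in step 2, it can be set back to its original value here.  Save the deployment changes.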

 

8) scale up the dxi-postgres pod with the same command as in step 3:  kubectl scale deploy -n dxi dxi-postgres --replicas=1

 

9) tail the dxi-postgres pod log to confirm that it started up successfully:  kubectl -n dxi logs dxi-postgres-<suffix> -f

 

10) As long as the rest of AIOps is not running, the database is quiescent and a backup can be taken within the pod:
      a) exec into the dxi-postgres pod
      b) cd /opt/data; mkdir -p ./archive
      c) tar -czvf ./archive/pgdata_backup_2025XXXX.tar.gz ./pgdata
      d) exit out of the dxi-postgres pod.
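
The backup in step 10 can also be wrapped in a small function so that the date stamp in the archive name is filled in automatically instead of being typed by hand. This is only a sketch, assuming a POSIX shell with tar and date available inside the pod; the function name backup_pgdata is hypothetical, and /opt/data is the path used in step 10.

```shell
# Hypothetical helper: archives <base_dir>/pgdata into <base_dir>/archive,
# using today's date (YYYYMMDD) in place of the manual 2025XXXX stamp.
# Prints the full path of the archive on success.
backup_pgdata() {
  base_dir="${1:-/opt/data}"   # default taken from step 10 of the article
  stamp="$(date +%Y%m%d)"
  archive="archive/pgdata_backup_${stamp}.tar.gz"
  (
    # Run in a subshell so the caller's working directory is unchanged.
    cd "$base_dir" || exit 1
    mkdir -p ./archive || exit 1
    tar -czf "./${archive}" ./pgdata || exit 1
    echo "${base_dir}/${archive}"
  )
}
```

Inside the pod, calling backup_pgdata with no argument would archive /opt/data/pgdata; a different base directory can be passed as the first argument.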