When upgrading Tanzu Postgres for Kubernetes from 3.0 to 4.2.4, some Postgres instances may fail while others upgrade successfully. For a failed instance, the container log reports:
INFO: unable to find 0000012345 in the archive asynchronously
INFO: archive-get command end: completed successfully (104ms)
FATAL: requested timeline 20 is not a child of this server's history
DETAIL: Latest checkpoint is at AB/12345676 on timeline 19, but in the history of the requested timeline, the server forked off from that timeline at AB/12345879.
Inspection of the /pgsql/data/pg_wal directory showed that the expected .history file did not exist, which caused the error above. The history file should always exist once a cluster has gone through one or more promotions. The immediate impact is that any new replica attempting to join will fail.
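As a rough illustration of the check described above, the sketch below builds the history file name for a timeline and tests whether it is present in a pg_wal directory. It relies on PostgreSQL naming these files with the timeline ID as eight hexadecimal digits, so timeline 20 maps to 00000014.history. The function names `history_file_name` and `has_history_file` are hypothetical helpers and are not part of the product or of the R&D script.

```shell
#!/usr/bin/env bash

# Print the history file name for a decimal timeline ID.
# Pass the ID without leading zeros, otherwise printf treats it as octal.
history_file_name() {
  printf '%08X.history\n' "$1"
}

# Succeed (exit status 0) if the history file for the given timeline
# exists in the given pg_wal directory, fail otherwise.
has_history_file() {
  local wal_dir="$1" timeline="$2"
  [ -f "${wal_dir}/$(history_file_name "${timeline}")" ]
}
```

Run inside the instance container, a call such as `has_history_file /pgsql/data/pg_wal 8 || echo "history file missing"` would flag the condition. Timeline 1 never has a history file, so the check is only meaningful for timeline 2 and above.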
The R&D team provided a script (sonic-issue-v3.sh) to validate each instance before starting the upgrade.
./precheck.sh cluster_name instance_name
The script generates output like the following:
Cluster with issue:
Test[1] Current TL history file : FAIL
00000008.history missing from pg_wal/ — standbys will fail to initialize via streaming
Test[2] Stale/Next TL check : SAFE
no higher timeline history file found in local pg_wal/ or archive
Cluster with no issue:
Test[1] Current TL history file : PASS
00000008.history exists in pg_wal/ — standbys can initialize via streaming
Test[2] Stale/Next TL check : SAFE
no higher timeline history file found in local pg_wal/ or archive
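When many instances need to be validated, the result lines can be extracted from the report automatically. The sketch below assumes the report format matches the samples above, with each result line containing `Test[N]` and ending in the status word. `precheck_test_status` is a hypothetical helper and is not part of the R&D script.

```shell
#!/usr/bin/env bash

# Read a precheck report on stdin and print the status word
# (for example PASS, FAIL or SAFE) of the given test number.
precheck_test_status() {
  awk -v n="$1" 'index($0, "Test[" n "]") > 0 { print $NF; exit }'
}
```

For example, `./precheck.sh cluster_name instance_name | precheck_test_status 1` would print only the status of Test 1 for that instance.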
For a cluster with the issue (history file missing), the workaround depends on which test failed.
Clusters failing Test 1
Disable HA before upgrading and wait until the cluster has reduced to a single node:
highAvailability:
  enabled: false
  readReplicas: 0
Do not proceed with the upgrade until the cluster confirms single-node state.
Clusters failing Test 2
These clusters carry stale history files and are at higher risk during the upgrade. If a cluster fails to come up after the upgrade, do not attempt manual intervention. Restore immediately from the backup taken before the upgrade.