Postgres Operator upgrade (3.0.0 to 4.2.4) results in pods getting stuck in 3/4 state and pg-container failing

Article ID: 429677


Products

VMware Tanzu Data Services

VMware Tanzu for Postgres

Issue/Introduction

After upgrading the Postgres Operator from version 3.0.0 to 4.2.4, Postgres instances fail to become fully healthy.

The Postgres instance pods remain stuck at 3/4 readiness, with the pg-container failing.

The pod logs show the following error:

CRITICAL: system ID mismatch, node <instance-name> belongs to a different cluster:
<system_id_1> != <system_id_2>
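The error can be confirmed directly from the pg-container logs. The pod and namespace names below are placeholders:

```shell
# Filter the container logs for the Patroni system ID mismatch message
kubectl logs <postgres-pod-name> -n <namespace> -c pg-container | grep 'system ID mismatch'
```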

Cause

Each PostgreSQL cluster has a unique internal system identifier, stored in the data directory (PGDATA).

In Operator 3.x architecture, Patroni stores cluster metadata and state information in Kubernetes ConfigMaps (including leader information and cluster identity).

During the operator upgrade from 3.0.0 to 4.2.4:

  • The PostgreSQL data directory remains intact.

  • However, stale Patroni cluster metadata stored in Kubernetes ConfigMaps may persist.

  • If the metadata in these ConfigMaps contains a system identifier that does not match the one stored in the PostgreSQL data directory, Patroni detects a mismatch.

When this occurs, Patroni prevents PostgreSQL from starting and logs: CRITICAL: system ID mismatch

The issue is therefore caused by stale or inconsistent Patroni cluster metadata remaining after operator upgrade, leading to a mismatch between:

  • The PostgreSQL data directory system ID

  • The system ID stored in Kubernetes ConfigMaps

Resolution

The issue can be resolved by removing the stale Patroni ConfigMaps and allowing them to be recreated automatically.

Step 1: For the affected cluster, delete the following ConfigMaps:

kubectl delete cm <cluster-name>-config -n <namespace>
kubectl delete cm <cluster-name>-custom-config -n <namespace>
kubectl delete cm <cluster-name>-leader -n <namespace>
kubectl delete cm <cluster-name>-sync -n <namespace>
Step 2: Delete all the pods belonging to that instance:

kubectl delete pod <postgres-pod-name> -n <namespace>