Some pods get stuck in "Terminating" state during Postgres Operator upgrade from 3.0.0 to 4.2.4

Article ID: 432266

Updated On:

Products

VMware Tanzu Data Services Solutions VMware Tanzu for Postgres

Issue/Introduction

During a Postgres Operator upgrade (for example, from 3.0.0 to 4.2.4), some pods may remain stuck in the "Terminating" state while they are being replaced.

This behavior is typically observed during rolling updates where old pods are terminated and new pods are created.

Symptoms

  • Pods remain in the Terminating state for an extended period.
  • PostgreSQL inside the container has already shut down cleanly, as confirmed by logs.
  • Readiness probe failures may still appear as Kubernetes attempts to interact with a container that has already stopped.
  • Pod deletion does not complete automatically.
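
To spot affected pods quickly, the STATUS column of kubectl get pods can be filtered. The following is a minimal sketch, not part of the product tooling: the helper name list_terminating_pods is hypothetical, and it assumes the default column layout (NAME, READY, STATUS, ...) of kubectl get pods.

```shell
# Hypothetical helper: print the names of pods whose STATUS is "Terminating".
# $1 = namespace (falls back to "default" when omitted).
list_terminating_pods() {
  ns="${1:-default}"
  # --no-headers drops the header row; STATUS is the third column.
  kubectl get pods -n "$ns" --no-headers 2>/dev/null |
    awk '$3 == "Terminating" { print $1 }'
}
```

Example: list_terminating_pods <namespace>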

Cause

This issue is not related to PostgreSQL itself.

After PostgreSQL shuts down successfully, Kubernetes must complete additional cleanup operations before removing the pod object, including:

  • Unmounting volumes from the node
  • Detaching volumes via the CSI driver (e.g., vSphere CNS CSI)
  • Cleaning up container runtime resources
  • Finalizing Kubernetes object lifecycle

If any of these steps are delayed, particularly volume detach operations managed by the CSI driver, the pod may remain stuck in the Terminating state even though the database process has already exited.
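
To see which cleanup step is still pending, it can help to look at the pod's deletion metadata and at the VolumeAttachment objects for the node the pod ran on. The sketch below is illustrative only: the function names are hypothetical, and it assumes the CSI driver in use creates standard VolumeAttachment objects.

```shell
# Hypothetical helper: show deletion timestamp, finalizers, and node of a pod.
# $1 = namespace, $2 = pod name.
pod_cleanup_state() {
  ns="$1"; pod="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\t"}{.spec.nodeName}{"\n"}'
}

# Hypothetical helper: list VolumeAttachment objects that still reference a node.
# $1 = node name. Prints "<attachment-name> <attached>" per line.
attachments_on_node() {
  node="$1"
  kubectl get volumeattachment --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,ATTACHED:.status.attached |
    awk -v n="$node" '$2 == n { print $1, $3 }'
}
```

Example: pod_cleanup_state <namespace> <pod-name>, then attachments_on_node <node-name> with the node name from the first command. An attachment that stays listed long after the pod's deletion timestamp suggests that the detach operation is the step still in progress.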

This behavior is more likely during upgrades due to:

  • Multiple pods being restarted simultaneously
  • Increased volume attach/detach operations
  • Timing overlaps in Kubernetes scheduling and storage handling

Resolution

Workaround:

If PostgreSQL has already shut down cleanly and the pod remains stuck in Terminating state, it is generally safe to force delete the pod:

kubectl delete pod <pod-name> --grace-period=0 --force
 
Note:
  • This action only removes the Kubernetes pod object.
  • The underlying PVCs and data are not affected.
  • Volume detach operations will continue in the background if still in progress.
  • This should only be done after confirming that PostgreSQL has fully stopped.
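
The confirmation step and the force delete can be combined into one guarded command. The following is a sketch under stated assumptions, not a supported procedure: the function names are hypothetical, the container name must be supplied by the operator, and the check assumes the container log is still readable and that the standard PostgreSQL message "database system is shut down" in the last 50 lines belongs to the current shutdown.

```shell
# Hypothetical helper: succeed only if recent container logs contain
# PostgreSQL's shutdown message.
# $1 = namespace, $2 = pod name, $3 = container name (assumption: set by caller).
postgres_has_shut_down() {
  ns="$1"; pod="$2"; container="$3"
  kubectl logs "$pod" -n "$ns" -c "$container" --tail=50 2>/dev/null |
    grep -q 'database system is shut down'
}

# Hypothetical helper: force delete the pod only after the check above passes.
force_delete_if_stopped() {
  ns="$1"; pod="$2"; container="$3"
  if postgres_has_shut_down "$ns" "$pod" "$container"; then
    kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force
  else
    echo "PostgreSQL shutdown not confirmed for $pod; not deleting" >&2
    return 1
  fi
}
```

Example: force_delete_if_stopped <namespace> <pod-name> <container-name>. If the log check fails, the function returns a non-zero status and leaves the pod untouched, so the logs can be reviewed manually first.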