Upgrading Postgres database cluster to a minor version may show in 'Upgrading' status with older DB version


Article ID: 433553


Products

VMware Data Services Manager for VCF

Issue/Introduction

Symptoms:

  • In a DSM environment, a minor version upgrade of a Postgres database on a secondary cluster may become stuck in the 'Upgrading' state, with the cluster still reporting the older database version.

  • The issue is seen only on secondary clusters with backups configured.

Example: An upgrade from Postgres 17.5 to 17.7 remains in the 'Upgrading' state for an extended period.

Environment

  • VMware Data Services Manager 9.0.2

  • VCFA 9.0.2

  • VKS 3.5

  • Secondary clusters with backups configured.

Cause

In DSM 9.1 and earlier, in a DR setup, performing a minor version upgrade on the secondary Postgres cluster when backups are configured may cause the upgrade to become stuck in progress. The upgrade completes at the workload level, meaning the database is running and usable on the new version, but the cluster status in DSM does not reconcile and continues to show "Upgrading." This prevents further control plane operations on this cluster.

Example: upgrading from Postgres 17.5 to 17.7.

This can be validated from the provisioner log in the DSM support bundle, at the following path within the extracted bundle:

dsm-supportbundle-1773046151797-2026-03-09-084956.tgz_extracted/dsm-supportbundle-1773046151797-2026-03-09-084956/control-plane.tar.gz_extracted/provider/containers/dsm-tsql-provisioner-service.log

2026-03-09T06:04:02.526Z ERROR dsm-provisioner 1 [dsm@4413 host="01d7b55692c6" caller="provision/tsql_controller.go:758" controller="postgrescluster" controllerGroup="databases.dataservices.vmware.com" controllerKind="PostgresCluster" PostgresCluster="dsm-managed/postgres-test-sec" namespace="dsm-managed" name="postgres-test-sec" reconcileID="078a0bd2-fe8f-47aa-95e0-8e9f982a963d" objName="postgres-test-sec" objType="databases.DBClusterPG" objNs="dsm-managed" "error"=""could not get conditions for Postgres BackupSchedule: default-incremental-backup"" "stacktrace"=""goroutine 2396 [running]:\nruntime/debug.Stack()\n\t/build/mts/release/bora-25094484/home/mts/go/pkg/mod/golang.org/[email protected]/src/runtime/debug/stack.go:26 +0x5e\ngithub-vcf.devops.broadcom.net/vcf/dsm-control-plane/common-go/pkg/logging.(*vcfLoggerSink).Error(0xc003500640, {0x5261d80?, 0xc001fd8000?}, {0xc002697e00, 0x50}, {0x0?, 0xc004e4d808?, 0x92615d?})\n\t/build/mts/release/bora-25094484/dsm-control-plane/common-go/pkg/logging/vcflogger.go:92 +0x13a\ngithub.com/go-logr/logr.Logger.Error({{0x52b0fd0?, 0xc003500640?}, 0x0?}, {0x5261d80, 0xc001fd8000}, {0xc002697e00, 0x50}, {0x0, 0x0, 0x0})\n\t/build/mts/release/bora-25094484/home/mts/go/pkg/mod/github.com/go-logr/[email protected]/logr.go:301 +0x145\ngithub-vcf.devops.broadcom.net/vcf/dsm-control-plane/provisioner/pkg/databases/controllers/provision.(*TSQLRemediationHandler).remediateDatabase(0xc001062750, {0x52a5998, 0xc003f76300}, {0x52b8bd8, 0xc003f76420}, {0x52b1018, 0xc00340f860}, {0x52a6ae8, 0xc00587f900}, {0x5308be0, ...})\n\t/build/mts/release/bora-25094484/dsm-control-plane/provisioner/pkg/databases/controllers/provision/tsql_controller.go:758 +0xac\ngithub-vcf.devops.broadcom.net/vcf/dsm-control-plane/provisioner/pkg/databases/controllers/provision.(*TSQLRemediationHandler).Reconcile(0xc001062750, {0x52a5998, 0xc006de54a0}, {{{0xc0068cbef0, 0xb}, {0xc0042becc0, 0x11}}})\n\t/build/mts/release/bora-25094484/dsm-control-plane/provisioner/pkg/databases/controllers/provision/tsql_controller.go:381 +0x1e6e\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile(0xc0035ba640?, {0x52a5998?, 0xc006de54a0?}, {{{0xc0068cbef0?, 0x0?}, {0xc0042becc0?, 0x0?}}})\n\t/build/mts/release/bora-25094484/home/mts/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0xbf\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler(0x52eb2e0, {0x52a59d0, 0xc000ad39a0}, {{{0xc0068cbef0, 0xb}, {0xc0042becc0, 0x11}}}, 0x0)\n\t/build/mts/release/bora-25094484/home/mts/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:334 +0x3ad\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem(0x52eb2e0, {0x52a59d0, 0xc000ad39a0})\n\t/build/mts/release/bora-25094484/home/mts/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:294 +0x21b\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2()\n\t/build/mts/release/bora-25094484/home/mts/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:255 +0x85\ncreated by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2 in goroutine 853\n\t/build/mts/release/bora-25094484/home/mts/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:251 +0x6b5\n""] "could not get conditions for Postgres BackupSchedule: default-incremental-backup"

2026-03-09T08:25:05.199878+00:00 z1vcf-md01-dsm01 container_name/dsm-tsql-provisioner-service[1491]: 2026-03-09T08:25:05.199Z INFO dsm-provisioner 1 [dsm@4413 host="01d7b55692c6" caller="provisioners/helpers.go:128" controller="postgrescluster" controllerGroup="databases.dataservices.vmware.com" controllerKind="PostgresCluster" PostgresCluster="dsm-managed/postgres-test-sec" namespace="dsm-managed" name="postgres-test-sec" reconcileID="7b415cb5-42b7-438c-8ae8-cb2c27aa7d83" objName="postgres-test-sec" objType="databases.DBClusterPG" objNs="dsm-managed" namespace="dsm-managed" databaseName="postgres-test-sec"] "status conditions are not present on target CR"
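To check an extracted log for these entries without reading it by hand, a short script along the following lines may help. This is a minimal sketch, not a supported tool: the two message strings and the PostgresCluster="namespace/name" field are taken from the excerpts above, while the function name find_signature_hits is hypothetical and the script assumes each log entry occupies a single line.

```python
import re
from typing import Dict, Iterable

# Messages quoted in the provisioner log excerpts above.
SIGNATURES = (
    "could not get conditions for Postgres BackupSchedule",
    "status conditions are not present on target CR",
)

# Matches the PostgresCluster="namespace/name" field of a log entry.
CLUSTER_FIELD = re.compile(r'PostgresCluster="([^"]+)"')


def find_signature_hits(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count, per cluster, the log lines that contain each signature message.

    A line is counted once per signature even if the message text is
    repeated within that line (as in the ERROR entry above).
    """
    hits: Dict[str, Dict[str, int]] = {}
    for line in lines:
        match = CLUSTER_FIELD.search(line)
        if match is None:
            continue  # not a PostgresCluster reconcile entry
        cluster = match.group(1)
        for signature in SIGNATURES:
            if signature in line:
                per_cluster = hits.setdefault(cluster, {})
                per_cluster[signature] = per_cluster.get(signature, 0) + 1
    return hits
```

Passing an open file handle for dsm-tsql-provisioner-service.log to find_signature_hits returns a dictionary keyed by cluster. A secondary cluster that appears with both messages is a likely candidate for this issue, although the log entries alone do not prove it.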

Resolution

This is a known issue. Broadcom engineering is working on a fix.

Until the fix is available, use one of the following two workarounds:

  • Update status.upgradeStatus.currentVersion of the affected cluster through the DSM API so that it matches the cluster's spec.version. This transitions the cluster's status to Ready.

  • Alternatively, discard the secondary cluster and bootstrap a new secondary cluster on the new version.
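For the first workaround, the value to write can be derived from the cluster object itself. The sketch below only illustrates that comparison. It assumes the cluster object has already been retrieved from the DSM API as a parsed JSON dictionary, that the field paths are exactly spec.version and status.upgradeStatus.currentVersion as named above, and that the update is submitted as a status patch of the shape shown. The function name build_current_version_patch is hypothetical, and the API endpoint, authentication, and exact patch format are not covered here and should be taken from the DSM API documentation.

```python
from typing import Any, Dict, Optional


def build_current_version_patch(cluster: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a status patch that sets currentVersion to spec.version.

    Returns None when the two versions already match, in which case
    no update is needed. The patch shape is an assumption, not a
    documented DSM API payload.
    """
    target = (cluster.get("spec") or {}).get("version")
    if target is None:
        raise ValueError("cluster object has no spec.version")

    status = cluster.get("status") or {}
    current = (status.get("upgradeStatus") or {}).get("currentVersion")
    if current == target:
        return None  # status already reflects the upgraded version

    return {"status": {"upgradeStatus": {"currentVersion": target}}}
```

For a cluster whose spec.version is 17.7 while status.upgradeStatus.currentVersion still reports 17.5, the function returns a patch that sets currentVersion to 17.7. The version strings here are illustrative; the actual values should be copied from the affected cluster.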