According to the release notes, upgrading a VKS cluster from VKr v1.28.15 to v1.29.4 is not supported. Because v1.28.15 was released after v1.29.4, this would be a "back-in-time" upgrade for some packages, and fixes included in the v1.28.15 patch are not available in v1.29.4.
If you have already started an upgrade from v1.28.15 to v1.29.4 but have not manually unpaused the cluster, follow the KB article "vSphere 8.0 Supervisor Workload Cluster Upgrade Stuck with No Nodes on Desired Upgraded Version" to revert the cluster to v1.28.15.
This KB covers the scenario in which the cluster was manually unpaused, which starts a v1.29.4 control plane rollout that gets stuck in NotReady status.
While connected to the Supervisor cluster context, one or more of the following symptoms are observed:
kubectl get machines -n <cluster namespace>
The worker node machines are still on the old version, v1.28.15, while the v1.29.4 control plane rollout is stuck.
kubectl get clusterbootstrap -n <cluster namespace>
The ClusterBootstrap still references the v1.28.15 add-on packages and release:
antrea.tanzu.vmware.com.1.13.3+vmware.3-tkg.2-vmware vsphere-pv-csi.tanzu.vmware.com.3.1.0+vmware.1-tkg.6-vmware vsphere-cpi.tanzu.vmware.com.1.28.0+vmware.1-tkg.2-vmware kapp-controller.tanzu.vmware.com.0.50.0+vmware.2-tkg.1-vmware v1.28.15---vmware.3-fips-vkr.3
An error similar to the following is reported for the ClusterBootstrap:
ClusterBootstrap.run.tanzu.vmware.com <cluster name> is invalid: spec.kapp.refName: Invalid value: \"kapp-controller.tanzu.vmware.com.0.50.0+vmware.1-tkg.1-vmware\": package downgrade is not allowed, original version: 0.50.0+vmware.2-tkg.1-vmware, updated version 0.50.0+vmware.1-tkg.1-vmware
The kapp-controller PackageInstall status shows that the downgrade was blocked:
status:
  conditions:
  - message: Error (see .status.usefulErrorMessage for details)
    status: "True"
    type: ReconcileFailed
  friendlyDescription: 'Reconcile failed: Error (see .status.usefulErrorMessage for
    details)'
  lastAttemptedVersion: 0.50.0+vmware.2-tkg.1-vmware
  observedGeneration: 2
  usefulErrorMessage: |-
    Stopped installing matched version '0.50.0+vmware.1-tkg.1-vmware' since last attempted version '0.50.0+vmware.2-tkg.1-vmware' is higher.
    hint: Add annotation packaging.carvel.dev/downgradable: "" to PackageInstall to proceed with downgrade
  version: 0.50.0+vmware.1-tkg.1-vmware
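The symptom checks above can be scripted against kubectl JSON output. The following is a minimal, read-only sketch, assuming the JSON comes from `kubectl get machines -n <cluster namespace> -o json` and `kubectl get packageinstall <name> -n <cluster namespace> -o json`. The PackageInstall name is not given in this article, so `<name>` is a placeholder, and the helper names `machine_versions` and `downgrade_blocked` are hypothetical. It also assumes that control plane Machines carry the standard Cluster API label `cluster.x-k8s.io/control-plane`.

```python
# Hypothetical, read-only helpers for checking the symptoms in this KB.
# Assumed inputs (JSON already parsed into dicts):
#   kubectl get machines -n <cluster namespace> -o json
#   kubectl get packageinstall <name> -n <cluster namespace> -o json
# <name> is a placeholder; this article does not name the PackageInstall.

CONTROL_PLANE_LABEL = "cluster.x-k8s.io/control-plane"


def machine_versions(machine_list: dict) -> dict:
    """Group the spec.version of each Machine by role (control-plane or worker)."""
    versions = {"control-plane": set(), "worker": set()}
    for machine in machine_list.get("items", []):
        labels = machine.get("metadata", {}).get("labels") or {}
        role = "control-plane" if CONTROL_PLANE_LABEL in labels else "worker"
        versions[role].add(machine.get("spec", {}).get("version", "<unknown>"))
    return versions


def downgrade_blocked(pkgi: dict) -> bool:
    """Return True if a PackageInstall reports the downgrade-blocked failure."""
    status = pkgi.get("status", {})
    failed = any(
        cond.get("type") == "ReconcileFailed" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )
    message = status.get("usefulErrorMessage", "")
    return failed and "since last attempted version" in message


# Example built from the PackageInstall status shown above.
EXAMPLE_PKGI = {
    "status": {
        "conditions": [{"type": "ReconcileFailed", "status": "True"}],
        "lastAttemptedVersion": "0.50.0+vmware.2-tkg.1-vmware",
        "usefulErrorMessage": (
            "Stopped installing matched version '0.50.0+vmware.1-tkg.1-vmware' "
            "since last attempted version '0.50.0+vmware.2-tkg.1-vmware' is higher."
        ),
        "version": "0.50.0+vmware.1-tkg.1-vmware",
    }
}

if downgrade_blocked(EXAMPLE_PKGI):
    s = EXAMPLE_PKGI["status"]
    print(f"Downgrade blocked: last attempted {s['lastAttemptedVersion']}, matched {s['version']}")
```

Against a live environment, the dicts could be loaded with `json.loads(subprocess.check_output([...]))` around the kubectl commands above. The sketch only reads state. It does not apply the `packaging.carvel.dev/downgradable` hint, which is not the resolution for this issue.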
Applies to VKS v3.3.2 and earlier.
Upgrading a VKS cluster from v1.28.15 to v1.29.4 effectively downgrades the kapp-controller package (from 0.50.0+vmware.2-tkg.1-vmware to 0.50.0+vmware.1-tkg.1-vmware), because v1.28.15 was released after v1.29.4.
During the upgrade to v1.29.4, the addon-manager pauses the cluster to upgrade the add-ons, but the add-on upgrade fails when the version downgrade is detected. As a result, the cluster remains paused.
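Whether a cluster is still paused can be confirmed without changing anything. The following is a minimal, read-only sketch, assuming the input is the parsed output of `kubectl get cluster <cluster name> -n <cluster namespace> -o json`. It also assumes that the pause is visible either as the Cluster API `spec.paused` field or as the `cluster.x-k8s.io/paused` annotation. This article does not say which mechanism the addon-manager uses. The helper name `pause_indicators` is hypothetical. Do not use it as a step toward unpausing the cluster, which is unsupported.

```python
# Hypothetical read-only check of whether a Cluster API Cluster object is paused.
# Assumed input: kubectl get cluster <cluster name> -n <cluster namespace> -o json
# Assumption: the addon-manager pause shows up as spec.paused and/or the
# cluster.x-k8s.io/paused annotation; this article does not say which.

PAUSED_ANNOTATION = "cluster.x-k8s.io/paused"


def pause_indicators(cluster: dict) -> list:
    """Return the pause indicators set on a Cluster object (empty list if none)."""
    found = []
    if cluster.get("spec", {}).get("paused") is True:
        found.append("spec.paused")
    annotations = cluster.get("metadata", {}).get("annotations") or {}
    if PAUSED_ANNOTATION in annotations:
        found.append(f"annotation {PAUSED_ANNOTATION}")
    return found


# Example object with illustrative values only.
example_cluster = {"metadata": {"name": "example"}, "spec": {"paused": True}}
print(pause_indicators(example_cluster))  # ['spec.paused']
```

An empty result on a cluster stuck mid-upgrade suggests that the cluster may already have been unpaused, which is the scenario this KB covers.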
If the cluster is manually unpaused, the upgrade proceeds to v1.29.4 without upgrading the corresponding kapp-controller add-on. The upgrade then fails because the non-upgraded kapp-controller add-on is incompatible with the upgraded node(s).
Manually unpausing a cluster is an unsupported action.
Reach out to VMware by Broadcom Technical Support for assistance and reference this KB article.
This type of “back-in-time upgrade” should not occur in VKS 3.3.3 or later, as a webhook constraint was introduced to prevent it.