After initiating an upgrade from vSphere 7 to vSphere 8, the Supervisor cluster upgrade and a manually initiated vSphere Kubernetes cluster upgrade are both stuck and not progressing.
While connected to the Supervisor cluster, the following symptoms are present:
kubectl describe tkc <tkc name> -n <cluster namespace>
Labels: run.tanzu.vmware.com/migrate-tkc=
Annotations: run.tanzu.vmware.com/tkc-upgrade-from: <TKR VERSION>
kubectl get tkc,machine,kcp,md -n <cluster namespace>
kubectl describe kcp <kcp name> -n <cluster namespace>
"Failed to get VirtualMachineImage <ob-virtualmachineimage-name>: VirtualMachineImage.vmoperator.vmware.com "<ob-virtualmachineimage-name>" not found"
kubectl get cvmi -A
/usr/lib/vmware-wcp/upgrade/upgrade-ctl.py get-status | jq '.progress | to_entries | .[] | "\(.value.status) - \(.key)"' | sort
"failed - utkgClusterMigration" or "pending - utkgClusterMigration"
kubectl get wcpcluster,wcpmachine,wcpmachinetemplate -A
kubectl get pkgi -n <affected cluster namespace>
NAMESPACE            NAME                                                            PACKAGE NAME                      PACKAGE VERSION               DESCRIPTION
<cluster namespace>  packageinstall.packaging.carvel.dev/my-cluster-kapp-controller  kapp-controller.tanzu.vmware.com  X.XX.X+vmware.X-tkg.X-vmware  Reconcile failed: Error (see .status.usefulErrorMessage for details)
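The upgrade-ctl.py status check above lists every upgrade component, which can make the migration step hard to spot. The following is a minimal sketch for narrowing that output to the utkgClusterMigration entry only. It assumes the pipeline prints quoted "status - component" lines in the form shown above. The function name migration_state is hypothetical and is not part of upgrade-ctl.py or any VMware tooling.

```shell
#!/bin/sh
# Hypothetical helper (not part of VMware tooling): reads "status - component"
# lines on stdin, as printed by the upgrade-ctl.py | jq pipeline, and prints the
# status recorded for utkgClusterMigration, or "absent" if no such line exists.
migration_state() {
    # jq without -r wraps each line in double quotes; strip them first.
    sed -e 's/^"//' -e 's/"$//' |
        awk -F' - ' '
            $2 == "utkgClusterMigration" { print $1; found = 1 }
            END { if (!found) print "absent" }
        '
}

# Usage on the Supervisor control plane (same pipeline as in the article):
# /usr/lib/vmware-wcp/upgrade/upgrade-ctl.py get-status \
#   | jq '.progress | to_entries | .[] | "\(.value.status) - \(.key)"' \
#   | migration_state
```

A result of "failed" or "pending" corresponds to the symptom described above. Any other value means this particular symptom is not present, and the remaining checks should still be reviewed.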
Depending on when the vSphere Kubernetes cluster was upgraded before the migration became stuck, the following symptoms may be present:
"failed to get controlplane TKR for TKC cluster from supervisor ... TKR v#.##.#---vmware.#" "parsing semantic version from string '': could not parse \"\" as version"
"Failed to run command kubectl label overwrite tkc for my-cluster, could not find spec.distribution.fullVersion; Component upgrade failed for v#.##.#"
VMware vSphere 8.0 with Tanzu
This issue can occur on a vSphere Kubernetes cluster regardless of whether it is managed by Tanzu Mission Control (TMC).
Upgrades from vSphere 7 to vSphere 8 undergo a migration of wcp objects into vsphere objects and virtualmachineimages into clustervirtualmachineimages (cvmi).
The associated components are updated to reference the newly created objects for migration.
In addition to the above, the TKR naming conventions change and can lead to a TKR version mismatch when initiating the vSphere Kubernetes cluster upgrade before the migration has completed. This can also occur when the cluster upgrade is initiated from Tanzu Mission Control (TMC) as TMC pulls the TKR version data from the vSphere Kubernetes environment.
However, if a vSphere Kubernetes cluster upgrade is started before the migration completes, both the Supervisor cluster upgrade and the vSphere Kubernetes cluster upgrade become stuck. The system prioritizes completing the cluster upgrade but cannot do so, because cluster components still reference pre-migration objects that the migration process has already replaced.
When upgrading a vSphere Kubernetes cluster from a legacy TKR for vSphere 7 to a TKR for vSphere 8, the kapp-controller package is automatically installed on the vSphere Kubernetes cluster. However, the kapp-controller pkgi will be stuck in ReconcileFailed state until the cluster upgrade completes.
Documentation ("Upgrading from any vCenter Server release to any vCenter Server 8.x release"): https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/vsphere-supervisor/8-0/using-tkg-service-with-vsphere-supervisor/updating-tkg-service-clusters/understanding-the-rolling-update-model-for-tkg-service-clusters.html
Please open a ticket with VMware by Broadcom Support, referencing this KB article, for assistance in reverting the affected cluster's upgrade and completing the migration.
Once the migration is completed, the Supervisor upgrade will complete and the environment will stabilize. The vSphere Kubernetes cluster upgrade can then be restarted and completed successfully.
Beginning in TKG Service 3.2.1 and 3.3.0, the system will prevent starting vSphere Kubernetes cluster upgrades before the vSphere 7 to vSphere 8 migration completes.
TKG Service Documentation: https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/vsphere-supervisor/8-0/using-tkg-service-with-vsphere-supervisor/installing-and-upgrading-the-tkg-service.html