Guest cluster upgrade has become stuck
'kubectl get machine -n <namespace>' shows that the cluster's machines are still associated with the older TKr version:
NAME CLUSTER NODENAME PROVIDERID PHASE AGE VERSION
<clustername>-control-plane-55xpm <clustername> <clustername>-control-plane-55xpm vsphere://420ae78a-e929-54ad-d303-95ce68025bf4 Running 14h v1.27.10+vmware.1-fips.1
<clustername>-control-plane-h7zcs <clustername> <clustername>-control-plane-h7zcs vsphere://420a2a3a-6942-0dcc-9634-d7449c070842 Running 14h v1.27.10+vmware.1-fips.1
<clustername>-control-plane-lvkc9 <clustername> <clustername>-control-plane-lvkc9 vsphere://420a94a0-14fa-fd33-b1ac-0fd959ccaa99 Running 14h v1.27.10+vmware.1-fips.1
<clustername>-workers-8l2pm-764bd7b9fcx8wb88-chmm4 <clustername> <clustername>-workers-8l2pm-764bd7b9fcx8wb88-chmm4 vsphere://420a856c-63a6-bbe4-e68b-d8e95d4336af Running 14h v1.27.10+vmware.1-fips.1
<clustername>-workers-8l2pm-764bd7b9fcx8wb88-f4b6b <clustername> <clustername>-workers-8l2pm-764bd7b9fcx8wb88-f4b6b vsphere://420a19e1-affb-b3cb-cd54-95d4baeaab95 Running 14h v1.27.10+vmware.1-fips.1
<clustername>-workers-8l2pm-764bd7b9fcx8wb88-hsrrh <clustername> <clustername>-workers-8l2pm-764bd7b9fcx8wb88-hsrrh vsphere://420ae54d-4dab-4c41-894c-9700b5d8ebc5 Running 14h v1.27.10+vmware.1-fips.1
The TKC still reports v1.27.10.
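To see at a glance which versions the machines are on, the output above can be reduced to its distinct VERSION values. This is a minimal sketch: machine_versions is a hypothetical helper name, and it assumes VERSION is the last column, as in the output shown above.

```shell
# machine_versions (hypothetical helper): reads `kubectl get machine` output
# on stdin and prints each distinct value of the last column (VERSION).
# Assumes the first line is the header row and VERSION is the last column.
machine_versions() {
  awk 'NR > 1 && NF > 0 { print $NF }' | sort -u
}

# Example usage (not run here):
#   kubectl get machine -n <namespace> | machine_versions
```

If the upgrade has not progressed, only the old version is expected in the result.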
'kubectl get osimage' did not show a Photon OSImage for TKr v1.28.8, and the vmware-system-tkg-controller-manager logs confirmed that the upgrade requires a Photon image:
kubectl logs -n vmware-system-tkg vmware-system-tkg-controller-manager-84887d8f75-f7xr2
E0730 09:56:30.695139 1 tanzukubernetescluster_controller.go:468] vmware-system-tkg-controller-manager/tanzukubernetescluster-spec-controller/<clustername>-ns/<clustername> "msg"="Error while reconcilling cluster object requeuing for retry" "error"="admission webhook \"tkr-resolver-cluster-webhook.tanzu.vmware.com\" denied the request: could not resolve TKR/OSImage for controlPlane, machineDeployments: [workers], query: {controlPlane: {k8sVersionPrefix: 'v1.28.8+vmware.1-fips.1-tkg.2', tkrSelector: '!run.tanzu.vmware.com/legacy-tkr,tkr.tanzu.vmware.com/standard', osImageSelector: 'os-name=photon,tkr.tanzu.vmware.com/standard'}, machineDeployments: [{k8sVersionPrefix: 'v1.28.8+vmware.1-fips.1-tkg.2', tkrSelector: '!run.tanzu.vmware.com/legacy-tkr,tkr.tanzu.vmware.com/standard', osImageSelector: 'os-name=photon'}]}, result: {controlPlane: {k8sVersion: '', tkrName: '', osImagesByTKR: map[]}, machineDeployments: [{k8sVersion: '', tkrName: '', osImagesByTKR: map[]}]}" "cluster.name"="<clustername>"
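The controller log can be long, so it may help to filter it down to the versions the resolver failed to find. The sketch below is an assumption-laden convenience, not part of the product: requested_versions is a hypothetical helper that relies on the message text "could not resolve TKR/OSImage" and the k8sVersionPrefix: '...' field appearing exactly as in the log line above.

```shell
# requested_versions (hypothetical helper): reads controller-manager log
# lines on stdin and prints each distinct k8sVersionPrefix value found in
# "could not resolve TKR/OSImage" errors.
requested_versions() {
  grep 'could not resolve TKR/OSImage' \
    | grep -o "k8sVersionPrefix: '[^']*'" \
    | sed "s/k8sVersionPrefix: '\(.*\)'/\1/" \
    | sort -u
}

# Example usage (not run here; substitute your own pod name):
#   kubectl logs -n vmware-system-tkg <controller-manager-pod> | requested_versions
```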
'kubectl get osimage | grep 1.28.8' shows that the only OSImage currently available for 1.28.8 is Ubuntu:
vmi-10bab5091f3b5c924 v1.28.8+vmware.1-fips.1 ubuntu 22.04 amd64 vmi 36h
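The same check can be scripted so that it gives a yes/no answer for a given version and OS. This is a rough sketch: has_osimage is a hypothetical helper, and it assumes the Kubernetes version is the second column and the OS name the third, as in the row shown above.

```shell
# has_osimage VERSION_PREFIX OS_NAME (hypothetical helper): reads
# `kubectl get osimage` output on stdin and succeeds (exit 0) if some row
# has a version starting with VERSION_PREFIX and an OS name equal to OS_NAME.
# Assumes column 2 is the Kubernetes version and column 3 is the OS name.
has_osimage() {
  awk -v v="$1" -v o="$2" '
    index($2, v) == 1 && $3 == o { found = 1 }
    END { exit !found }
  '
}

# Example usage (not run here):
#   kubectl get osimage | has_osimage v1.28.8 photon || echo "no Photon OSImage for v1.28.8"
```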
Solution Recommendation
Follow the steps below to delete the affected TKR so that it is automatically re-created and its osimages are repopulated.
Once the Photon osimage for v1.28.8 is available again, the upgrade can resolve it and continue.
Step-by-step Instructions
1. Connect to the Supervisor Cluster context in your Kubernetes CLI.
2. Locate the affected TKR version and confirm that the TKR exists:
- kubectl get tkr | grep v1.28.8
3. Delete the existing TKR (update the version string as needed to match your environment):
- kubectl delete tkr v1.28.8+vmware.1-fips.1-tkg.2
4. Wait a few minutes for the TKR to be automatically re-created, then confirm that it has returned:
- kubectl get tkr | grep v1.28.8
- kubectl get tkr <tkr> -o yaml
5. Verify that all expected osimage entries are now present:
- kubectl get osimage | grep v1.28.8
- You should now see entries for both Photon and Ubuntu VMIs.
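Steps 4 and 5 involve waiting for objects to reappear, which can be done with a small polling loop instead of re-running the commands by hand. The following is a minimal sketch: wait_for is a hypothetical helper, and the timeout and interval values in the usage comments are arbitrary assumptions rather than recommended settings.

```shell
# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# (hypothetical helper): re-runs COMMAND until it exits 0, sleeping
# INTERVAL_SECONDS between attempts. Returns 1 once TIMEOUT_SECONDS of
# sleeping has accumulated without a successful attempt.
wait_for() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  while ! "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 0
}

# Example usage (not run here; adjust the TKR name to your environment):
#   wait_for 600 15 kubectl get tkr v1.28.8+vmware.1-fips.1-tkg.2 \
#     && kubectl get osimage | grep v1.28.8
```

This relies on kubectl get returning a non-zero exit status when the named object does not exist.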