Tanzu Guest Cluster Upgrade Stuck in "Updating" Phase – No New Machines Created - missing osimage

Article ID: 405990

Products

VMware vSphere Kubernetes Service

Issue/Introduction

The guest cluster upgrade is stuck in the "Updating" phase and no new machines are created.

Running kubectl get machine -n <namespace> shows that the cluster's machines are still on the older TKr version:

NAME                                                 CLUSTER         NODENAME                                             PROVIDERID                                       PHASE     AGE   VERSION
<clustername>-control-plane-55xpm                    <clustername>   <clustername>-control-plane-55xpm                    vsphere://420ae78a-e929-54ad-d303-95ce68025bf4   Running   14h   v1.27.10+vmware.1-fips.1
<clustername>-control-plane-h7zcs                    <clustername>   <clustername>-control-plane-h7zcs                    vsphere://420a2a3a-6942-0dcc-9634-d7449c070842   Running   14h   v1.27.10+vmware.1-fips.1
<clustername>-control-plane-lvkc9                    <clustername>   <clustername>-control-plane-lvkc9                    vsphere://420a94a0-14fa-fd33-b1ac-0fd959ccaa99   Running   14h   v1.27.10+vmware.1-fips.1
<clustername>-workers-8l2pm-764bd7b9fcx8wb88-chmm4   <clustername>   <clustername>-workers-8l2pm-764bd7b9fcx8wb88-chmm4   vsphere://420a856c-63a6-bbe4-e68b-d8e95d4336af   Running   14h   v1.27.10+vmware.1-fips.1
<clustername>-workers-8l2pm-764bd7b9fcx8wb88-f4b6b   <clustername>   <clustername>-workers-8l2pm-764bd7b9fcx8wb88-f4b6b   vsphere://420a19e1-affb-b3cb-cd54-95d4baeaab95   Running   14h   v1.27.10+vmware.1-fips.1
<clustername>-workers-8l2pm-764bd7b9fcx8wb88-hsrrh   <clustername>   <clustername>-workers-8l2pm-764bd7b9fcx8wb88-hsrrh   vsphere://420ae54d-4dab-4c41-894c-9700b5d8ebc5   Running   14h   v1.27.10+vmware.1-fips.1
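
To see at a glance which versions the machines report, the listing can be summarized per version. The following is a minimal sketch and not part of the original procedure: the helper name machine_versions is hypothetical, and it assumes that VERSION is the last column of the kubectl get machine table, as in the output above.

```shell
# Hypothetical helper: count Machines per reported Kubernetes version.
# Reads the table printed by `kubectl get machine` on stdin.
# Assumption: VERSION is the last whitespace-separated column.
machine_versions() {
  awk '
    NF > 0 && $1 != "NAME" { count[$NF]++ }           # skip blank lines and the header row
    END { for (v in count) print count[v], v }        # e.g. "<count> <version>"
  ' | sort -k2
}
```

For example, run kubectl get machine -n <namespace> | machine_versions. If every machine is counted under the old version, no node has been rolled to the new TKr yet.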

 

Environment

TKC v1.27.10

Cause

'kubectl get osimage' did not list a Photon osimage for TKr v1.28.8, and the vmware-system-tkg-controller-manager logs confirmed that the upgrade requires the Photon image:

kubectl logs -n vmware-system-tkg vmware-system-tkg-controller-manager-84887d8f75-f7xr2

E0730 09:56:30.695139       1 tanzukubernetescluster_controller.go:468] vmware-system-tkg-controller-manager/tanzukubernetescluster-spec-controller/<clustername>-ns/<clustername> "msg"="Error while reconcilling cluster object requeuing for retry" "error"="admission webhook \"tkr-resolver-cluster-webhook.tanzu.vmware.com\" denied the request: could not resolve TKR/OSImage for controlPlane, machineDeployments: [workers], query: {controlPlane: {k8sVersionPrefix: 'v1.28.8+vmware.1-fips.1-tkg.2', tkrSelector: '!run.tanzu.vmware.com/legacy-tkr,tkr.tanzu.vmware.com/standard', osImageSelector: 'os-name=photon,tkr.tanzu.vmware.com/standard'}, machineDeployments: [{k8sVersionPrefix: 'v1.28.8+vmware.1-fips.1-tkg.2', tkrSelector: '!run.tanzu.vmware.com/legacy-tkr,tkr.tanzu.vmware.com/standard', osImageSelector: 'os-name=photon'}]}, result: {controlPlane: {k8sVersion: '', tkrName: '', osImagesByTKR: map[]}, machineDeployments: [{k8sVersion: '', tkrName: '', osImagesByTKR: map[]}]}" "cluster.name"="<clustername>"

 

'kubectl get osimage | grep 1.28.8' likewise shows that the only osimage currently available for v1.28.8 is Ubuntu:

vmi-10bab5091f3b5c924   v1.28.8+vmware.1-fips.1    ubuntu    22.04        amd64   vmi                 36h
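
The same check can be scripted so that it reports directly whether an osimage exists for a given version and OS. This is a sketch under stated assumptions: the helper name has_osimage is hypothetical, and it does a plain substring match on each line of the kubectl get osimage table instead of relying on specific column positions.

```shell
# Hypothetical helper: succeed if the `kubectl get osimage` table on stdin
# has a line containing both the given version string and the OS name.
# The OS name is matched case-insensitively; both matches are literal substrings.
has_osimage() {
  version="$1"
  os_name="$2"
  awk -v ver="$version" -v os="$os_name" '
    index($0, ver) && index(tolower($0), tolower(os)) { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example: kubectl get osimage | has_osimage 1.28.8 photon || echo "no Photon osimage for 1.28.8". In the situation described here, the check would fail for photon and succeed for ubuntu.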

 

Resolution

Solution Recommendation

Follow the steps below to have the osimages repopulated automatically.
Once the Photon osimage for v1.28.8 is available, the upgrade can resolve it and will continue to completion.

 

Step-by-step Instructions

1. Connect to the Supervisor Cluster context in your Kubernetes CLI.
2. Locate the affected TKR version:
 - Run the following to confirm the TKR exists:
 - kubectl get tkr | grep v1.28.8
3. Delete the existing TKR:
 - kubectl delete tkr v1.28.8+vmware.1-fips.1-tkg.2
 - (Replace the name with the exact TKR name returned by the command in step 2.)
4. Wait a few minutes for the TKR to be automatically re-created, then confirm that it has returned:
 - kubectl get tkr | grep v1.28.8
 - kubectl get tkr <tkr> -o yaml
5. Verify that all expected osimage entries are now present:
 - kubectl get osimage | grep v1.28.8
 - You should now see entries for both Photon and Ubuntu VMIs.
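
Steps 3 to 5 above can be sketched as a small script that replaces the manual wait with polling. This is an illustration, not an official tool: the function names wait_for_tkr and recreate_tkr, the default of 30 attempts, and the 10-second interval are all assumptions, and the kubectl context must already point at the Supervisor Cluster.

```shell
# Hypothetical helper: poll until `kubectl get tkr <name>` succeeds.
# $1 = TKR name, $2 = max attempts (default 30), $3 = seconds between attempts (default 10).
wait_for_tkr() {
  tkr_name="$1"
  attempts="${2:-30}"
  interval="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get tkr "$tkr_name" >/dev/null 2>&1; then
      return 0                      # the TKR has been re-created
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1                          # gave up waiting
}

# Hypothetical helper: delete the TKR, wait for it to come back,
# then list the osimages whose line contains the given version string.
# $1 = TKR name, $2 = version string to search for, $3/$4 = optional attempts/interval.
recreate_tkr() {
  tkr_name="$1"
  version="$2"
  kubectl delete tkr "$tkr_name" || return 1
  if ! wait_for_tkr "$tkr_name" "${3:-30}" "${4:-10}"; then
    echo "TKR $tkr_name was not re-created in time" >&2
    return 1
  fi
  kubectl get osimage | grep -F "$version"
}
```

For example: recreate_tkr <tkr> v1.28.8, where <tkr> is the exact name returned in step 2. The final listing should then include both the Photon and the Ubuntu entries; if it does not, repeat the check in step 5 after a few more minutes.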