After upgrading from vSphere 7.x to vSphere 8.x, a vSphere Kubernetes Cluster that uses an Ubuntu TKR fails to become ready and remains in a Ready False state.
The Supervisor cluster upgrade completed successfully.
While connected to the Supervisor cluster context, the following symptoms are observed:
kubectl get tkc <affected cluster name> -n <affected cluster namespace>
NAMESPACE NAME CONTROL PLANE WORKER TKR NAME
my-cluster-ns my-ubuntu-cluster # # vX.XX.XX---vmware-X-fips-X.tkg.X.ubuntu
kubectl get machine -n <affected cluster namespace>
NAMESPACE NAME CLUSTER VERSION
my-cluster-ns my-cluster-cp-a1 my-ubuntu-cluster vX.XX.XX--vmware.X-fips.X
kubectl describe tkc <affected cluster name> -n <affected cluster namespace>
The following annotation is missing from the TKC's annotations:
run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
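The same check can be scripted. The sketch below is a hypothetical helper, not a VMware tool: it assumes the TKC was saved as JSON with kubectl get tkc <affected cluster name> -n <affected cluster namespace> -o json > tkc.json, and the script and function names are illustrative. Only the annotation key and value come from this article.

```python
import json
import sys

# Annotation key and value as given in this article.
OS_ANNOTATION_KEY = "run.tanzu.vmware.com/resolve-os-image"
OS_ANNOTATION_VALUE = "os-name=ubuntu"


def has_ubuntu_annotation(tkc: dict) -> bool:
    """Return True if the TKC object carries the Ubuntu OS annotation."""
    annotations = (tkc.get("metadata") or {}).get("annotations") or {}
    return annotations.get(OS_ANNOTATION_KEY) == OS_ANNOTATION_VALUE


# Hypothetical usage: python3 check_tkc_annotation.py tkc.json
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        tkc = json.load(f)
    print("present" if has_ubuntu_annotation(tkc) else "MISSING")
```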
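On the vCenter Server Appliance, the upgrade status of each Supervisor component can be listed to confirm that the Supervisor upgrade completed:
/usr/lib/vmware-wcp/upgrade/upgrade-ctl.py get-status | jq '.progress | to_entries | .[] | "\(.value.status) - \(.key)"' | sort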
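Because the jq filter above only reshapes JSON, the same summary can be produced in Python where jq is not convenient. This is a sketch that assumes, as the filter implies, that get-status prints a JSON object whose progress field maps each component name to an object with a status field; the script name is hypothetical. It uses typing.List rather than newer type-hint syntax in case it runs on an older Python interpreter.

```python
import json
import sys
from typing import List


def summarize_progress(status: dict) -> List[str]:
    """Return one "<status> - <component>" line per progress entry, sorted.

    jq renders a missing status as "null", so the same is done here.
    """
    lines = []
    for component, entry in (status.get("progress") or {}).items():
        value = (entry or {}).get("status")
        text = "null" if value is None else str(value)
        lines.append(text + " - " + component)
    return sorted(lines)


# Hypothetical usage:
#   /usr/lib/vmware-wcp/upgrade/upgrade-ctl.py get-status > status.json
#   python3 summarize_status.py status.json
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for line in summarize_progress(json.load(f)):
            print(line)
```

Unlike the original command, which runs jq without -r, these lines are printed without surrounding quotes, and Python sorts by code point rather than by the shell's locale.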
vSphere 8.0 with Tanzu
This issue can occur regardless of whether the cluster is managed by TMC.
The following annotation is required on vSphere Kubernetes Clusters that run an Ubuntu TKR:
run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
Without this annotation, nodes are deployed with the default Photon OS image.
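To make the default behavior concrete, the following is a simplified model of how the annotation selects the node OS. It is not the TKC controller's code; the function names are hypothetical, and treating the value os-name=ubuntu as comma-separated key=value terms is an assumption based only on its form.

```python
from typing import Dict, Optional

# Annotation key and default OS as described in this article.
OS_ANNOTATION_KEY = "run.tanzu.vmware.com/resolve-os-image"
DEFAULT_OS = "photon"


def parse_selector(value: str) -> Dict[str, str]:
    """Parse a value such as "os-name=ubuntu" into {"os-name": "ubuntu"}.

    Assumption: multiple terms, if present, are comma-separated key=value pairs.
    """
    terms = {}
    for term in value.split(","):
        term = term.strip()
        if not term:
            continue
        key, sep, val = term.partition("=")
        if sep:
            terms[key.strip()] = val.strip()
    return terms


def expected_node_os(annotations: Optional[Dict[str, str]]) -> str:
    """OS the nodes would use under this simplified model; Photon if unannotated."""
    value = (annotations or {}).get(OS_ANNOTATION_KEY)
    if value is None:
        return DEFAULT_OS
    return parse_selector(value).get("os-name", DEFAULT_OS)
```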
Starting with vSphere 8.x, the Tanzu Kubernetes Cluster controllers are expected to add this annotation automatically during the CAPW to CAPV object migration. In this scenario, however, that automatic patch failed and the annotation was never added.
This can occur if Kubernetes services were not healthy during the CAPW to CAPV object migration.
It can also occur if the affected cluster's TKC upgrade is started before the CAPW to CAPV object migration has completed, that is, while the Supervisor cluster upgrade that follows the move to vSphere 8.x or later is still in progress.
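The Ubuntu OS annotation must be re-applied to the TKC object of the affected vSphere Kubernetes Cluster. While connected to the Supervisor cluster context, open the TKC for editing:
kubectl edit tkc <affected cluster name> -n <affected cluster namespace>
Add the following line under metadata.annotations, then save and exit the editor:
run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu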
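Where an interactive editor is not practical, the annotation can instead be applied with kubectl patch. Because TKC is a custom resource, a JSON merge patch (--type merge) is used rather than a strategic merge patch. The sketch below only builds the command string for review before it is run; the function names are hypothetical, and the cluster and namespace names are the sample values from the output shown earlier.

```python
import json
import shlex

# Annotation key and value as given in this article.
OS_ANNOTATION_KEY = "run.tanzu.vmware.com/resolve-os-image"
OS_ANNOTATION_VALUE = "os-name=ubuntu"


def ubuntu_annotation_patch() -> dict:
    """JSON merge patch that adds or overwrites only the Ubuntu OS annotation."""
    return {"metadata": {"annotations": {OS_ANNOTATION_KEY: OS_ANNOTATION_VALUE}}}


def kubectl_patch_command(cluster: str, namespace: str) -> str:
    """Return a shell-quoted kubectl patch command for the given TKC."""
    args = [
        "kubectl", "patch", "tkc", cluster,
        "-n", namespace,
        "--type", "merge",
        "-p", json.dumps(ubuntu_annotation_patch()),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Sample names from the example output in this article.
    print(kubectl_patch_command("my-ubuntu-cluster", "my-cluster-ns"))
```

Running it prints:
kubectl patch tkc my-ubuntu-cluster -n my-cluster-ns --type merge -p '{"metadata": {"annotations": {"run.tanzu.vmware.com/resolve-os-image": "os-name=ubuntu"}}}'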
Confirm that the annotation is now present on the TKC:
kubectl describe tkc <affected cluster name> -n <affected cluster namespace>
run.tanzu.vmware.com/resolve-os-image: os-name=ubuntu
Confirm that the Cluster object's TKR annotation matches the expected TKR version:
kubectl describe cluster <affected cluster name> -n <affected cluster namespace>
annotations:
run.tanzu.vmware.com/tkr: <EXPECTED TKR VERSION>
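The TKR comparison can also be scripted against the JSON from kubectl get cluster <affected cluster name> -n <affected cluster namespace> -o json. The helper below is hypothetical, and the expected TKR string must be supplied by the reader, since it depends on the environment.

```python
import json
import sys
from typing import Optional

# Annotation key as given in this article.
TKR_ANNOTATION_KEY = "run.tanzu.vmware.com/tkr"


def cluster_tkr(cluster: dict) -> Optional[str]:
    """Return the TKR recorded on the Cluster object, or None if absent."""
    annotations = (cluster.get("metadata") or {}).get("annotations") or {}
    return annotations.get(TKR_ANNOTATION_KEY)


def tkr_matches(cluster: dict, expected_tkr: str) -> bool:
    """True if the Cluster's TKR annotation equals the expected TKR name."""
    return cluster_tkr(cluster) == expected_tkr


# Hypothetical usage: python3 check_cluster_tkr.py cluster.json <EXPECTED TKR VERSION>
if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1]) as f:
        cluster = json.load(f)
    actual = cluster_tkr(cluster)
    print(("match" if actual == sys.argv[2] else "MISMATCH") + ": found " + str(actual))
```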
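Monitor the Machine, VirtualMachine, and VSphereMachine objects in the namespace until the nodes reconcile and the cluster becomes ready:
kubectl get machine,vm,vspheremachine -n <affected cluster namespace>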
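While the nodes reconcile, the Cluster API Machine objects can be summarized by phase. This sketch assumes the JSON from kubectl get machine -n <affected cluster namespace> -o json (the combined machine,vm,vspheremachine query also works, because non-Machine items are skipped) and filters on the standard Cluster API label cluster.x-k8s.io/cluster-name. The function and script names are hypothetical.

```python
import json
import sys
from typing import List, Tuple

# Standard Cluster API label that links a Machine to its Cluster.
CLUSTER_LABEL = "cluster.x-k8s.io/cluster-name"


def machine_phases(obj_list: dict, cluster: str) -> List[Tuple[str, str]]:
    """Return sorted (name, phase) pairs for the cluster's Machine objects."""
    result = []
    for item in obj_list.get("items", []):
        if item.get("kind") != "Machine":
            continue  # skip VirtualMachine / VSphereMachine entries
        metadata = item.get("metadata") or {}
        if (metadata.get("labels") or {}).get(CLUSTER_LABEL) != cluster:
            continue
        phase = (item.get("status") or {}).get("phase", "Unknown")
        result.append((metadata.get("name", "<unnamed>"), phase))
    return sorted(result)


def machines_not_running(obj_list: dict, cluster: str) -> List[str]:
    """Names of the cluster's Machines whose phase is not yet Running."""
    return [name for name, phase in machine_phases(obj_list, cluster) if phase != "Running"]


# Hypothetical usage: python3 machine_phases.py machines.json my-ubuntu-cluster
if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1]) as f:
        data = json.load(f)
    for name, phase in machine_phases(data, sys.argv[2]):
        print(phase + "\t" + name)
```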