TKG cluster upgrade fails and newly created CP node has nodename set to localhost


Article ID: 375218


Products

VMware Tanzu Kubernetes Grid Management
Tanzu Kubernetes Grid
VMware Tanzu Kubernetes Grid
VMware Tanzu Kubernetes Grid 1.x
VMware Tanzu Kubernetes Grid Plus
VMware Tanzu Kubernetes Grid Plus 1.x

Issue/Introduction

A TKGm workload cluster upgrade fails. The first new Control Plane (CP) node is created and added to the cluster. The machine object and VM for the second CP node are also created, but that node is not added to the cluster.

The nodename in the node and machine objects of the first new Control Plane node is set to localhost. The hostname of the VM is also set to localhost.

 

Environment

Any TKGm version

Cause

Nodenames in a cluster must be unique. Because the first new CP node has already registered as "localhost", the second CP node cannot be added to the cluster with the same nodename.

The hostname and nodename should never be set to localhost. This indicates that an invalid OS image is being used.

There may be multiple OS images with the same image version in vCenter, and the wrong one is being picked up.

 

Retrieve OS image version

kubectl get osimage <OS Image name> -o jsonpath='{.spec.image.ref.version}'

Sample:

kubectl get osimage v1.28.7---vmware.1-tkg.3-50fb7614ebf10b4a98fbb31220ac0fb1 -o jsonpath='{.spec.image.ref.version}'

v1.28.7+vmware.1-tkg.3-50fb7614ebf10b4a98fbb31220ac0fb1

 

Find the images in vCenter for the corresponding Kubernetes version

govc find /<Datacentre name> -type m | grep <kubernetes version>

Example:

govc find /<DATACENTER> -type m | grep 1.28
...
/<DATACENTER>/vm/tkg/photon-5-kube-v1-28-7+vmware-1-tkg-3-50fb7614ebf10b4a98fbb31220ac0fb1

...

Check the version of each image. In the output, search for "Id": "VERSION" and compare its "DefaultValue" with the OS image version retrieved earlier.

govc vm.info -json <full path to image> | jq 

Sample output:

            {
              "Key": 10,
              "ClassId": "",
              "InstanceId": "",
              "Id": "VERSION",
              "Category": "Cluster API Provider (CAPI)",
              "Label": "VERSION",
              "Type": "string",
              "TypeReference": "",
              "UserConfigurable": false,
              "DefaultValue": "v1.28.7+vmware.1-tkg.3-50fb7614ebf10b4a98fbb31220ac0fb1",
              "Value": "",
              "Description": ""
            }
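To avoid scanning the full JSON by eye, the VERSION property can be pulled out with a jq filter. The following is a minimal sketch, assuming the property appears with the capitalized key names shown in the sample output (other govc releases may use different key casing, in which case the filter needs adjusting). The function name image_version is hypothetical and is not part of govc or TKG.

```shell
#!/bin/sh
# Hypothetical helper: read `govc vm.info -json` output on stdin and print
# the DefaultValue of every property whose Id is "VERSION".
# Assumption: key names are capitalized as in the sample output.
image_version() {
  # `..` walks the whole document, so the exact nesting of the
  # property list does not need to be known in advance.
  jq -r '.. | objects | select(.Id? == "VERSION") | .DefaultValue'
}

# Usage (requires govc and jq):
#   govc vm.info -json <full path to image> | image_version
```

Running this against each path returned by govc find gives one version string per image, which is easier to compare than the raw JSON.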

Resolution

If there are multiple images with the same version in vCenter, remove the invalid one.
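Duplicate versions can be spotted mechanically once each image path has been paired with its VERSION value. The sketch below assumes the input is one path and one version per line, separated by a tab, collected from the govc find and govc vm.info commands above. The function name duplicate_versions and this input format are illustrative assumptions, not part of govc or TKG. It only reports which images share a version; deciding which of them is invalid still has to be done manually.

```shell
#!/bin/sh
# Hypothetical helper: read "path<TAB>version" lines on stdin and print
# every line whose version is shared by more than one image path.
duplicate_versions() {
  awk -F '\t' '
    # Count occurrences of each version and remember lines in input order.
    { count[$2]++; lines[NR] = $0; version[NR] = $2 }
    END {
      for (i = 1; i <= NR; i++)
        if (count[version[i]] > 1) print lines[i]
    }
  '
}

# Usage, with images.tsv holding the collected path/version pairs:
#   duplicate_versions < images.tsv
```

No output means every image in the list has a distinct version.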

Alternatively, remove all the relevant images for that Kubernetes version, download a valid one from the Broadcom Support Portal, and upload it to vCenter.

Rerun the cluster upgrade.