TKGS Upgrade leaves stale node entry inside TKGS Cluster

Article ID: 323446

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
This issue affects only the following vCenter Server releases:

Name                   Version      Release Date  Build Number  Installer Build Number
vCenter Server 7.0.0d  7.0.0.10700  2020-08-25    16749653      16749670
vCenter Server 7.0.0c  7.0.0.10600  2020-07-30    16620007      16620013
vCenter Server 7.0.0b  7.0.0.10400  2020-06-23    16386292      16386335
vCenter Server 7.0.0a  7.0.0.10300  2020-05-19    16189094      16189207
vCenter Server 7.0 GA  7.0.0.10100  2020-04-02    15952498      15952599

After a TKGS upgrade, the old control plane VM has been deleted, and every node reports Ready except the old control plane node, whose entry remains in the guest cluster in the NotReady state.

kubectl get nodes
NAME                               STATUS     ROLES    AGE     VERSION
tkc-control-plane-zmvqq            NotReady   master   6d16h   v1.16.8+vmware.1
tkc-control-plane-zmvqq-10656602   Ready      master   34m     v1.17.7+vmware.1
tkc-workers-bfdl2-99c66bdb-dfc47   Ready      <none>   25m     v1.17.7+vmware.1
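In a cluster with many nodes, the candidate stale entry can be picked out of the `kubectl get nodes` output mechanically. The following is a minimal POSIX shell sketch, not a VMware or kubectl tool: `find_notready_nodes` is a hypothetical helper name, and it assumes the default column layout shown in the example above. A NotReady node is only a candidate; confirm from the Supervisor Cluster that its VM is gone before treating it as stale.

```shell
# Sketch: print the NAME of every node whose STATUS starts with "NotReady"
# (this also matches "NotReady,SchedulingDisabled" for a cordoned node).
# Assumes the default `kubectl get nodes` column order:
# NAME STATUS ROLES AGE VERSION.
find_notready_nodes() {
  # NR > 1 skips the header row; $1 is NAME and $2 is STATUS.
  awk 'NR > 1 && $2 ~ /^NotReady/ { print $1 }'
}

# Usage against a live guest cluster:
#   kubectl get nodes | find_notready_nodes
```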

From the Supervisor Cluster, you have confirmed that the VM object for the old node has been deleted, and that the VM is also gone from vCenter.
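The Supervisor Cluster check can be scripted in the same style. This sketch assumes that, against the Supervisor Cluster, `kubectl get virtualmachines -n <vsphere-namespace> -o name` lists the VirtualMachine objects backing the TKGS cluster, where `<vsphere-namespace>` is a placeholder for the vSphere Namespace that hosts it, and that each VirtualMachine is named after its guest-cluster node. `vm_object_exists` is a hypothetical helper name.

```shell
# Sketch: succeed (exit 0) if a node name appears in a list of
# VirtualMachine objects read on stdin, one per line, as printed by
# `kubectl get virtualmachines -o name`. Everything up to the last "/"
# (the resource-type prefix) is stripped before an exact comparison.
# Assumption: each VirtualMachine is named after its guest-cluster node.
vm_object_exists() {
  awk -v want="$1" '
    { sub(/.*\//, ""); if ($0 == want) { found = 1; exit } }
    END { exit !found }
  '
}

# Usage from the Supervisor Cluster context (<vsphere-namespace> is a placeholder):
#   kubectl get virtualmachines -n <vsphere-namespace> -o name \
#     | vm_object_exists tkc-control-plane-zmvqq || echo "no VM object: node entry is stale"
```

An exact match is used on purpose: the stale name `tkc-control-plane-zmvqq` is a prefix of its replacement `tkc-control-plane-zmvqq-10656602`, so a substring search would wrongly report the old VM as still present.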

Environment

VMware vCenter Server 7.0.x

Cause

This is a known issue with the TKGS upgrade job in the vCenter Server releases listed above.

Resolution

There is no resolution for these releases. Use the workaround below until you can upgrade vCenter Server and the Supervisor Cluster to vSphere 7.0 U1 or later.

Workaround:
From within the guest cluster, run the following command to remove the stale node entry:

kubectl delete node <node-name>

For the example above, the command is:

kubectl delete node tkc-control-plane-zmvqq
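Because `kubectl delete node` takes effect immediately, a more cautious variant prints the command only when the named node is still NotReady, so it can be reviewed before it is run. This is a sketch: `delete_cmd_if_notready` is a hypothetical helper, and the check assumes the default `kubectl get nodes` columns.

```shell
# Sketch: print a `kubectl delete node` command for the named node only
# if the `kubectl get nodes` output on stdin shows it as NotReady; exit
# non-zero and print nothing otherwise. This function deletes nothing.
delete_cmd_if_notready() {
  awk -v want="$1" '
    NR > 1 && $1 == want && $2 ~ /^NotReady/ { print "kubectl delete node " $1; found = 1 }
    END { exit !found }
  '
}

# Review the printed command, then run it yourself:
#   kubectl get nodes | delete_cmd_if_notready tkc-control-plane-zmvqq
```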

Additional Information

Impact/Risks:
In some cases, a new worker added to this cluster is assigned the IP address of the stale node, which causes pods on the new worker to fail. To check whether a new worker has the same IP address as the stale node, compare the INTERNAL-IP column in the output of:

kubectl get nodes -o wide
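The IP comparison can also be done mechanically by grouping nodes on their INTERNAL-IP value and reporting any address held by more than one node. This is a minimal awk-based sketch: `find_shared_ips` is a hypothetical name, and it assumes the standard `-o wide` header, in which INTERNAL-IP appears before the multi-word OS-IMAGE column.

```shell
# Sketch: report every INTERNAL-IP value that more than one node reports in
# `kubectl get nodes -o wide` output read on stdin, with the node names.
# The column is located by its header name; this works because every column
# up to INTERNAL-IP (NAME STATUS ROLES AGE VERSION) holds a single word.
find_shared_ips() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "INTERNAL-IP") col = i; next }
    col && $col != "<none>" {
      ip = $col
      # count[ip]++ yields the previous count, so the first node starts the list.
      if (count[ip]++) names[ip] = names[ip] " " $1
      else names[ip] = $1
    }
    END { for (ip in count) if (count[ip] > 1) print ip ": " names[ip] }
  '
}

# Usage:
#   kubectl get nodes -o wide | find_shared_ips
```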
