TKGS Upgrade leaves stale node entry inside TKGS Cluster

Article ID: 323446

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
This issue ONLY relates to the following vCenter Server releases:
Version                                Release Date   Build Numbers
vCenter Server 7.0.0d (7.0.0.10700)    2020-08-25     16749653 / 16749670
vCenter Server 7.0.0c (7.0.0.10600)    2020-07-30     16620007 / 16620013
vCenter Server 7.0.0b (7.0.0.10400)    2020-06-23     16386292 / 16386335
vCenter Server 7.0.0a (7.0.0.10300)    2020-05-19     16189094 / 16189207
vCenter Server 7.0 GA (7.0.0.10100)    2020-04-02     15952498 / 15952599


After a TKGS upgrade, the old control plane node's VM has been deleted, and all nodes are in a Ready state except for that old control plane node, which remains NotReady:
 
kubectl get nodes
NAME                               STATUS     ROLES    AGE     VERSION
tkc-control-plane-zmvqq            NotReady   master   6d16h   v1.16.8+vmware.1
tkc-control-plane-zmvqq-10656602   Ready      master   34m     v1.17.7+vmware.1
tkc-workers-bfdl2-99c66bdb-dfc47   Ready      <none>   25m     v1.17.7+vmware.1

From the Supervisor Cluster, you have validated that the VirtualMachine object for the stale node has been deleted, and that the corresponding VM has also been removed from the vCenter inventory.
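As a reference check, you can typically confirm this from the Supervisor Cluster context by listing the VirtualMachine objects in the vSphere Namespace that owns the guest cluster (the namespace name below is a placeholder for your environment):

kubectl get virtualmachines -n <supervisor-namespace>

The stale control plane node should no longer have a matching VirtualMachine entry in the output.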

Environment

VMware vCenter Server 7.0.x

Cause

This is a known issue with the upgrade job on these versions of vSphere.

Resolution

There is no resolution on these releases. Use the workaround below until you are able to upgrade vCenter Server and the Supervisor Cluster past vSphere 7.0 U1.

Workaround:
From within the guest cluster, you can run

kubectl delete node <node-name> 

to clear up the stale node.

In the above example, the command would be:

kubectl delete node tkc-control-plane-zmvqq
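After deleting the stale node, you can re-run the earlier check to confirm that only the remaining, healthy nodes are listed:

kubectl get nodes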

Additional Information

Impact/Risks:
In some cases, new worker nodes added to this cluster will reuse the IP address of the stale node object, which will cause pods on the new worker to fail. You can validate whether a new worker has the same IP address as the stale node by running:

kubectl get nodes -o wide
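The wide output includes an INTERNAL-IP column for each node, which you can compare against the stale node's address. If you prefer a compact name/IP listing, a standard kubectl JSONPath query such as the following (not specific to this article) prints only those two fields:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'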