Tanzu guest control plane stuck in deleting nodes during upgrade

Article ID: 379378

Updated On:

Products

VMware vSphere with Tanzu

Issue/Introduction

While upgrading a Guest Cluster from v1.25.7+vmware.3-fips.1-tkg.1 to v1.26.5+vmware.2-fips.1-tkg.1, the control plane of the Guest Cluster started cycling as expected: three new control plane nodes were created and one of the old control plane nodes was deleted. However, the other two old control plane nodes remain in a NotReady,SchedulingDisabled state and are not removed:

From the Guest Cluster control plane:

# kubectl get nodes

NAME                               STATUS                        ROLES           AGE    VERSION
node-xyz-7h59v                     NotReady,SchedulingDisabled   control-plane   47d    v1.25.7+vmware.3-fips.1
node-xyz-9wmgd                     Ready                         control-plane   53m    v1.26.5+vmware.2-fips.1
node-xyz-qp8bd                     Ready                         control-plane   42m    v1.26.5+vmware.2-fips.1
node-xyz-vk2w6                     NotReady,SchedulingDisabled   control-plane   47d    v1.25.7+vmware.3-fips.1
node-xyz-vpn6l                     Ready                         control-plane   46m    v1.26.5+vmware.2-fips.1
node-shared-xxxxx-xxxxxxxx-r2hrj   Ready                         <none>          47d    v1.25.7+vmware.3-fips.1
node-shared-xxxxx-xxxxxxxx-xmqg6   Ready                         <none>          47d    v1.25.7+vmware.3-fips.1
node-shared-xxxxx-xxxxxxxx-z8hnt   Ready                         <none>          47d    v1.25.7+vmware.3-fips.1
node-shared-xxxxx-xxxxxxxx-255vf   Ready                         <none>          38m    v1.26.5+vmware.2-fips.1
node-shared-xxxxx-xxxxxxxx-mf24x   Ready                         <none>          2m3s   v1.26.5+vmware.2-fips.1
node-shared-xxxxx-xxxxxxxx-w5hln   Ready                         <none>          14m    v1.26.5+vmware.2-fips.1

 

The newly created nodes are:

node-xyz-9wmgd                     Ready                         control-plane   53m    v1.26.5+vmware.2-fips.1
node-xyz-qp8bd                     Ready                         control-plane   42m    v1.26.5+vmware.2-fips.1
node-xyz-vpn6l                     Ready                         control-plane   46m    v1.26.5+vmware.2-fips.1

And the old nodes are:

node-xyz-7h59v                     NotReady,SchedulingDisabled   control-plane   47d    v1.25.7+vmware.3-fips.1
node-xyz-vk2w6                     NotReady,SchedulingDisabled   control-plane   47d    v1.25.7+vmware.3-fips.1

 

Resolution

First, make sure that the old nodes are no longer members of the etcd cluster. If any of them is still an etcd member, do not delete it.

Open an SSH session to one of the Guest Cluster control plane nodes and query the etcd cluster to check whether any of the old nodes are still listed as members:

# etcdctl -w table member list
+------------------+---------+------------------------+------------------------------+------------------------------+------------+
|        ID        | STATUS  |          NAME          |          PEER ADDRS          |         CLIENT ADDRS         | IS LEARNER |
+------------------+---------+------------------------+------------------------------+------------------------------+------------+
| 11111111111111111| started | node-xyz-9wmgd         | https://x.x.x.x:2380         | https://x.x.x.x:2379         |      false |
| 22222222222222222| started | node-xyz-qp8bd         | https://x.x.x.x:2380         | https://x.x.x.x:2379         |      false |
| 33333333333333333| started | node-xyz-vpn6l         | https://x.x.x.x:2380         | https://x.x.x.x:2379         |      false |
+------------------+---------+------------------------+------------------------------+------------------------------+------------+
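Note: depending on how etcdctl is invoked on the node, it may need to be pointed at the etcd client certificates explicitly. A minimal sketch, assuming the standard kubeadm certificate paths on the control plane node (adjust the endpoint and paths to your environment):

# ETCDCTL_API=3 etcdctl -w table member list \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key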

Once you are completely sure that none of the NotReady nodes are members of the etcd cluster, you can proceed to delete them:

kubectl delete node node-xyz-7h59v 

kubectl delete node node-xyz-vk2w6
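
After deleting the old nodes, confirm that only the new v1.26.5 control plane nodes remain listed:

# kubectl get nodes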

 

Wait about 10 minutes, and the Tanzu Kubernetes Cluster (TKC) should return to a Ready state.
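
To verify from the Supervisor cluster side, you can check the cluster object status (a hedged example; <namespace> is a placeholder for the vSphere Namespace that contains the guest cluster):

# kubectl get tanzukubernetescluster -n <namespace>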