Tanzu guest control plane stuck in deleting nodes during upgrade

Article ID: 379378

Updated On:

Products

VMware vSphere with Tanzu

Issue/Introduction

While upgrading a Guest Cluster from v1.25.7+vmware.3-fips.1-tkg.1 to v1.26.5+vmware.2-fips.1-tkg.1, the control plane of the Guest Cluster started cycling as expected: three new control plane nodes were created and one of the old control plane nodes was deleted. However, the other two old control plane nodes remain in a NotReady,SchedulingDisabled state and are not removed:

From the Guest Cluster control plane:

# kubectl get nodes

NAME                               STATUS                        ROLES           AGE    VERSION
node-xyz-7h59v                     NotReady,SchedulingDisabled   control-plane   47d    v1.25.7+vmware.3-fips.1
node-xyz-9wmgd                     Ready                         control-plane   53m    v1.26.5+vmware.2-fips.1
node-xyz-qp8bd                     Ready                         control-plane   42m    v1.26.5+vmware.2-fips.1
node-xyz-vk2w6                     NotReady,SchedulingDisabled   control-plane   47d    v1.25.7+vmware.3-fips.1
node-xyz-vpn6l                     Ready                         control-plane   46m    v1.26.5+vmware.2-fips.1
node-shared-xxxxx-xxxxxxxx-r2hrj   Ready                         <none>          47d    v1.25.7+vmware.3-fips.1
node-shared-xxxxx-xxxxxxxx-xmqg6   Ready                         <none>          47d    v1.25.7+vmware.3-fips.1
node-shared-xxxxx-xxxxxxxx-z8hnt   Ready                         <none>          47d    v1.25.7+vmware.3-fips.1
node-shared-xxxxx-xxxxxxxx-255vf   Ready                         <none>          38m    v1.26.5+vmware.2-fips.1
node-shared-xxxxx-xxxxxxxx-mf24x   Ready                         <none>          2m3s   v1.26.5+vmware.2-fips.1
node-shared-xxxxx-xxxxxxxx-w5hln   Ready                         <none>          14m    v1.26.5+vmware.2-fips.1

 

The newly created nodes are:

node-xyz-9wmgd                     Ready                         control-plane   53m    v1.26.5+vmware.2-fips.1
node-xyz-qp8bd                     Ready                         control-plane   42m    v1.26.5+vmware.2-fips.1
node-xyz-vpn6l                     Ready                         control-plane   46m    v1.26.5+vmware.2-fips.1

And the old nodes are:

node-xyz-7h59v                     NotReady,SchedulingDisabled   control-plane   47d    v1.25.7+vmware.3-fips.1
node-xyz-vk2w6                     NotReady,SchedulingDisabled   control-plane   47d    v1.25.7+vmware.3-fips.1

 

Resolution

First, make sure that the old nodes are no longer members of the etcd cluster. If any of them is still an etcd member, do not delete it.

Open an SSH session to one of the Guest Cluster control plane nodes and query the etcd cluster to check whether any of the old nodes are still listed as members:

# etcdctl -w table member list
+------------------+---------+------------------------+------------------------------+------------------------------+------------+
|        ID        | STATUS  |          NAME          |          PEER ADDRS          |         CLIENT ADDRS         | IS LEARNER |
+------------------+---------+------------------------+------------------------------+------------------------------+------------+
| 11111111111111111| started | node-xyz-9wmgd         | https://x.x.x.x:2380         | https://x.x.x.x:2379         |      false |
| 22222222222222222| started | node-xyz-qp8bd         | https://x.x.x.x:2380         | https://x.x.x.x:2379         |      false |
| 33333333333333333| started | node-xyz-vpn6l         | https://x.x.x.x:2380         | https://x.x.x.x:2379         |      false |
+------------------+---------+------------------------+------------------------------+------------------------------+------------+
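Note: depending on how etcdctl is invoked on the node, it may need to be pointed at the etcd client certificates explicitly. A minimal sketch, assuming the standard kubeadm certificate paths on the control plane node (adjust the endpoint and paths to your environment):

# ETCDCTL_API=3 etcdctl -w table member list \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key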

Once you are completely sure that none of the NotReady nodes are members of the etcd cluster, you can proceed to delete them:

kubectl delete node node-xyz-7h59v 

kubectl delete node node-xyz-vk2w6
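
After deleting the old nodes, confirm that only the new v1.26.5 control plane nodes remain listed:

# kubectl get nodes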

 

Wait about 10 minutes, and the Tanzu Kubernetes Cluster (TKC) should return to a Ready state.
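
To verify from the Supervisor cluster side, you can check the cluster object status (a hedged example; <namespace> is a placeholder for the vSphere Namespace that contains the guest cluster):

# kubectl get tanzukubernetescluster -n <namespace>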