Continuous Worker Node rollouts on vSphere with Tanzu Guest Cluster

Products

VMware vSphere ESXi
VMware vSphere with Tanzu

Issue/Introduction

Symptoms:

On a vSphere with Tanzu Guest Cluster, worker nodes are continuously redeployed
New MachineSets are repeatedly created under the MachineDeployment associated with one or more NodePools in the Guest Cluster, and each new MachineSet leads to new node deployments
This rollout behavior may cause workloads to fail due to the repeated failover
From an SSH session to the Supervisor Cluster, the following symptoms will be present:
- The Guest Cluster TKC object remains in the READY: True state even while nodes are rolling:

kubectl get tkc -A
NAMESPACE        NAME           CONTROL PLANE   WORKER   TKR NAME                              AGE   READY   TKR COMPATIBLE   UPDATES AVAILABLE
test-namespace   test-cluster   1               5        v1.23.8---vmware.2-tkg.2-zshippable   83m   True    True

When describing the TKC object, the events will show repeated PhaseChanged from Updating to Running and back:

kubectl describe tkc -n test-namespace test-cluster |tail -10
Phase: updating
Total Worker Replicas: 5
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal PhaseChanged 8m (x2 over 10m) vmware-system-tkg/vmware-system-tkg-controller-manager/tanzukubernetescluster-status-controller cluster changes from running phase to updating phase
Normal PhaseChanged 5m (x2 over 10m) vmware-system-tkg/vmware-system-tkg-controller-manager/tanzukubernetescluster-status-controller cluster changes from updating phase to running phase
Normal PhaseChanged 2m vmware-system-tkg/vmware-system-tkg-controller-manager/tanzukubernetescluster-status-controller cluster changes from running phase to updating phase

Listing the MachineSets in the namespace will show repeatedly created new MachineSets for the problem cluster and NodePool:

kubectl get machineset -A
NAMESPACE        NAME                                        CLUSTER        REPLICAS   READY   AVAILABLE   AGE     VERSION
test-namespace   test-cluster-node-pool-1-dxx9p-5c7bc9f65    test-cluster   0                              39m     v1.23.8+vmware.2
test-namespace   test-cluster-node-pool-1-dxx9p-6666b54746   test-cluster   1                              98s     v1.23.8+vmware.2
test-namespace   test-cluster-node-pool-1-dxx9p-66699868f9   test-cluster   1          1       1           14m     v1.23.8+vmware.2
test-namespace   test-cluster-node-pool-1-dxx9p-6874966dbd   test-cluster   0          0       0           38m     v1.23.8+vmware.2
test-namespace   test-cluster-node-pool-1-dxx9p-6c87d8d4b    test-cluster   0                              15m     v1.23.8+vmware.2
test-namespace   test-cluster-node-pool-1-dxx9p-746bd5bfb4   test-cluster   0                              8m35s   v1.23.8+vmware.2
test-namespace   test-cluster-node-pool-1-dxx9p-77587446bf   test-cluster   1          1       1           7m16s   v1.23.8+vmware.2
test-namespace   test-cluster-node-pool-1-dxx9p-7b5f868d8f   test-cluster   0                              2m48s   v1.23.8+vmware.2
test-namespace   test-cluster-node-pool-2-lqtpm-57d98cc8d6   test-cluster   3          3       3           79m     v1.23.8+vmware.2
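
On a cluster with many NodePools, the affected pool can be hard to spot by eye. The following is a minimal sketch, not a supported tool, that counts MachineSets per MachineDeployment from the listing above. The function name count_machinesets_per_deployment is hypothetical, and the sketch assumes that each MachineSet name is the MachineDeployment name plus one hyphen-separated hash suffix, as in the sample output.

```shell
# Hypothetical helper: reads "kubectl get machineset -A --no-headers" output on stdin
# and prints "<count> <namespace>/<machinedeployment>" sorted by count, highest first.
# Assumption: the MachineSet name is "<machinedeployment>-<hash>".
count_machinesets_per_deployment() {
  awk '{
    name = $2
    sub(/-[^-]+$/, "", name)        # drop the trailing hash suffix
    count[$1 "/" name]++
  }
  END { for (k in count) print count[k], k }' | sort -rn
}

# Usage on the Supervisor Cluster:
#   kubectl get machineset -A --no-headers | count_machinesets_per_deployment
```

A NodePool whose count keeps growing between runs, while other pools stay at one MachineSet, is the likely candidate for this issue.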

The transitions will be noted in the capi-controller-manager logs. The update event will report repeated creation and rotation of the KubeadmConfigTemplate for the MachineDeployment:

I0427 21:20:29.753835 1 reconcile_state.go:455] controller/topology/cluster "msg"="Rotating KubeadmConfigTemplate/ test-cluster-node-pool-1-bootstrap-hx2nz, new name test-cluster-node-pool-1-bootstrap-q76sh" "machineDeployment name"=" test-cluster-node-pool-1-dxx9p" "machineDeployment topologyName"="node-pool-1" "name"=" test-cluster" "namespace"="test-namespace" "object"=" test-cluster-node-pool-1-bootstrap-hx2nz" "object groupVersion"="bootstrap.cluster.x-k8s.io/v1beta1" "object kind"="KubeadmConfigTemplate" "reconciler group"="cluster.x-k8s.io" "reconciler kind"="Cluster"
I0427 21:20:29.753962 1 reconcile_state.go:455] controller/topology/cluster "msg"="Creating KubeadmConfigTemplate/test-cluster-node-pool-1-bootstrap-q76sh" "machineDeployment name"=" test-cluster-node-pool-1-dxx9p" "machineDeployment topologyName"="node-pool-1" "name"=" test-cluster" "namespace"="test-namespace" "object"=" test-cluster-node-pool-1-bootstrap-hx2nz" "object groupVersion"="bootstrap.cluster.x-k8s.io/v1beta1" "object kind"="KubeadmConfigTemplate" "reconciler group"="cluster.x-k8s.io" "reconciler kind"="Cluster"
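
To gauge how often each NodePool is being rotated, the "Rotating KubeadmConfigTemplate" messages can be tallied per topology name. This is a rough sketch that assumes the log line format shown above. The function name count_template_rotations is hypothetical, and the log text is expected on stdin from however the capi-controller-manager logs are collected in your environment.

```shell
# Hypothetical helper: reads capi-controller-manager log text on stdin and prints
# "<rotations> <machineDeployment topologyName>" for each NodePool seen.
# Assumption: rotation lines carry a "machineDeployment topologyName"="..." field.
count_template_rotations() {
  grep 'Rotating KubeadmConfigTemplate' |
    sed -n 's/.*"machineDeployment topologyName"="\([^"]*\)".*/\1/p' |
    sort | uniq -c | awk '{ print $1, $2 }'
}

# Usage, with a previously saved log file:
#   count_template_rotations < capi-controller-manager.log
```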

If the environment is using vSphere 8.x and an associated Supervisor Cluster, describing the Cluster object will show a large Generation number. The longer the condition has been present, the larger this number will be:

kubectl describe cluster -n test-namespace test-cluster | grep Generation
Generation: 68
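
A single large Generation value does not show whether the condition is still active. One way to check is to sample the value twice and compare, as in the sketch below. The function name generation_is_growing and the KUBECTL override variable are hypothetical conveniences. The sketch assumes the Cluster object exposes the standard metadata.generation field and that nobody is editing the cluster spec during the interval.

```shell
# Hypothetical helper: samples metadata.generation of a Cluster object twice.
# Arguments: <namespace> <cluster-name> [interval-seconds, default 60]
# Returns 0 if the generation increased, 1 if it did not, 2 if kubectl failed.
generation_is_growing() {
  gig_ns=$1
  gig_name=$2
  gig_interval=${3:-60}
  gig_first=$(${KUBECTL:-kubectl} get cluster -n "$gig_ns" "$gig_name" \
    -o jsonpath='{.metadata.generation}') || return 2
  sleep "$gig_interval"
  gig_second=$(${KUBECTL:-kubectl} get cluster -n "$gig_ns" "$gig_name" \
    -o jsonpath='{.metadata.generation}') || return 2
  echo "generation: $gig_first -> $gig_second"
  [ "$gig_second" -gt "$gig_first" ]
}

# Usage on the Supervisor Cluster:
#   generation_is_growing test-namespace test-cluster 300
```

An interval of several minutes is a reasonable starting point, since the sample events above show phase changes a few minutes apart.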

Environment

VMware vSphere 7.0 with Tanzu
VMware vSphere 8.0 with Tanzu

Cause

Having more than one Label on a NodePool in a vSphere with Tanzu Guest Cluster leads to a condition where the CAPI controller manager continuously re-orders the labels. Each re-ordering creates a new KubeadmConfigTemplate, which in turn spawns a new MachineSet and triggers worker node rollouts to reconcile the new template.

Resolution

This is resolved in vCenter version 8.0U1c and later versions.

Please note: Upgrading vCenter alone will not resolve this issue. After the vCenter upgrade, the Supervisor Cluster must also be upgraded to pick up the fix.

Workaround:
If vCenter and the Supervisor Cluster cannot be upgraded, the only workaround is to edit the vSphere with Tanzu Guest Cluster and remove the additional Labels from the NodePool that is continuously updating, so that it has no more than one Label.