Note - These steps were tested in an environment running on vSphere infrastructure. However, similar steps can be applied to other infrastructures such as AWS or Azure.
The objective of this article is to explain how to change the infrastructure specs of a running Tanzu Kubernetes Grid cluster (either a management cluster or a workload cluster).
How to reproduce this issue?
- First, we need to get the machines and vspheremachinetemplates:
kubectl -n default get machines
NAME                                        PROVIDERID                                       PHASE     VERSION
workload-slot59rp48-control-plane-kj5d6     vsphere://4237f106-5d51-0c92-1672-377819f0dec1   Running   v1.21.2+vmware.1
workload-slot59rp48-md-0-6968c6cc98-55ckf   vsphere://4237843d-9e7b-4689-cd54-de6a0d08cc98   Running   v1.21.2+vmware.1
kubectl -n default get vspheremachinetemplates
NAME                                AGE
workload-slot59rp48-control-plane   44h
workload-slot59rp48-worker          44h
- Then you can make a copy of the "vspheremachinetemplates" to a YAML file as follows, for both the control plane and the worker node:
Note - Making a copy of the templates is required because existing vspheremachinetemplates are immutable, and the kubectl edit command will produce an error if executed on an existing template.
kubectl -n default get vspheremachinetemplates workload-slot59rp48-control-plane -o yaml > control-plane-test-node.yaml
kubectl -n default get vspheremachinetemplates workload-slot59rp48-worker -o yaml > worker-test-node.yaml
- Edit the YAML files generated in the previous step to change the name of the templates and introduce an incorrect networkName to intentionally generate an error. For example (some details are omitted for brevity):
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: control-plane-test-node    <<<< HERE
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1alpha3
    kind: Cluster
    name: workload-slot59rp48
spec:
  template:
    spec:
      network:
        devices:
        - dhcp4: true
          networkName: Lab-env48a    <<<< HERE
- Follow the same instructions to modify the worker template; a sketch of the expected result is shown below.
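For reference, here is a minimal sketch of the modified worker template, assuming the template name worker-test-node used in the rest of this article and the same intentionally incorrect networkName (some details are omitted for brevity):
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: worker-test-node            # <<<< HERE
  namespace: default
spec:
  template:
    spec:
      network:
        devices:
        - dhcp4: true
          networkName: Lab-env48a   # <<<< HERE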
- Now, apply both templates with kubectl:
kubectl -n default apply -f control-plane-test-node.yaml
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/control-plane-test-node created
kubectl -n default apply -f worker-test-node.yaml
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/worker-test-node created
- To check the newly created vspheremachinetemplates:
kubectl get -n default vspheremachinetemplates
NAME                                AGE
control-plane-test-node             2m11s   << NEW
worker-test-node                    90s     << NEW
workload-slot59rp48-control-plane   45h
workload-slot59rp48-worker          45h
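Note - Optionally, you can confirm that the new templates carry the modified networkName with a jsonpath query; this example assumes the field layout shown in the template above:
kubectl -n default get vspheremachinetemplate control-plane-test-node \
  -o jsonpath='{.spec.template.spec.network.devices[0].networkName}{"\n"}'
kubectl -n default get vspheremachinetemplate worker-test-node \
  -o jsonpath='{.spec.template.spec.network.devices[0].networkName}{"\n"}'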
- With those templates now available, we need to edit the "kubeadmcontrolplanes" object, which will roll out a new control plane VM, and the "machinedeployments" object, which will roll out a new worker node VM:
# Control Plane
kubectl -n default get kubeadmcontrolplane
NAME                                INITIALIZED   API SERVER AVAILABLE   VERSION            REPLICAS   READY   UPDATED   UNAVAILABLE
workload-slot59rp48-control-plane   true          true                   v1.21.2+vmware.1   1          1       1
kubectl -n default edit kubeadmcontrolplanes workload-slot59rp48-control-plane
<omitted output>
spec:
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereMachineTemplate
    name: control-plane-test-node    <<<<< vspheremachinetemplates created for the Control Plane
    namespace: default
kubectl -n default edit kubeadmcontrolplanes workload-slot59rp48-control-plane
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/workload-slot59rp48-control-plane edited
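Note - As an alternative to the interactive edit, the same change could be made non-interactively with kubectl patch; this is a sketch assuming the v1alpha3 API shown above, where spec.infrastructureTemplate is an object reference:
kubectl -n default patch kubeadmcontrolplane workload-slot59rp48-control-plane \
  --type merge -p '{"spec":{"infrastructureTemplate":{"name":"control-plane-test-node"}}}'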
# Worker (machinedeployment):
kubectl -n default get machinedeployments
NAME                       PHASE     REPLICAS   READY   UPDATED   UNAVAILABLE
workload-slot59rp48-md-0   Running   1          1       1
kubectl -n default edit machinedeployments workload-slot59rp48-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: VSphereMachineTemplate
        name: worker-test-node    <<<<< vspheremachinetemplates created for the Worker
      version: v1.20.5+vmware.1
kubectl -n default edit machinedeployments workload-slot59rp48-md-0
machinedeployment.cluster.x-k8s.io/workload-slot59rp48-md-0 edited
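Note - Likewise, a non-interactive sketch for the worker, assuming the infrastructureRef sits under spec.template.spec as shown above:
kubectl -n default patch machinedeployment workload-slot59rp48-md-0 \
  --type merge -p '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"worker-test-node"}}}}}'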
- After the above steps, the new control plane and worker machines are stuck in the "Provisioning" state because of the incorrect network name:
kubectl -n default get machines
NAME                                        PROVIDERID                                       PHASE          VERSION
workload-slot59rp48-control-plane-g5q52                                                      Provisioning   v1.21.2+vmware.1
workload-slot59rp48-control-plane-kj5d6     vsphere://4237f106-5d51-0c92-1672-377819f0dec1   Running        v1.21.2+vmware.1
workload-slot59rp48-md-0-5d66d4d96f-bnw4z                                                    Provisioning   v1.21.2+vmware.1
workload-slot59rp48-md-0-6968c6cc98-55ckf   vsphere://4237843d-9e7b-4689-cd54-de6a0d08cc98   Running        v1.21.2+vmware.1
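Note - To follow the rollout as it happens, you can watch the machine phases with the kubectl watch flag:
kubectl -n default get machines -w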
- To identify the errors reported, refer to the Symptoms section below and validate that you see similar errors indicating that the network cannot be found.
Symptoms:
kubectl -n default describe machine workload-slot59rp48-control-plane-g5q52
Message: error getting network specs for "infrastructure.cluster.x-k8s.io/v1alpha3, Kind=VSphereVM default/workload-slot59rp48-control-plane-g5q52": unable to find network "Lab-env48a": network 'Lab-env48a' not found
kubectl -n default describe machine workload-slot59rp48-md-0-5d66d4d96f-bnw4z
Message: error getting network specs for "infrastructure.cluster.x-k8s.io/v1alpha3, Kind=VSphereVM default/workload-slot59rp48-md-0-5d66d4d96f-bnw4z": unable to find network "Lab-env48a": network 'Lab-env48a' not found
- As shown above, the control plane and worker machines fail to provision because the networkName set in the templates does not match any network present in vSphere.
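Note - For additional detail, the error messages above reference the underlying VSphereVM objects created by the vSphere provider; they can be inspected directly (the object names here are taken from the messages above):
kubectl -n default get vspherevms
kubectl -n default describe vspherevm workload-slot59rp48-control-plane-g5q52
kubectl -n default describe vspherevm workload-slot59rp48-md-0-5d66d4d96f-bnw4z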