Note - These steps were tested in an environment running on vSphere infrastructure. However, similar steps can be applied to other infrastructures such as AWS or Azure.
The objective of this article is to explain how to change the infrastructure specs of a running Tanzu Kubernetes Grid cluster (either a management cluster or a workload cluster).
How to reproduce this issue?
- First, we need to get the machines and vspheremachinetemplates:
kubectl -n default get machines
NAME                                        PROVIDERID                                       PHASE     VERSION
workload-slot59rp48-control-plane-kj5d6     vsphere://4237f106-5d51-0c92-1672-377819f0dec1   Running   v1.21.2+vmware.1
workload-slot59rp48-md-0-6968c6cc98-55ckf   vsphere://4237843d-9e7b-4689-cd54-de6a0d08cc98   Running   v1.21.2+vmware.1
kubectl -n default get vspheremachinetemplates
NAME                                AGE
workload-slot59rp48-control-plane   44h
workload-slot59rp48-worker          44h
- Then you can make a copy of the "vspheremachinetemplates" to a YAML file as follows, for both the control plane and the worker node:
Note - Making a copy of the templates is required because existing vspheremachinetemplates are immutable, and the kubectl edit command will produce an error if executed on an existing template.
kubectl -n default get vspheremachinetemplates workload-slot59rp48-control-plane -o yaml > control-plane-test-node.yaml
kubectl -n default get vspheremachinetemplates workload-slot59rp48-worker -o yaml > worker-test-node.yaml
- Edit the YAML files generated in the previous step to change the name of the templates and introduce an incorrect networkName to intentionally generate an error. For example (some details are omitted for brevity):
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: control-plane-test-node    <<<< HERE
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1alpha3
    kind: Cluster
    name: workload-slot59rp48
spec:
  template:
    spec:
      network:
        devices:
        - dhcp4: true
          networkName: Lab-env48a    <<<< HERE
- Follow the same instructions to modify the worker template; a sketch of the expected result is shown below.
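For reference, here is a minimal sketch of the modified worker template, assuming the template name worker-test-node used in the rest of this article and the same intentionally incorrect networkName (some details are omitted for brevity):
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: worker-test-node            # <<<< HERE
  namespace: default
spec:
  template:
    spec:
      network:
        devices:
        - dhcp4: true
          networkName: Lab-env48a   # <<<< HERE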
- Now, apply both templates with kubectl:
kubectl -n default apply -f control-plane-test-node.yaml
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/control-plane-test-node created
kubectl -n default apply -f worker-test-node.yaml
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/worker-test-node created
- To check the newly created vspheremachinetemplates:
kubectl get -n default vspheremachinetemplates
NAME                                AGE
control-plane-test-node             2m11s   << NEW
worker-test-node                    90s     << NEW
workload-slot59rp48-control-plane   45h
workload-slot59rp48-worker          45h
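Note - Optionally, you can confirm that the new templates carry the modified networkName with a jsonpath query; this example assumes the field layout shown in the template above:
kubectl -n default get vspheremachinetemplate control-plane-test-node \
  -o jsonpath='{.spec.template.spec.network.devices[0].networkName}{"\n"}'
kubectl -n default get vspheremachinetemplate worker-test-node \
  -o jsonpath='{.spec.template.spec.network.devices[0].networkName}{"\n"}'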
- With those templates now available, we need to edit the "kubeadmcontrolplanes" object, which will roll out a new control plane VM, and the "machinedeployments" object, which will roll out a new worker node VM:
# Control Plane
kubectl -n default get kubeadmcontrolplane
NAME                                INITIALIZED   API SERVER AVAILABLE   VERSION            REPLICAS   READY   UPDATED   UNAVAILABLE
workload-slot59rp48-control-plane   true          true                   v1.21.2+vmware.1   1          1       1
kubectl -n default edit kubeadmcontrolplanes workload-slot59rp48-control-plane
<omitted output>
spec:
  infrastructureTemplate:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: VSphereMachineTemplate
    name: control-plane-test-node    <<<<< vspheremachinetemplates created for the Control Plane
    namespace: default
kubectl -n default edit kubeadmcontrolplanes workload-slot59rp48-control-plane
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/workload-slot59rp48-control-plane edited
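Note - As an alternative to the interactive edit, the same change could be made non-interactively with kubectl patch; this is a sketch assuming the v1alpha3 API shown above, where spec.infrastructureTemplate is an object reference:
kubectl -n default patch kubeadmcontrolplane workload-slot59rp48-control-plane \
  --type merge -p '{"spec":{"infrastructureTemplate":{"name":"control-plane-test-node"}}}'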
# Worker (machinedeployment):
kubectl -n default get machinedeployments
NAME                       PHASE     REPLICAS   READY   UPDATED   UNAVAILABLE
workload-slot59rp48-md-0   Running   1          1       1
kubectl -n default edit machinedeployments workload-slot59rp48-md-0
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
        kind: VSphereMachineTemplate
        name: worker-test-node    <<<<< vspheremachinetemplates created for the Worker
      version: v1.20.5+vmware.1
kubectl -n default edit machinedeployments workload-slot59rp48-md-0
machinedeployment.cluster.x-k8s.io/workload-slot59rp48-md-0 edited
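Note - Likewise, a non-interactive sketch for the worker, assuming the infrastructureRef sits under spec.template.spec as shown above:
kubectl -n default patch machinedeployment workload-slot59rp48-md-0 \
  --type merge -p '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"worker-test-node"}}}}}'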
- After the above steps, the new control plane and worker machines are stuck in the "Provisioning" state because of the incorrect network name:
kubectl -n default get machines
NAME                                        PROVIDERID                                       PHASE          VERSION
workload-slot59rp48-control-plane-g5q52                                                      Provisioning   v1.21.2+vmware.1
workload-slot59rp48-control-plane-kj5d6     vsphere://4237f106-5d51-0c92-1672-377819f0dec1   Running        v1.21.2+vmware.1
workload-slot59rp48-md-0-5d66d4d96f-bnw4z                                                    Provisioning   v1.21.2+vmware.1
workload-slot59rp48-md-0-6968c6cc98-55ckf   vsphere://4237843d-9e7b-4689-cd54-de6a0d08cc98   Running        v1.21.2+vmware.1
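Note - To follow the rollout as it happens, you can watch the machine phases with the kubectl watch flag:
kubectl -n default get machines -w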
- To identify the errors reported, refer to the Symptoms section below and validate that you see similar errors indicating that the network cannot be found.
Symptoms:
kubectl -n default describe machine workload-slot59rp48-control-plane-g5q52
Message: error getting network specs for "infrastructure.cluster.x-k8s.io/v1alpha3, Kind=VSphereVM default/workload-slot59rp48-control-plane-g5q52": unable to find network "Lab-env48a": network 'Lab-env48a' not found
kubectl -n default describe machine workload-slot59rp48-md-0-5d66d4d96f-bnw4z
Message: error getting network specs for "infrastructure.cluster.x-k8s.io/v1alpha3, Kind=VSphereVM default/workload-slot59rp48-md-0-5d66d4d96f-bnw4z": unable to find network "Lab-env48a": network 'Lab-env48a' not found
- As shown above, the control plane and worker machines fail to provision because the networkName set in the templates does not match any network present in vSphere.
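Note - For additional detail, the error messages above reference the underlying VSphereVM objects created by the vSphere provider; they can be inspected directly (the object names here are taken from the messages above):
kubectl -n default get vspherevms
kubectl -n default describe vspherevm workload-slot59rp48-control-plane-g5q52
kubectl -n default describe vspherevm workload-slot59rp48-md-0-5d66d4d96f-bnw4z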