"Cluster control plane is still being initialized" error when creating a Tanzu Kubernetes Grid workload cluster on Azure

Article ID: 337409

Products: Tanzu Kubernetes Grid

Issue/Introduction

Symptoms:
The Tanzu Kubernetes Grid command line interface (CLI) fails to create a workload cluster with the following error message:

I0521 05:01:01.575145 poller.go:101] timed out waiting for cluster creation to complete: cluster control plane is still being initialized

The capz-controller-manager pod logs report the following errors:

E0521 07:09:27.054398 1 controller.go:248] controller-runtime/controller "msg"="Reconciler error" "error"="failed to reconcile cluster services: failed to reconcile load balancer: failed to create load balancer abc-ser-cls: network.LoadBalancersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code=\"InvalidResourceReference\" Message=\"Resource /subscriptions/xxxxxxxx-3c-40f2-aee4-xxx/resourceGroups/my_resource_group/providers/Microsoft.Network/loadBalancers/abc-ser-cls/frontendIPConfigurations/xxxx referenced by resource /subscriptions/xxxxxxxxx-3cac-40f2-aee4-58615cc72a67/resourceGroups/my_resource_group/providers/Microsoft.Network/loadBalancers/abc-ser-cls/loadBalancingRules/xxxxxx-TCP-80 was not found.
Please make sure that the referenced resource exists, and that both resources are in the same region.\" Details=[]" "controller"="azurecluster" "name"="workload_cluster" "namespace"="default"
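To tell this load balancer failure apart from other reconcile errors, it can help to pull the HTTP status and the Azure error code out of the capz-controller-manager log. The following is a minimal Python sketch. It assumes the log format shown above, where the Azure error fields are quoted with escaped quotes (\"...\"). The `extract_azure_error` helper and the `AzureError` type are hypothetical. The input could be the output of `kubectl logs -n capz-system deployment/capz-controller-manager -c manager` run against the management cluster. That command assumes a default installation where CAPZ runs in the `capz-system` namespace.

```python
import re
from typing import NamedTuple, Optional


class AzureError(NamedTuple):
    """Fields extracted from a capz "Reconciler error" log entry."""
    status_code: Optional[int]
    code: Optional[str]
    message: Optional[str]


# The Azure error fields are embedded with escaped quotes, e.g. Code=\"InvalidResourceReference\".
_STATUS_RE = re.compile(r"StatusCode=(\d+)")
_CODE_RE = re.compile(r'Code=\\?"([^"\\]+)\\?"')
# DOTALL so a message that wraps onto a second line (as in the log above) is still captured.
_MESSAGE_RE = re.compile(r'Message=\\?"(.*?)\\?"\s+Details=', re.DOTALL)


def extract_azure_error(entry: str) -> AzureError:
    """Return the HTTP status, Azure error code and message in a log entry; missing fields are None."""
    status = _STATUS_RE.search(entry)
    code = _CODE_RE.search(entry)
    message = _MESSAGE_RE.search(entry)
    return AzureError(
        status_code=int(status.group(1)) if status else None,
        code=code.group(1) if code else None,
        message=message.group(1) if message else None,
    )
```

A result with status 400 and code `InvalidResourceReference` on a load balancer rule matches the log in this article. A different code may point to a different problem.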


Environment

VMware Tanzu Kubernetes Grid 1.x

Cause

This error occurs when the management cluster cannot communicate with the control plane node of the new workload cluster. This can have several causes. The most likely ones are:
  • Azure accelerated networking is enabled by default on machine types with 4 or more CPU cores. Due to a known issue, TKG 1.3.0 is incompatible with Azure accelerated networking.
  • The management cluster and the workload cluster are in different locations with no network connectivity between them.
  • Incorrect DNS settings can prevent the nodes in the cluster from communicating with each other.
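The accelerated networking cause depends on two conditions stated above: the cluster runs TKG 1.3.0, and it uses a machine type with 4 or more CPU cores. The following is a minimal Python sketch that checks both. It assumes the common Azure naming pattern in which the vCPU count follows the family letters; for example, `Standard_D4s_v3` has 4 vCPUs. The helpers `parse_vcpus` and `needs_accel_networking_workaround` are hypothetical. The name parsing is only a heuristic: constrained-vCPU sizes and other unusual names return `None` and should be checked manually.

```python
import re
from typing import Optional

# Heuristic pattern: "Standard_" + family letters + vCPU count + optional
# lowercase feature letters + optional "_vN" suffix, e.g. "Standard_D4s_v3".
_SIZE_RE = re.compile(r"^Standard_[A-Z]+(\d+)[a-z]*(?:_v\d+)?$")


def parse_vcpus(vm_size: str) -> Optional[int]:
    """Return the vCPU count embedded in an Azure VM size name, or None if the name does not fit the pattern."""
    match = _SIZE_RE.match(vm_size)
    return int(match.group(1)) if match else None


def needs_accel_networking_workaround(vm_size: str, tkg_version: str) -> Optional[bool]:
    """Return True if the azure-overlay.yaml workaround applies, False if not, None if unknown."""
    if tkg_version.lstrip("v") != "1.3.0":
        # This article describes the issue only for TKG 1.3.0; upgrading to 1.3.1 is the documented resolution.
        return False
    vcpus = parse_vcpus(vm_size)
    if vcpus is None:
        return None  # unrecognized size name: check the vCPU count manually
    return vcpus >= 4
```

For example, `needs_accel_networking_workaround("Standard_D4s_v3", "v1.3.0")` returns `True`, while the same size on v1.3.1 returns `False`.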

Resolution

If this error is caused by the incompatibility between TKG and Azure accelerated networking, upgrade to TKG 1.3.1.

For the other causes, see the Workaround section.


Workaround:
If the cluster uses machine types with 4 or more CPU cores and you are running TKG 1.3.0, you must disable Azure accelerated networking by editing azure-overlay.yaml so that it contains the following:
$ vi ~/.tanzu/tkg/providers/infrastructure-azure/ytt/azure-overlay.yaml
#! Please add any overlays specific to Azure provider under this file.
#@ load("@ytt:overlay", "overlay")

#@overlay/match expects="1+", by=overlay.subset({"kind": "AzureMachineTemplate"})
---
spec:
  template:
    spec:
      #@overlay/match missing_ok=True
      acceleratedNetworking: false
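To make the overlay's effect concrete, here is a plain-Python illustration of what it does to the generated cluster manifests. It assumes the manifests have already been parsed into a list of dictionaries. The sketch only mimics the ytt semantics of this one overlay and is not how ytt works internally. `disable_accelerated_networking` is a hypothetical name.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def disable_accelerated_networking(docs: List[Manifest]) -> List[Manifest]:
    """Mimic the azure-overlay.yaml change on already-parsed manifests (modifies docs in place)."""
    matched = 0
    for doc in docs:
        # overlay.subset({"kind": "AzureMachineTemplate"}): only AzureMachineTemplate documents are touched.
        if doc.get("kind") != "AzureMachineTemplate":
            continue
        matched += 1
        # spec.template.spec must already exist (ytt reports an error otherwise); a KeyError plays that role here.
        machine_spec = doc["spec"]["template"]["spec"]
        # missing_ok=True: add acceleratedNetworking if absent, overwrite it if present.
        machine_spec["acceleratedNetworking"] = False
    # expects="1+": the overlay fails when no document matches.
    if matched == 0:
        raise ValueError("expected at least one AzureMachineTemplate document")
    return docs
```

In practice, ytt applies the overlay when the cluster manifests are generated. The sketch only shows which field ends up changed in each AzureMachineTemplate.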


For more information on this workaround, refer to the Tanzu Kubernetes Grid 1.3 Release Notes.

Make sure the workload cluster and the management cluster have network connectivity between them.
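One way to check connectivity is to open a TCP connection to the workload cluster's control plane endpoint from a machine on the management cluster's network. The following is a minimal Python sketch. It assumes the API server listens on the default Kubernetes port 6443. `can_reach` is a hypothetical helper, and the endpoint address you pass in is specific to your environment.

```python
import socket


def can_reach(host: str, port: int = 6443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError subclass) and name resolution errors.
        return False
```

A `False` result means the endpoint is not reachable from where the script runs, for example because of the location or network setup described in the Cause section.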

Check the DNS settings of the management cluster and make sure that internal hostnames resolve correctly.
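To confirm that internal hostnames resolve, you can query the resolver from the machine whose DNS settings you are checking. The following is a minimal Python sketch that uses the system resolver through `socket.getaddrinfo`. The `resolve` helper is hypothetical, and the hostnames to test depend on your environment.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the unique IP addresses hostname resolves to, or an empty list if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # getaddrinfo returns one entry per (family, socktype, proto); keep unique addresses in order.
    return list(dict.fromkeys(info[4][0] for info in infos))
```

An empty list means the name did not resolve from that machine.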