Tanzu Kubernetes Grid management cluster creation fails with reason:'NatGatewaysReconciliationFailed', message:'3 of 8 completed'

Article ID: 319316

Updated On: 06-05-2024

Products

Tanzu Kubernetes Grid

Issue/Introduction

Symptoms:
  • Creating a TKG management cluster on AWS fails while the control plane is being initialized.
  • The cluster creation attempt returns an error similar to the following:

[cluster control plane is still being initialized, cluster infrastructure is still being provisioned], retrying
cluster creation failed, reason:'NatGatewaysReconciliationFailed', message:'3 of 8 completed'

Error: unable to set up management cluster: unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster creation failed, reason:'NatGatewaysReconciliationFailed', message:'3 of 8 completed'


Environment

VMware Tanzu Kubernetes Grid 1.x

Cause

To find the root cause, check the logs of the manager container of the capa-controller-manager deployment in the capa-system namespace:

kubectl logs deployment.apps/capa-controller-manager -n capa-system manager

If the logs contain an AddressLimitExceeded error like the one below, cluster creation failed because the AWS account has reached its Elastic IP address limit, so no Elastic IPs could be allocated for the NAT gateways.

I1219 23:18:56.366708       1 awsmachine_controller.go:457] controllers/AWSMachine "msg"="Cluster infrastructure is not ready yet" "awsMachine"="tkg-oom-md-0-5dwlx" "cluster"="tkg-oom" "machine"="tkg-oom-md-0-b455f7d97-rz644" "namespace"="tkg-system"
E1219 23:18:56.379178       1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="failed to reconcile network for AWSCluster tkg-system/tkg-oom: failed to create one or more IP addresses for NAT gateways: failed to allocate Elastic IP: AddressLimitExceeded: The maximum number of addresses has been reached.\n\tstatus code: 400, request id: 12b9b3af-0d9e-455c-93ab-9e2bde27ef1d" "controller"="awscluster" "name"="tkg-oom" "namespace"="tkg-system"
I1219 23:18:56.383463       1 awscluster_controller.go:160] controllers/AWSCluster "msg"="Reconciling AWSCluster" "awsCluster"="tkg-oom" "cluster"="tkg-oom" "namespace"="tkg-system"
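
Because the controller logs are verbose, it can help to filter them down to the quota error. The following is a minimal sketch, not part of the original article: find_eip_limit_errors is a hypothetical helper name, and the usage comment assumes kubectl is pointed at the cluster where capa-controller-manager is running.

```shell
# Hypothetical helper: reads log text on stdin and prints only the lines
# that report the Elastic IP limit error shown in the excerpt above.
find_eip_limit_errors() {
  grep 'AddressLimitExceeded'
}

# Usage (assumes kubectl targets the cluster running capa-controller-manager):
#   kubectl logs deployment.apps/capa-controller-manager -n capa-system -c manager \
#     | find_eip_limit_errors
```

If the filter prints nothing, the failure probably has a different cause and the full log should be reviewed.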

Resolution

To resolve this problem, make sure that enough Elastic IPs are available in the target AWS Region and that the account has not reached its Elastic IP limit. If there are Elastic IP addresses that you no longer need, you can release them by following the instructions for releasing an Elastic IP address in the AWS documentation.
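
As an illustration of the resolution above, the AWS CLI can list Elastic IPs that are not associated with any resource and release one of them. This is a sketch under stated assumptions rather than a procedure from the original article: the function names are hypothetical, the AWS CLI is assumed to be installed and configured with credentials for the affected account, and the Region and allocation ID in the usage comment are placeholders.

```shell
# Hypothetical helper: list Elastic IPs in a Region that have no association.
# Prints one line per address: allocation ID and public IP.
list_unassociated_eips() {
  region="$1"
  aws ec2 describe-addresses \
    --region "$region" \
    --query 'Addresses[?AssociationId==`null`].[AllocationId,PublicIp]' \
    --output text
}

# Hypothetical helper: release a single Elastic IP by its allocation ID.
# Releasing is irreversible for that address, so confirm it is unused first.
release_eip() {
  region="$1"
  allocation_id="$2"
  aws ec2 release-address --region "$region" --allocation-id "$allocation_id"
}

# Usage (placeholder Region and allocation ID):
#   list_unassociated_eips us-east-1
#   release_eip us-east-1 eipalloc-0123456789abcdef0
```

An address that shows up as unassociated may still be reserved for another purpose, so review the list before releasing anything, then retry the management cluster creation.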