"failed to create load balancer for the cluster" and "gateway reference should not be nil" errors when deploying TKG clusters through Cloud Director Container Service Extension

Article ID: 320416

Products

VMware Cloud Director

Issue/Introduction

Symptoms:
  • Creating a Tanzu Kubernetes Grid (TKG) cluster in Cloud Director through Container Service Extension fails.
  • The vApp and the initial Ephemeral VM for the TKG cluster are created, but the first Control Plane node or Worker node VM is never created.
  • The Load Balancer for the TKG cluster is not created.
  • The network chosen during TKG cluster creation is not a Routed Organization VDC Network; it is another type, such as Direct or Isolated.
  • The TKG cluster's Events tab in Kubernetes Container Clusters plugin shows LoadBalancerError events with an error of the form:
failed to create load balancer for the cluster[<TKG_CLUSTER_NAME>(<TKG_CLUSTER_ID>)]:[gateway reference should not be nil]
  • The Cluster API Provider for VMware Cloud Director (CAPVCD) logs on the Ephemeral VM show an error creating the Load Balancer for the TKG cluster of the form:
ERROR   Reconciler error        {"controller": "vcdcluster", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "VCDCluster", "VCDCluster": {"name":"<TKG_CLUSTER_NAME>","namespace":"<TKG_CLUSTER_NAME>-ns"}, "namespace": "<TKG_CLUSTER_NAME>-ns", "name": "<TKG_CLUSTER_NAME>", "reconcileID": "<RECONCILE_ID>", "error": "failed to create load balancer for the cluster [<TKG_CLUSTER_NAME>(<TKG_CLUSTER_ID>)]: [gateway reference should not be nil]: gateway reference should not be nil", "errorVerbose": "gateway reference should not be nil\nfailed to create load balancer for the cluster [<TKG_CLUSTER_NAME>(<TKG_CLUSTER_ID>)]: [gateway reference should not be nil]
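
To confirm that a collected log matches this issue, the log can be scanned for the error text shown above. The following is a minimal Python sketch, not a supported tool. The function name find_gateway_errors is hypothetical, and the sketch assumes the log has already been copied off the Ephemeral VM as plain text.

```python
import re

# Text that identifies this issue in CAPVCD logs and cluster events.
SIGNATURE = "gateway reference should not be nil"

# Matches the "[<name>(<id>)]" part of the load balancer error message.
# Whitespace before "[" is optional; one or more ")" may close the ID.
CLUSTER_RE = re.compile(r"cluster\s*\[([^(\]]+)\(([^)]*)\)+\]")


def find_gateway_errors(log_text):
    """Return (cluster_name, cluster_id) pairs for lines with the signature."""
    hits = []
    for line in log_text.splitlines():
        if SIGNATURE not in line:
            continue
        match = CLUSTER_RE.search(line)
        # Lines with the signature but no parsable cluster yield (None, None).
        hits.append((match.group(1), match.group(2)) if match else (None, None))
    return hits
```

An empty result means the signature was not found in the supplied text, so another cause should be investigated.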


Environment

VMware Cloud Director 10.x

Cause

This issue occurs if the Organization VDC Network chosen for the TKG cluster deployment is not a Routed network connected to an NSX-backed Edge Gateway. Direct and Isolated networks are not connected to an Edge Gateway, so there is no gateway on which to create the Load Balancer.

The Edge Gateway must also have NSX Load Balancer enabled and configured so that the TKG cluster's Virtual Service can be created.

Resolution

Deploy the TKG cluster to a Routed Organization VDC Network that meets the prerequisites listed in the Container Service Extension documentation topic Organization Virtual Data Center Prerequisites for Kubernetes Cluster Deployment.
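
A check of the chosen network before deployment could be sketched in Python as follows. The sketch assumes a network record shaped like the items returned by the Cloud Director OpenAPI orgVdcNetworks endpoint, with a networkType field (NAT_ROUTED for Routed networks) and a connection.routerRef reference to the Edge Gateway. These field names are assumptions and should be verified against the API version in use. The function name check_network_for_tkg is hypothetical, and the sketch does not verify the NSX backing or the Load Balancer configuration of the Edge Gateway.

```python
def check_network_for_tkg(network):
    """Return a list of problems found in an Org VDC network record (dict)."""
    problems = []

    # Assumed field: "networkType" is "NAT_ROUTED" for Routed networks.
    net_type = network.get("networkType")
    if net_type != "NAT_ROUTED":
        problems.append(
            f"network type is {net_type!r}, expected 'NAT_ROUTED' (Routed)"
        )

    # Assumed field: "connection.routerRef" references the Edge Gateway.
    connection = network.get("connection") or {}
    if not connection.get("routerRef"):
        problems.append(
            "network has no Edge Gateway reference (connection.routerRef)"
        )

    return problems
```

An empty list means the record passed both checks. Any returned entry points to a network that is likely to produce the "gateway reference should not be nil" error.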


Additional Information

Add a Routed Organization Virtual Data Center Network in the VMware Cloud Director Tenant Portal
Working with NSX Advanced Load Balancing in the VMware Cloud Director Tenant Portal