VCF Automation: Unable to provision kubernetes clusters/VMs on second environment which is provisioned with separate T0 gateway in VCFA


Article ID: 420083

Products

VCF Automation

Issue/Introduction

  • There are two workload domains deployed through VCFA which share an NSX Manager.
  • Each domain has its own NSX Edge cluster and a separate network stack, from the T0 gateway all the way down.
  • The first environment has a project, VPC, and namespace created; deploying Kubernetes test clusters works, and the workflow completes and shows as Ready.
  • The second environment has its own project, VPC, Supervisor, and namespace, but deploying Kubernetes test clusters does not complete. It gives the following networking errors in VCFA:
    • unable to retrieve kube-proxy daemonset from the guest cluster: failed to get server groups: Get "https://<Supervisor_IP>:6443/api?timeout=10s" dial tcp <Supervisor_IP>:6443 connect: network is unreachable
    • failed to create ClusterRoleBinding: failed to get server groups: Get "https://<Supervisor_IP>:6443/api?timeout=10s" dial tcp <Supervisor_IP>:6443 connect: network is unreachable
    • unable to reconcile kubeadm ConfigMap's CoreDNS info: unable to retrieve kubeadm Configmap from the guest cluster: failed to get server groups: Get "https://<Supervisor_IP>:6443/api?timeout=10s" dial tcp <Supervisor_IP>:6443 connect: network is unreachable
    • These errors appear under Warning - ProvisioningFailed or RoleBindingSyncFailed.

 

The issue can be confirmed by running a Packet Flow in NSX > Plan & Troubleshoot between the Kubernetes clusters/VMs and their Supervisor.
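The unreachable state can also be checked directly from a guest cluster node (or any VM on the affected network segment) by probing the same Supervisor API URL that appears in the VCFA errors. A minimal sketch; `SUPERVISOR_IP` is a placeholder you must export with your real Supervisor address (192.0.2.10 below is an unroutable documentation address):

```shell
#!/bin/sh
# Probe the Supervisor kube-apiserver endpoint that the guest cluster
# tries to reach (the same URL shown in the VCFA error messages).
# SUPERVISOR_IP is a placeholder; export your real Supervisor address.
SUPERVISOR_IP="${SUPERVISOR_IP:-192.0.2.10}"

if curl --max-time 5 -ks "https://${SUPERVISOR_IP}:6443/api?timeout=10s" >/dev/null; then
  STATUS="reachable"
else
  STATUS="unreachable"
fi
echo "Supervisor ${SUPERVISOR_IP}:6443 is ${STATUS}"
```

An "unreachable" result from the affected environment, while the same probe succeeds from the first environment, matches the "network is unreachable" symptom described above.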

Environment

  • VMware Cloud Foundation Automation 9.x
  • vSphere 9.x
  • VMware NSX

Cause

The second Supervisor has inherited the T0 gateway from the Default Project in NSX. That gateway does not belong to this environment, so there is no network access from the Kubernetes clusters/VMs in this environment to their Supervisor.

This happens when the Supervisor is created before the NSX project that is meant to house the network for this environment.

Resolution

 

The issue can be solved in the following ways:

  • Method A: Shared T0 Gateway
    If one T0 gateway can be shared between both workload domains, this is the simplest option. Recreate the idras tenant using the WLD01 Provider Gateway. The T0 gateway can be renamed to reflect that it is not specific to WLD01.
  • Method B: Create the NSX Project before the Supervisor
    If the two workload domains must remain under separate T0 gateways, ensure the correct T0 gateway is used by the WLD02 Supervisor as follows:
    1. Create a new NSX project using the WLD02 T0 gateway.
    2. Delete the WLD02 Supervisor and re-create it using the newly pre-created NSX project.
    3. Create the idras tenant and assign it to the WLD02 Supervisor.
  • Method C: Use a separate NSX Manager for each WLD
    This creates a new Region in VCFA, as each Region corresponds to an NSX Manager.
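For step 1 of Method B, the new NSX project must reference the WLD02 T0 gateway rather than the Default Project's. A hypothetical payload sketch for the NSX Policy API projects endpoint (PATCH /policy/api/v1/orgs/default/projects/<project-id> on NSX 4.x); the project name, T0 path, and Edge cluster path below are placeholders, and the exact schema should be verified against your NSX version's API reference:

```json
{
  "display_name": "wld02-project",
  "tier_0s": [
    "/infra/tier-0s/WLD02-T0"
  ],
  "site_infos": [
    {
      "edge_cluster_paths": [
        "/infra/sites/default/enforcement-points/default/edge-clusters/WLD02-edge-cluster"
      ]
    }
  ]
}
```

Once the project exists with the WLD02 T0 attached, re-creating the WLD02 Supervisor against it (step 2) prevents it from inheriting the Default Project's T0 gateway.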