Kubernetes cluster creation fails in Cloud Director Container Service Extension when DNS over UDP is blocked


Article ID: 325524


Products

VMware Cloud Director

Issue/Introduction

Symptoms:
  • Kubernetes cluster creation fails in Cloud Director Container Service Extension 4.0 and 4.1.
  • In the Kubernetes Container Clusters plugin in the Cloud Director portal, the cluster has an Error status and shows Events similar to the following:
[error while bootstrapping the machine [<CLUSTER_NAME>/EPHEMERAL-TEMP-VM]; timeout for post customization phase [guestinfo.cloudinit.target.cluster.ready.status]] during cluster creation
  • The logs from the Cluster API Provider for Cloud Director (CAPVCD) pods show an error of the form:
cloud.go:82] Error initializing client from secrets: [unable to get swagger client from secrets: [unable to get bearer token from secrets: [failed to set authorization header: [Post "https://<VCD_URL>/oauth/tenant/<ORG_NAME>/token": dial tcp: lookup <VCD_URL> on <DNS_IP>:53: read udp <IP>:<PORT>-><DNS_IP>:53: i/o timeout]]]]
  • DNS over UDP on port 53 is blocked from the Kubernetes cluster VMs to the configured DNS Server for the routed Organization VDC Network used during cluster deployment.
  • Attempting to resolve the Cloud Director public URL or a container registry from an Ephemeral Temp VM, Control Plane Node, or Worker Node deployed during cluster creation fails, for example:
dig @<DNS_IP> <VCD_URL>

; <<>> DiG 9.16.1-Ubuntu <<>> @<DNS_IP> <VCD_URL>
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached
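
As a rough illustration, the host name and DNS server from a CAPVCD lookup timeout of the form shown above can be extracted so that the same pair can then be tested with dig. The function name parse_lookup_timeout is hypothetical, and the pattern assumes an IPv4 DNS server on port 53.

```shell
#!/usr/bin/env bash
# parse_lookup_timeout (hypothetical helper, not part of CAPVCD or CSE)
# Reads log lines on stdin and prints "<host> <dns_ip>" for each Go resolver
# timeout of the form:
#   lookup <host> on <dns_ip>:53: read udp ...: i/o timeout
# Assumption: the DNS server is an IPv4 address listening on port 53.
parse_lookup_timeout() {
  sed -nE 's#.*lookup ([^ ]+) on ([0-9.]+):53: read udp .*i/o timeout.*#\1 \2#p'
}

# Usage (log file name is a placeholder):
#   parse_lookup_timeout < capvcd.log
```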


Environment

VMware Cloud Director 10.x

Cause

This issue occurs when the Kubernetes cluster VMs and the deployed pods cannot resolve addresses using the configured DNS server. When name resolution fails, errors occur and cluster creation fails.
The Kubernetes cluster VMs and the deployed pods inherit their DNS configuration from the routed Organization VDC network chosen during cluster creation.

Resolution

Ensure that the routed Organization VDC network chosen during cluster creation has a valid DNS server configured.
For more information, see the Cloud Director documentation topic "Edit the DNS Settings of an Organization Virtual Data Center Network in the VMware Cloud Director Tenant Portal".

Once a valid DNS server is configured, ensure that the appropriate NAT and Firewall rules are in place so that VMs deployed on this routed Org VDC Network can resolve names using the DNS server over both TCP and UDP.
For more information, see the Cloud Director documentation topic "Managing NSX Edge Gateways in VMware Cloud Director Tenant Portal".
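
Because both transports need to work, it can help to query the same name once over UDP and once over TCP. The following is a minimal sketch that assumes dig is installed on the node; the function name check_dns_transports is hypothetical. It relies on dig exiting with a non-zero status when no reply is received, so "ok" only means the server answered over that transport, not that the name exists.

```shell
#!/usr/bin/env bash
# check_dns_transports <dns_ip> <name>   (hypothetical helper)
# Sends one query over UDP (+notcp) and one over TCP (+tcp) and prints
# whether the DNS server replied on each transport.
check_dns_transports() {
  local dns_ip="$1" name="$2" transport flag
  for transport in udp tcp; do
    if [ "$transport" = "tcp" ]; then flag="+tcp"; else flag="+notcp"; fi
    # +time=3 and +tries=1 keep a blocked transport from stalling for long.
    if dig "@${dns_ip}" "$name" "$flag" +time=3 +tries=1 >/dev/null 2>&1; then
      echo "${transport}: ok"
    else
      echo "${transport}: failed"
    fi
  done
}

# Usage:
#   check_dns_transports <DNS_IP> <VCD_URL>
```

A result of "udp: failed" together with "tcp: ok" would be consistent with the symptom described in this article, where only DNS over UDP is blocked.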

To verify the DNS server currently configured on an Ephemeral Temp VM, Control Plane Node, or Worker Node deployed during cluster creation, commands such as the following can be used:

resolvectl status ens192

For example, if the routed Organization VDC network is configured with the DNS server 192.168.1.2 and the domain example.com, the expected output is:

resolvectl status ens192

Link 2 (ens192)
      Current Scopes: DNS
DefaultRoute setting: yes
       LLMNR setting: yes
MulticastDNS setting: no
  DNSOverTLS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
  Current DNS Server: 192.168.1.2
         DNS Servers: 192.168.1.2
          DNS Domain: example.com
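
If only the DNS server address is needed, for instance to pass it to dig, it can be filtered out of the resolvectl output. This is a sketch that assumes the output format shown above; the function name current_dns_server is hypothetical.

```shell
#!/usr/bin/env bash
# current_dns_server (hypothetical helper)
# Reads `resolvectl status <link>` output on stdin and prints the value of
# the first "Current DNS Server:" line.
current_dns_server() {
  awk '/Current DNS Server:/ {
    sub(/^.*Current DNS Server:[ \t]*/, "")  # strip the label, keep the address
    print
    exit
  }'
}

# Usage:
#   resolvectl status ens192 | current_dns_server
```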

To test DNS on an Ephemeral Temp VM, Control Plane Node, or Worker Node deployed during cluster creation, commands such as the following can be used:

dig @<DNS_IP> <VCD_URL>
dig @<DNS_IP> <CONTAINER_REGISTRY_URL>

For example:

dig @192.168.1.2 vcloud.example.com

; <<>> DiG 9.16.1-Ubuntu <<>> @192.168.1.2 vcloud.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29800
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;vcloud.example.com.  IN      A

;; ANSWER SECTION:
vcloud.example.com. 3600 IN   A       192.168.1.31

;; Query time: 0 msec
;; SERVER: 192.168.1.2#53(192.168.1.2)
;; WHEN: Mon Nov 20 17:00:03 UTC 2023
;; MSG SIZE  rcvd: 73
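
When the check is scripted, the two parts of this output that indicate success are the NOERROR status and a non-zero ANSWER count. The sketch below tests for both in dig output read from stdin; the function name dig_resolved is hypothetical, and it assumes the default dig output format shown above.

```shell
#!/usr/bin/env bash
# dig_resolved (hypothetical helper)
# Reads dig output on stdin. Exits 0 only if the header reports
# status NOERROR and the ANSWER count is at least 1, otherwise exits 1.
dig_resolved() {
  awk '
    /status: NOERROR/ { noerror = 1 }
    /ANSWER: [1-9]/   { answered = 1 }
    END { exit !(noerror && answered) }
  '
}

# Usage:
#   dig @<DNS_IP> <VCD_URL> | dig_resolved && echo "resolved" || echo "not resolved"
```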


Additional Information

For more information on the network requirements, see the VMware Cloud Director Container Service Extension documentation topic "Organization Virtual Data Center Prerequisites for Kubernetes Cluster Deployment".

To generate logs from the Kubernetes External Cloud Provider for VMware Cloud Director (CCM) pods, use the generate-k8s-log-bundle.sh script as described in the VMware Cloud Director Container Service Extension documentation topic "Troubleshooting".