WCP Fails to configure due to Top Level Domain pointing to .local domain

Article ID: 319363

Updated On: 09-05-2024

Products

VMware vSphere ESXi
VMware vSphere Kubernetes Service

Issue/Introduction

Symptoms:

  • When installing vSphere with Tanzu, the following error is displayed:
 
Configure operation for the Master node VM with identifier vm-xx failed.
 
  • The DNS domain in use has .local as its top-level domain (for example, vmware.local).
  • The SupervisorControlPlaneVMs are deployed with only one Ethernet device, on the Management Network (under normal circumstances, two Ethernet devices are attached).
  • From an SSH session to a SupervisorControlPlaneVM, nslookup fails when no DNS server is specified and succeeds when the DNS server is specified explicitly:
WITHOUT DNS:
 
root@42184a1e6d3c54eff2384b2736cf2079 [ ~ ]# nslookup vcenter-01a.vmware.local
Server: 127.0.0.53
Address: 127.0.0.53#53
 
** server can't find vcenter-01a.vmware.local: SERVFAIL
 
WITH DNS (DNS IP is 10.10.20.10, vCenter IP is 10.200.15.20):
 
root@42184a1e6d3c54eff2384b2736cf2079 [ ~ ]# nslookup vcenter-01a.vmware.local 10.10.20.10
Server: 10.10.20.10
Address: 10.10.20.10#53
 
Name: vcenter-01a.vmware.local
Address: 10.200.15.20
  • From an SSH session to the SupervisorControlPlaneVM, the api-server and kube-scheduler containers are in a CrashLoopBackOff (CLBO) state. The following errors appear in their logs:
API-SERVER LOG:
 
E1020 11:31:37.222979 1 oidc.go:224] oidc authenticator: initializing plugin: Get https://vcenter-01a.vmware.local/openidconnect/vsphere.local/.well-known/openid-configuration: dial tcp: lookup vcenter-01a.vmware.local on 127.0.0.53:53: server misbehaving
 
KUBE-SCHEDULER LOG:
 
2022-10-20T16:31:37.096453895Z stderr F 2022-10-20T16:31:37.096Z error schedext [opID=cfgMapUpdate-bd86] Failed to create a VC client due to: Failed to create new vmomi client. Err: Post https://vcenter-01a.vmware.local/sdk: dial tcp: lookup vcenter-01a.vmware.local: Temporary failure in name resolution, retry 8s later
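
As a rough triage aid, the two failure signatures in the log excerpts above can be searched for programmatically. The following is a minimal sketch, not a supported tool: the function name `failed_local_lookups` is hypothetical, and the patterns are derived only from the two log lines shown in this article, so other log formats may not match.

```python
import re
from typing import Iterable, Set

# Patterns derived from the api-server and kube-scheduler excerpts in this article.
_PATTERNS = [
    re.compile(r"lookup (?P<host>\S+) on \S+: server misbehaving"),
    re.compile(r"lookup (?P<host>\S+?): Temporary failure in name resolution"),
]


def failed_local_lookups(log_lines: Iterable[str]) -> Set[str]:
    """Return the .local hostnames whose lookups failed in the given log lines."""
    hosts = set()
    for line in log_lines:
        for pattern in _PATTERNS:
            for match in pattern.finditer(line):
                host = match.group("host").rstrip(".").lower()
                # Only names under .local are relevant to this article.
                if host.endswith(".local"):
                    hosts.add(host)
    return hosts
```

If the returned set contains the vCenter Server FQDN, the symptoms are consistent with the cause described in this article.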

 
If you see this issue when using TKG, see the following KB: https://kb.vmware.com/s/article/83623



Environment

VMware vSphere 7.0 with Tanzu

Cause

WCP (Workload Control Plane) configuration fails when the DNS domain uses .local as its top-level domain because, by default, the Supervisor Cluster VMs do not send lookups for .local names to the configured DNS servers.

This is expected behavior from the systemd-resolved service. See https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html for more detailed information on this topic.
 
  • Multi-label names with the domain suffix ".local" are resolved using MulticastDNS on all local interfaces where MulticastDNS is enabled. As with LLMNR, IPv4 address lookups are sent via IPv4 and IPv6 address lookups are sent via IPv6.

  • Queries for multi-label names are routed via unicast DNS on local interfaces that have a DNS server configured, plus the globally configured DNS servers if there are any. Which interfaces are used is determined by the routing logic based on search and route-only domains, described below. Note that by default, lookups for domains with the ".local" suffix are not routed to DNS servers, unless the domain is specified explicitly as routing or search domain for the DNS server and interface. This means that on networks where the ".local" domain is defined in a site-specific DNS server, explicit search or routing domains need to be configured to make lookups work within this DNS domain. Note that these days, it's generally recommended to avoid defining ".local" in a DNS server, as RFC6762 reserves this domain for exclusive MulticastDNS use.
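
The routing rule quoted above can be illustrated with a simplified model. This is a sketch of the documented behavior for multi-label names only, not the systemd-resolved implementation: the function name `uses_unicast_dns` is hypothetical, and single-label names and the catch-all routing domain `~.` are not modeled.

```python
from typing import Iterable


def uses_unicast_dns(name: str, configured_domains: Iterable[str]) -> bool:
    """Return True if a lookup for `name` would be sent to the unicast DNS servers.

    `configured_domains` holds the search domains and routing domains
    (routing domains are written with a leading "~") set for the DNS server.
    """
    labels = name.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError("this sketch only models multi-label names")
    if labels[-1] != "local":
        # Names outside .local are routed to unicast DNS as usual.
        return True
    fqdn = ".".join(labels)
    for domain in configured_domains:
        d = domain.lstrip("~").rstrip(".").lower()
        # A .local name is routed to DNS only when it falls within an
        # explicitly configured search or routing domain.
        if d and (fqdn == d or fqdn.endswith("." + d)):
            return True
    return False
```

Under this model, `uses_unicast_dns("vcenter-01a.vmware.local", [])` is false, which matches the SERVFAIL seen in the symptoms, while adding `vmware.local` as a search domain makes it true.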

Resolution

To resolve this, add a search domain that references the .local domain during initial WCP configuration. This allows the Supervisor Cluster VMs to send lookups for the .local domain to the configured DNS server and, in turn, resolve the vCenter Server name.

Example:

[Image: example search domain configuration]
 

In this example, the DNS domain is vmware.local. Adjust the search domain to match the .local domain used in your environment. For example, if the server FQDN is server.test.local, the search domain should be test.local.
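
The rule in the example above (drop the host label and keep the remainder) can be sketched as a small helper. The function name `search_domain_for` is hypothetical, and the sketch assumes the host name is the first label of the FQDN, as in the examples in this article.

```python
def search_domain_for(fqdn: str) -> str:
    """Return the search domain for an FQDN by removing its first (host) label."""
    name = fqdn.strip().rstrip(".").lower()
    host, sep, domain = name.partition(".")
    if not host or not sep or not domain:
        raise ValueError(f"not a fully qualified domain name: {fqdn!r}")
    return domain
```

For instance, `search_domain_for("server.test.local")` returns `test.local`, and `search_domain_for("vcenter-01a.vmware.local")` returns `vmware.local`.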


Additional Information

Impact/Risks:
The .local domain is reserved for mDNS use by RFC6762; resolving it against a DNS server therefore violates RFC6762. As such, VMware does not recommend any deployment that uses .local for any component. This includes vCenter, ESXi, NSX Advanced Load Balancer, NSX Manager, NSX Edge nodes, TKGs nodes or API endpoints, and any endpoint that TKGs uses, such as Harbor.

This workaround is strictly for proof-of-concept and lab use. Implementing it in a production environment could result in unexpected behavior.