vSphere Kubernetes Supervisor Cluster Error - Unable to connect to the management DNS servers from control plane VM - The connection was attempted over the workload network

Article ID: 392239

Updated On:

Products

VMware vSphere Kubernetes Service, VMware vSphere 7.0 with Tanzu, vSphere with Tanzu, Tanzu Kubernetes Runtime

Issue/Introduction

In the vSphere web client under Workload Management for Supervisors, the vSphere Kubernetes Supervisor cluster shows one or more Errors.

Clicking the error count shown in parentheses displays the following error message, where the DNS server and control plane VM names vary by environment:

  • Unable to connect to the management DNS servers <server name> from control plane VM <VM name> - The connection was attempted over the workload network

 

When connected over SSH to one of the Supervisor cluster control plane VMs, the DNS server or vCenter IP address cannot be reached.

 

From the Supervisor cluster context, one or more of the following symptoms may be present:

  • Newly created VMs are stuck in poweredOff state:
    • kubectl get vm -n <namespace>

  • Describing the poweredOff newly created VM shows that a volume mount is not attached to the VM:
    • kubectl describe vm -n <namespace> <vm name>

  • For the above volume mount, the corresponding persistent volume claim (pvc) is in Pending state:
    • kubectl get pvc -A | grep <volume mount>

  • Describing the Pending persistent volume claim (pvc) shows an error message similar to the following, in which a DNS lookup over port 53 times out:
    • kubectl describe pvc <pvc name> -n <namespace>
    • Post "https://<vcenter domain>:443/sdk": dial tcp: lookup <localhost or IP>:53: i/o timeout
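
The last symptom above can be spotted mechanically. The following is a minimal sketch, not part of the original article: `has_dns_timeout` is a hypothetical helper name, and the pattern assumes the error text resembles the example message shown above. It reads describe output on standard input and succeeds only if a lookup against port 53 timed out.

```shell
# Hypothetical helper (not a kubectl feature): reads text on stdin and
# exits 0 if it contains a DNS lookup against port 53 that ended in
# "i/o timeout", as in the example error message in this article.
has_dns_timeout() {
  grep -Eq 'lookup .*:53: .*i/o timeout'
}

# Example usage from the Supervisor cluster context, using the same
# placeholders as the article (left as a comment so the block runs as-is):
#   kubectl describe pvc <pvc name> -n <namespace> | has_dns_timeout \
#     && echo "PVC is failing on a DNS lookup timeout"
```

A match only indicates that the PVC error has the DNS-timeout shape described here; it does not identify which DNS server is unreachable.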

Environment

vSphere with Tanzu 7.0

vSphere with Tanzu 8.0

This issue can occur whether or not the Supervisor cluster is managed by Tanzu Mission Control (TMC).

Cause

One or more Supervisor cluster control plane virtual machines are unable to reach the DNS servers configured in the environment.

This could be due to a networking configuration issue, a change to the DNS server configuration, or an issue with the DNS service on the affected Supervisor cluster control plane VM(s).

The provided worker_dns values wholly contain the provided management DNS values, meaning that traffic is routed through the workload network.
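
The containment condition described above can be checked by hand before changing any settings. The sketch below is illustrative only: `mgmt_dns_within_workload` is a hypothetical helper, the IP addresses are example values, and it simply compares two space-separated lists. It does not reproduce how the Supervisor itself evaluates the worker_dns and management DNS values.

```shell
# Hypothetical helper: exits 0 when every management DNS server also
# appears in the workload DNS list. Both arguments are space-separated
# lists of addresses; an empty management list is treated as "no".
mgmt_dns_within_workload() {
  mgmt_list=$1
  workload_list=$2
  [ -n "$mgmt_list" ] || return 1
  for server in $mgmt_list; do
    case " $workload_list " in
      *" $server "*) ;;        # this server is shared, keep checking
      *) return 1 ;;           # at least one management server is not shared
    esac
  done
  return 0
}

# Example values only; substitute the DNS servers configured for the Supervisor.
if mgmt_dns_within_workload "10.0.0.10 10.0.0.11" "10.0.0.10 10.0.0.11 10.0.1.10"; then
  echo "management DNS is wholly contained in workload DNS"
else
  echo "management DNS includes servers outside the workload DNS list"
fi
```

The comparison pads both sides with spaces so that an address such as 10.0.0.1 does not falsely match 10.0.0.10.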

Resolution

The connection between the affected Supervisor control plane VM(s) and the DNS server(s) will need to be fixed.

  1. Check that there are no issues as per the below documentation:
  2. Connect to the affected Supervisor control plane VM.
  3. Ensure the pods used by the load balancer for the Supervisor cluster are in a Running status.
    1. The names of the load balancer pods will vary depending on the load balancer in use, for example "antrea-agent" or "nsx-ncp".
  4. Confirm that the expected DNS server is configured properly:
    • resolvectl status

      eth0 DNS server: <dns server IP address>

    • cat /etc/resolv.conf

      nameserver <server>
      search <FQDN>
      • Note: The /etc/resolv.conf file should not be manually edited.

  5. If the above nameserver points to 127.0.0.53:
    • Check that the systemd-networkd files are configured properly:
      • ls /etc/systemd/network/

    • Check the logs for systemd-networkd for any errors:
      • journalctl -xeu systemd-networkd

    • systemd-networkd can be restarted, if necessary:
      • systemctl restart systemd-networkd

  6. If the DNS server configuration needs to be updated, see the following documentation:
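
Separately from the documentation referenced in step 6, the checks in steps 4 and 5 can be scripted. This is a minimal sketch under stated assumptions: `list_nameservers` and `uses_stub_resolver` are hypothetical helper names, and the script only reads a resolv.conf-style file (it never edits it, in line with the note in step 4). It reports whether the nameserver entry is the 127.0.0.53 stub address mentioned in step 5.

```shell
# Hypothetical helper: prints each nameserver address found in a
# resolv.conf-style file (defaults to /etc/resolv.conf). Read-only.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Hypothetical helper: exits 0 if the file lists the 127.0.0.53 stub
# address, the case in which step 5 suggests inspecting systemd-networkd.
uses_stub_resolver() {
  list_nameservers "$1" | grep -qx '127\.0\.0\.53'
}

# Run the check only when the file is readable on this machine.
if [ -r /etc/resolv.conf ]; then
  if uses_stub_resolver /etc/resolv.conf; then
    echo "nameserver is 127.0.0.53: review systemd-networkd as in step 5"
  else
    echo "nameservers: $(list_nameservers /etc/resolv.conf | tr '\n' ' ')"
  fi
fi
```

When the stub address is reported, the commands already listed in step 5 (ls /etc/systemd/network/ and journalctl -xeu systemd-networkd) are the next place to look.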