Resolving ESXi Host Disconnects Caused by DNS Resolution Failures in vCenter Server
Article ID: 369229
Products
VMware vSphere ESXi
Issue/Introduction
When using vCenter Server to manage ESXi hosts, you may encounter a situation where hosts intermittently enter a "not connected" state.
This can disrupt the management of your virtual infrastructure. One common cause of this issue is DNS resolution failures between the vCenter Server and its configured DNS servers.
Environment
vCenter Server 7.0.x
vCenter Server 8.0.x
ESXi 7.0.x
ESXi 8.0.x
Cause
Intermittent ESXi host disconnects from vCenter Server are often caused by unstable DNS resolution. When the vCenter Server cannot consistently resolve the hostnames of the ESXi hosts, it may mark them as "not connected."
This can happen when there are issues with the network connectivity between the vCenter Server and its configured DNS servers or when the DNS servers are not responding to queries in a timely manner.
Resolution
To resolve ESXi host disconnects caused by DNS resolution failures, follow these steps:
1. Verify the DNS server settings on the vCenter Server:
Open the VAMI user interface of the vCenter Server by going to https://<vcenter_IP_or_FQDN>:5480
Log in with the default administrator account administrator@<name_of_the_SSO_domain> (for example, administrator@vsphere.local).
Open the "Networking" tab and ensure the correct DNS server IP addresses are entered.
2. Check the network connectivity between the vCenter Server and DNS servers:
Use the ping command from the vCenter Server to test reachability to the DNS server IP addresses.
If pings fail, investigate potential network issues, misconfigurations, or firewalls blocking required ports.
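The reachability check in this step can be scripted. The following is a minimal sketch, assuming ping and dig are available in the vCenter Server shell; the function name check_dns_server and the addresses in the usage example are hypothetical placeholders. Because a ping can succeed while the DNS service itself is failing, the sketch also sends one query directly to the server.

```shell
# Sketch: check ICMP reachability and a direct DNS query for one DNS server.
# check_dns_server is a hypothetical helper name, not a VMware-provided tool.
check_dns_server() {
    server="$1"   # IP address of the DNS server to test
    name="$2"     # hostname to resolve, for example an ESXi host FQDN

    # Send 3 echo requests, waiting up to 2 seconds for each reply.
    if ping -c 3 -W 2 "$server" >/dev/null 2>&1; then
        echo "$server: ping OK"
    else
        echo "$server: ping FAILED"
    fi

    # Query this server directly. dig exits non-zero when no server replies,
    # and +short prints nothing when the reply contains no answer records.
    if answer="$(dig +short +time=2 +tries=1 "@$server" "$name" 2>/dev/null)" \
        && [ -n "$answer" ]; then
        echo "$server: lookup of $name OK"
    else
        echo "$server: lookup of $name FAILED"
    fi
}
```

For example, check_dns_server 192.0.2.53 esxi01.example.com tests one DNS server against one ESXi hostname; repeat the call for each configured DNS server.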
3. Validate DNS server health and performance:
Contact your DNS server administrators to assess the status of the DNS infrastructure.
Request a review of DNS server logs for any errors or excessive query response times.
Consider adding more DNS servers to the vCenter Server configuration for redundancy.
4. Restart the vCenter Server services:
If DNS connectivity has been restored, restart the vCenter Server services to re-establish connections to the ESXi hosts.
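One way to perform this restart from the vCenter Server shell is the service-control utility. The sketch below assumes a root shell on the appliance; the wrapper name restart_vcenter_services and the SERVICE_CONTROL override are hypothetical conveniences added for illustration.

```shell
# Sketch: restart all vCenter Server services with service-control.
# restart_vcenter_services is a hypothetical wrapper name.
# Set SERVICE_CONTROL=echo for a dry run that only prints the arguments.
restart_vcenter_services() {
    # Stop every service, then start them again only if the stop succeeded.
    "${SERVICE_CONTROL:-service-control}" --stop --all &&
        "${SERVICE_CONTROL:-service-control}" --start --all
}
```

vCenter Server is unavailable while the services restart, so run restart_vcenter_services at a suitable time and confirm the result afterwards with service-control --status --all.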
5. Monitor vCenter Server logs for DNS-related errors:
Check the vCenter Server logs for any persistent DNS resolution failures.
For example, to watch DNS resolution failures on the vCenter Server in real time, run the following command from the vCenter Server console:
# journalctl -b -f | grep -i "Temporary failure in name resolution"
Check the /var/log/vmware/dnsmasq.log file. You may see lines similar to:
<date> dnsmasq[2084]: forwarded esxi_fqdn to DNS_SERVER_IP
<date> dnsmasq[2084]: reply esxi_fqdn is NXDOMAIN
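To see which hostnames fail most often, the NXDOMAIN replies in that log can be tallied. This is a minimal sketch, assuming the log lines follow the "reply <name> is NXDOMAIN" shape shown above; count_nxdomain is a hypothetical helper name.

```shell
# Sketch: count NXDOMAIN replies per queried name in a dnsmasq log file.
# count_nxdomain is a hypothetical helper name.
count_nxdomain() {
    # For each line ending in NXDOMAIN, the queried name is the field
    # that follows the word "reply". Print "count name", highest first.
    awk '$NF == "NXDOMAIN" {
             for (i = 1; i < NF; i++)
                 if ($i == "reply") { count[$(i + 1)]++; break }
         }
         END { for (n in count) print count[n], n }' "$1" | sort -rn
}
```

For example, count_nxdomain /var/log/vmware/dnsmasq.log lists each failing name with the number of NXDOMAIN replies recorded for it.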
Note: The journalctl command displays log messages indicating DNS lookup failures as they happen. The "-b" flag shows log entries from the current boot session, while "-f" enables real-time monitoring of new log messages. The output is then filtered using grep to display only lines containing the phrase "Temporary failure in name resolution."
By running this command during the time when issues typically occur, you can quickly spot DNS-related errors and note the specific services or components affected.
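To note which services are affected without reading every line, the matching entries can be grouped by the process that logged them. The sketch below assumes journalctl's default short output format, where the fifth whitespace-separated field is the process identifier such as name[pid]:; summarize_dns_failures is a hypothetical helper name.

```shell
# Sketch: count "Temporary failure in name resolution" entries per process.
# Reads journalctl output on standard input.
# summarize_dns_failures is a hypothetical helper name.
summarize_dns_failures() {
    grep -i "Temporary failure in name resolution" |
        awk '{
                 id = $5                 # for example "vpxd[1234]:"
                 sub(/\[.*$/, "", id)    # drop "[pid]:" if present
                 sub(/:$/, "", id)       # drop a trailing ":" otherwise
                 n[id]++
             }
             END { for (k in n) print n[k], k }' | sort -rn
}
```

For example, journalctl -b | summarize_dns_failures prints one line per process with the number of failures it logged since boot. Unlike the real-time command above, this form omits "-f" so that the pipeline finishes and the totals are printed.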
6. If needed, perform packet captures to diagnose DNS resolution issues between the vCenter Server and ESXi hosts:
By capturing network traffic on both the vCenter Server and an affected ESXi host, you can analyze the DNS query and response packets to determine if requests are reaching the DNS servers and if the responses are being returned successfully.
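When setting up packet captures, filter for UDP port 53 traffic between the relevant devices and the DNS servers. DNS can also fall back to TCP port 53 for large responses, so include it if queries appear to go unanswered over UDP.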
Let the captures run during a period when the DNS resolution issues are actively occurring, then review the collected data using a packet analysis tool such as Wireshark.
Look for signs of unanswered DNS queries, retransmissions, or error responses that could indicate problems with network connectivity or DNS server configuration.
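The capture setup described above can be sketched as follows. It assumes tcpdump on the vCenter Server and tcpdump-uw on the ESXi host; the interface names eth0 and vmk0, the output paths, the DNS server address, and the helper name dns_capture_filter are placeholders that will differ per environment.

```shell
# Sketch: build a capture filter for DNS traffic to or from given DNS servers.
# dns_capture_filter is a hypothetical helper name.
dns_capture_filter() {
    hosts=""
    for server in "$@"; do
        if [ -n "$hosts" ]; then
            hosts="$hosts or "
        fi
        hosts="${hosts}host $server"
    done
    # "port 53" matches both UDP and TCP DNS traffic.
    if [ -n "$hosts" ]; then
        echo "port 53 and ($hosts)"
    else
        echo "port 53"
    fi
}

# Example usage (interface names and paths are assumptions):
#   On vCenter Server:
#     tcpdump -i eth0 -w /tmp/vcenter_dns.pcap "$(dns_capture_filter 192.0.2.53)"
#   On the ESXi host:
#     tcpdump-uw -i vmk0 -w /tmp/esxi_dns.pcap "$(dns_capture_filter 192.0.2.53)"
```

Stop each capture with Ctrl+C once the issue has recurred, then copy the .pcap files to a workstation for analysis.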
If errors continue, work with your network and DNS teams to further troubleshoot the issue.