When using vCenter Server to manage ESXi hosts, you may encounter a situation where hosts intermittently enter a "not connected" state.
This can disrupt the management of your virtual infrastructure. One common cause of this issue is DNS resolution failures between the vCenter Server and its configured DNS servers.
ESXi hosts that intermittently disconnect from vCenter Server are often affected by unstable DNS resolution. When vCenter Server cannot consistently resolve the hostnames of the ESXi hosts, it may mark them as "not connected."
This can happen when there are issues with the network connectivity between the vCenter Server and its configured DNS servers or when the DNS servers are not responding to queries in a timely manner.
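One way to confirm that resolution is intermittent, rather than permanently broken, is to repeat the same lookup from the vCenter Server shell and count the failures. The following is a minimal sketch, not a VMware-provided tool: the function names `lookup` and `probe_dns` are hypothetical, and it assumes `getent` is available on the appliance and that lookups pass through the system resolver, so entries in /etc/hosts or a local cache can mask upstream failures.

```shell
#!/bin/sh
# Resolve one name once through the system resolver.
# The exit status is non-zero when the lookup fails.
lookup() {
    getent hosts "$1" >/dev/null 2>&1
}

# Repeat the lookup and report how many attempts failed.
# Usage: probe_dns <fqdn> [attempts] [delay_seconds]
probe_dns() {
    name="$1"
    attempts="${2:-10}"
    delay="${3:-1}"
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ! lookup "$name"; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$name: $failed of $attempts lookups failed"
}
```

For example, `probe_dns esxi01.example.com 30 2` (a placeholder hostname) tries the lookup 30 times at 2-second intervals. A mix of successes and failures points to an unstable DNS path rather than a missing record.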
To resolve ESXi host disconnects caused by DNS resolution failures, follow these steps:
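Log in to the vCenter Server Appliance Management Interface (VAMI) to check the DNS servers configured for vCenter Server:
https://<vcenter_IP_or_FQDN>:5480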
If DNS connectivity has been restored, restart the vCenter Server services to re-establish connections to the ESXi hosts. In some cases, restarting vCenter Server has also resolved issues caused by a stale DNS cache.
# journalctl -b -f | grep -i "Temporary failure in name resolution"
<date> dnsmasq[2084]: forwarded esxi_fqdn to DNS_SERVER_IP
<date> dnsmasq[2084]: reply esxi_fqdn is NXDOMAIN
Note: This command uses the journalctl utility to display log messages that indicate DNS lookup failures as they happen. The "-b" flag limits output to entries from the current boot session, while "-f" follows new log messages in real time. The output is then filtered with a case-insensitive grep to display only lines containing the phrase "Temporary failure in name resolution."
By running this command during the time when issues typically occur, you can quickly spot DNS-related errors and note the specific services or components affected.
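When many dnsmasq entries like the "reply ... is NXDOMAIN" line shown earlier have accumulated, it can help to count them per hostname. The sketch below is illustrative only: `count_nxdomain` is a hypothetical helper name, and it assumes each log line ends with `reply <name> is NXDOMAIN`, as in the sample output.

```shell
#!/bin/sh
# Count NXDOMAIN replies per queried name in dnsmasq log lines read from stdin.
# Prints "<count> <name>" lines, highest count first.
count_nxdomain() {
    awk 'NF >= 4 && $NF == "NXDOMAIN" && $(NF-1) == "is" && $(NF-3) == "reply" {
             # The queried name sits between "reply" and "is".
             count[$(NF-2)]++
         }
         END { for (n in count) print count[n], n }' | sort -rn
}
```

A possible invocation is `journalctl -b | count_nxdomain`, which assumes the dnsmasq messages are written to the journal on your appliance. If they go to a separate log file, pipe that file in instead.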
By capturing network traffic on both the vCenter Server and an affected ESXi host, you can analyze the DNS query and response packets to determine if requests are reaching the DNS servers and if the responses are being returned successfully.
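A capture limited to DNS traffic could look like the sketch below. It is a small wrapper rather than an official procedure: `capture_dns` is a hypothetical function name, and the interface names and output paths in the usage note are assumptions that must be adjusted to your environment. The capture tool is `tcpdump` on the vCenter Server Appliance and `tcpdump-uw` on ESXi.

```shell
#!/bin/sh
# Capture DNS traffic (port 53, UDP and TCP) to a pcap file until interrupted
# with Ctrl+C.
# Usage: capture_dns <tool> <interface> <output_file>
#   <tool> is tcpdump on the vCenter Server Appliance, tcpdump-uw on ESXi.
capture_dns() {
    # -n disables name resolution so the capture itself generates no DNS queries.
    "$1" -i "$2" -n -w "$3" port 53
}
```

On vCenter Server this might be run as `capture_dns tcpdump eth0 /tmp/vcsa_dns.pcap`, and on the ESXi host as `capture_dns tcpdump-uw vmk0 /tmp/esxi_dns.pcap`. Here `eth0` and `vmk0` are assumed to be the management interfaces. Start both captures before the issue is expected to occur, then compare the queries and responses in the two files.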