Multiple ESXi hosts disconnect from vCenter

Article ID: 388945

Products

VMware vCenter Server 7.0

Issue/Introduction

Symptoms:

  • ESXi hosts start disconnecting from vCenter
  • In /var/log/vmware/vpxd/vpxd.log on the vCenter Server, you may see entries similar to:

YYYY-MM-DDTHH:MM:SS.102 warning vpxd[65345] [Originator@6876 sub=MoHost opID=HB-host-XXX@6937433-442d20c0] host [vim.HostSystem:host-XXX,ESXifqdn] connection state changed to NO_RESPONSE
YYYY-MM-DDTHH:MM:SS.964 warning vpxd[02314] [Originator@6876 sub=IO.Connection opID=256cba79] Failed to resolve address; <resolver p:0x00007f787405ce90, 'ESXifqdn:443', next:(null)>, e: 125(Operation canceled), async: true, duration: 1484897msec

  • An nslookup or ping test against the fully qualified domain name (FQDN) of any disconnected ESXi host hangs without returning any output.
  • In /var/log/vmware/messages on the vCenter Server, you may see entries similar to:

YYYY-MM-DDTHH:MM:SS.402129 vcenter systemd-resolved[16641]: Using DNS server XX.XX.XX.XX for transaction 22082.
YYYY-MM-DDTHH:MM:SS.402156 vcenter systemd-resolved[16641]: Sending query via TCP since UDP isn't supported.
YYYY-MM-DDTHH:MM:SS.402186 vcenter systemd-resolved[16641]: Using feature level TLS+EDNS0 for transaction 22082.

YYYY-MM-DDTHH:MM:SS.449823 vcenter systemd-resolved[16641]: Using DNS server XX.XX.XX.XX for transaction 11552.
YYYY-MM-DDTHH:MM:SS.449903 vcenter systemd-resolved[16641]: Transaction 11552 for <ESXifqdn IN A> on scope dns on */* now complete with <ETIMEDOUT> from none (unsigned).
YYYY-MM-DDTHH:MM:SS.449933 vcenter systemd-resolved[16641]: Sent message type=error sender=n/a destination=:1.19582272 path=n/a interface=n/a member=n/a cookie=7128822 reply_cookie=2 signature=s error-name=org.freedesktop.DBus.Error.Timeout error-message=Lookup failed due to system error: Connection timed out

Note: The systemd-resolved log entries above appear only when debug logging is enabled for the systemd-resolved service. Debug logging is not enabled by default.
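If you need to capture the systemd-resolved entries shown above, one way to raise the service's log level is a systemd drop-in that sets the SYSTEMD_LOG_LEVEL environment variable for the unit. This is a sketch that assumes root access on the vCenter Server appliance; the helper name enable_resolved_debug and the drop-in file name debug-logging.conf are hypothetical and not part of any VMware tooling.

```shell
# Hypothetical helper (not VMware tooling): write a systemd drop-in that
# raises the systemd-resolved log level to debug. The directory defaults to
# the standard drop-in location for the unit and can be overridden.
enable_resolved_debug() {
  local dir="${1:-/etc/systemd/system/systemd-resolved.service.d}"
  mkdir -p "$dir" || return 1
  # systemd daemons read SYSTEMD_LOG_LEVEL from their environment at startup.
  printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' > "$dir/debug-logging.conf"
}

# Example use on the vCenter Server, as root:
#   enable_resolved_debug
#   systemctl daemon-reload
#   systemctl restart systemd-resolved
```

Debug logging is verbose. To revert, delete the drop-in file, then run systemctl daemon-reload and restart systemd-resolved again.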

Environment

VMware vCenter Server 7.x

Cause

  • This issue occurs when the systemd-resolved service on the vCenter Server becomes unresponsive.
  • By default, vCenter uses UDP for DNS resolution requests. When the configured external DNS server does not support UDP, vCenter falls back to TCP.
  • When DNS resolution requests use TCP together with TLS, systemd-resolved has been observed to become unresponsive intermittently.
  • While systemd-resolved is in this state, vCenter cannot resolve the FQDN of any ESXi host, and all ESXi hosts disconnect from vCenter.
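To check whether DNS over TLS is explicitly configured on a given vCenter Server, a small helper can read the DNSOverTLS setting from resolved.conf. This is a hypothetical sketch (the name resolved_dns_over_tls is illustrative). It inspects only the main file and not drop-ins under /etc/systemd/resolved.conf.d/. When the key is absent, the effective value depends on how systemd was built, so the helper reports "unset" rather than guessing. At runtime, the "Using feature level TLS+EDNS0" debug entry shown in the symptoms is a more direct indicator.

```shell
# Hypothetical helper: print the DNSOverTLS value assigned in a resolved.conf
# file, or "unset" when no uncommented assignment exists. Commented lines
# (starting with '#') are ignored; if the key appears more than once, the
# last assignment is reported.
resolved_dns_over_tls() {
  local conf="${1:-/etc/systemd/resolved.conf}"
  local value
  value=$(sed -n 's/^[[:space:]]*DNSOverTLS[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1)
  echo "${value:-unset}"
}

# Example: resolved_dns_over_tls /etc/systemd/resolved.conf
```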

Resolution

To resolve the issue, restart the systemd-resolved service on the vCenter Server:

  • Open an SSH session to the vCenter Server.
  • Run the following command:

systemctl restart systemd-resolved

  • Wait for the ESXi hosts to reconnect.
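After the restart, you can confirm that name resolution works again before waiting for the hosts to reconnect. The sketch below wraps getent hosts, which resolves through the system's NSS configuration, in timeout so that a hung resolver is reported as a failure instead of blocking the shell. The helper name check_fqdns_resolve and the 5-second limit are assumptions, and the FQDNs in the example are placeholders for your own ESXi host names.

```shell
# Hypothetical helper: check that each FQDN given as an argument resolves
# within 5 seconds. Prints one OK/FAILED line per name and returns non-zero
# if any lookup fails or times out.
check_fqdns_resolve() {
  local rc=0 fqdn
  for fqdn in "$@"; do
    if timeout 5 getent hosts "$fqdn" > /dev/null; then
      echo "OK: $fqdn"
    else
      echo "FAILED: $fqdn"
      rc=1
    fi
  done
  return "$rc"
}

# Example with placeholder names:
#   check_fqdns_resolve esxi01.example.com esxi02.example.com
```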

Workaround:

To prevent this issue from recurring, use either of the following workarounds:

  1. Enable UDP support on the external DNS server.
  2. Disable DNS over TLS on the vCenter Server:
    • Open an SSH session to the vCenter Server.
    • Edit the resolved.conf file:

vi /etc/systemd/resolved.conf

    • Append the following line:

DNSOverTLS=no

    • Save the file
    • Restart the systemd-resolved service

systemctl restart systemd-resolved
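The manual edit in workaround 2 can also be scripted so that running it twice does not add a duplicate line. The helper disable_dns_over_tls below is a hypothetical sketch. It assumes GNU sed (for sed -i) and assumes that the last section of resolved.conf is [Resolve], as in the default upstream systemd file. It saves a .bak copy, rewrites an existing uncommented DNSOverTLS= line in place, and otherwise appends the setting as the manual steps do.

```shell
# Hypothetical helper: ensure resolved.conf sets DNSOverTLS=no.
# Assumes GNU sed and that appended lines land in the [Resolve] section.
disable_dns_over_tls() {
  local conf="${1:-/etc/systemd/resolved.conf}"
  cp -p "$conf" "$conf.bak" || return 1   # keep a backup of the original
  if grep -q '^[[:space:]]*DNSOverTLS[[:space:]]*=' "$conf"; then
    # Rewrite any existing uncommented assignment in place.
    sed -i 's/^[[:space:]]*DNSOverTLS[[:space:]]*=.*/DNSOverTLS=no/' "$conf"
  else
    # Add a newline first if the file does not already end with one.
    [ -n "$(tail -c 1 "$conf")" ] && echo >> "$conf"
    echo 'DNSOverTLS=no' >> "$conf"
  fi
}

# Example on the vCenter Server, as root:
#   disable_dns_over_tls && systemctl restart systemd-resolved
```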