ESXi host disconnects and reconnects in vCenter due to delayed DNS name resolution

Article ID: 391046

Products

VMware vCenter Server

Issue/Introduction

  • ESXi hosts disconnect and reconnect in vCenter because DNS resolution takes longer than 120 seconds.
  • Some vCenter services have crashed (vpxd, vmware-vpostgres, eam, pschealth, vapi-endpoint).
  • The vCenter UI shows "HTTPS Status 500- Internal Server Error for VC".
  • DNS-related warnings such as the following appear in /var/log/vmware/vpxd/vpxd.log:
    warning vpxd[07944] [Originator@6876 sub=IO.Connection opID=7dbbeebf] Address resolution took too long; <resolver p:0x00007f338c4e3de0, 'esxi.example.com:443', next:(null)>, async: true, duration: 130242msec
    warning vpxd[07776] [Originator@6876 sub=IO.Connection opID=#####542] Failed to resolve address; <resolver p:0x00007f35c814c780, 'esxi.example.com:443', next:(null)>, e: 125(Operation canceled), async: true, duration: 125184msec
    warning vpxd[07776] [Originator@6876 sub=HttpConnectionPool-000000 opID=#####542] Failed to get pooled connection; <cs p:00007f3318004990, TCP:esxi.example.com:443>, (null), duration: 125185msec, N7Vmacore17CanceledExceptionE(Operation was canceled)
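To check how often these warnings occur, the log can be filtered for resolver entries whose duration exceeds 120 seconds. The following is a minimal sketch, not a vendor-supplied tool: the function name slow_dns_lookups and the default threshold of 120000 msec are assumptions chosen to match the messages above.

```shell
# Hypothetical helper: list resolver warnings slower than a threshold.
# Usage: slow_dns_lookups <logfile> [threshold_msec]
slow_dns_lookups() {
  local logfile="$1" threshold="${2:-120000}"
  # Keep only the two resolver warnings, extract 'host:port' and the
  # duration in msec, then print entries above the threshold.
  grep -E 'Address resolution took too long|Failed to resolve address' "$logfile" |
    sed -nE "s/.*'([^']+)'.*duration: ([0-9]+)msec.*/\1 \2/p" |
    awk -v t="$threshold" '$2 + 0 > t + 0 { print $1, $2 }'
}
```

On the vCenter Server this could be run as: slow_dns_lookups /var/log/vmware/vpxd/vpxd.log. Each output line contains the host:port that was being resolved and the duration in milliseconds.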

Environment

VMware vCenter 7.X

Cause

The root cause is that DNS resolution for the host takes longer than 120 seconds, which results in the host disconnecting.
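To see how long a lookup currently takes from the vCenter Server shell, a single resolution can be timed. This is a rough sketch under stated assumptions: resolve_ms is a hypothetical helper name, it relies on GNU date for nanosecond timestamps, and getent goes through the system resolver, so a cached answer may return much faster than an uncached one.

```shell
# Hypothetical helper: print how long one name lookup takes, in milliseconds.
# Usage: resolve_ms <hostname>
resolve_ms() {
  local name="$1" start end
  start=$(date +%s%N)   # nanoseconds since the epoch (GNU date)
  # The lookup result is discarded; only the elapsed time matters here.
  getent hosts "$name" >/dev/null 2>&1 || true
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}
```

For example, resolve_ms esxi.example.com prints the elapsed time for that name. Values approaching 120000 would be consistent with the warnings in vpxd.log.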

Resolution

This issue is addressed in VMware vCenter 8.0.X.

To work around the issue in VMware vCenter 7.x, follow the steps below:

  1. Take a snapshot of the vCenter VM (powered-off snapshots of all vCenters if in Enhanced Linked Mode).
  2. SSH to the vCenter Server.
  3. Edit the dnsmasq.conf file:
  4. vi /etc/dnsmasq.conf
  5. Comment out "no-negcache" and increase cache-size.

    Before

    root@vcsa01 [ /etc/sysconfig ]# cat /etc/dnsmasq.conf
    listen-address=127.0.0.1
    bind-interfaces
    user=dnsmasq
    group=dnsmasq
    
    no-hosts
    log-queries
    log-facility=/var/log/vmware/dnsmasq.log
    domain-needed
    dns-forward-max=150
    cache-size=8192
    neg-ttl=3600



    After

    listen-address=127.0.0.1
    bind-interfaces
    user=dnsmasq
    group=dnsmasq
    
    #no-negcache
    no-hosts
    log-queries=extra
    log-facility=/var/log/vmware/dnsmasq.log
    domain-needed
    dns-forward-max=300
    cache-size=16384
    neg-ttl=86400
  6. Save the file
    esc > :wq!
  7. Restart the 'dnsmasq' service after updating the dnsmasq.conf
    systemctl restart dnsmasq
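The manual edit in steps 3 to 6 can also be scripted. The following is a minimal sketch, not part of the official procedure: the function name apply_dnsmasq_workaround and the .bak backup suffix are assumptions, and the values are copied from the After listing above. It relies on GNU sed for in-place editing.

```shell
# Hypothetical helper: apply the "After" values to a dnsmasq.conf-style file.
# Usage: apply_dnsmasq_workaround <path-to-conf>
apply_dnsmasq_workaround() {
  local conf="$1"
  cp -p "$conf" "$conf.bak"   # keep a copy of the original next to it
  # Each expression is a no-op if the matching line is absent.
  sed -i -E \
    -e 's/^no-negcache$/#no-negcache/' \
    -e 's/^log-queries$/log-queries=extra/' \
    -e 's/^dns-forward-max=.*/dns-forward-max=300/' \
    -e 's/^cache-size=.*/cache-size=16384/' \
    -e 's/^neg-ttl=.*/neg-ttl=86400/' \
    "$conf"
}
```

It is safer to try this on a copy of /etc/dnsmasq.conf first and compare the result with the After listing using diff. After applying it to the real file, restart dnsmasq as in step 7.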