High Latency Between ESXi Hosts and Active Directory Causes vMotions to Fail

Article ID: 316405

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • vMotions fail or get stuck at 19%, seemingly at random
  • One or more hosts are joined to an Active Directory (AD) domain for user authentication
  • There is high latency between the host and one or more AD or DNS servers
  • In the /var/run/log/syslog.log file you see entries similar to these:
    • 2021-03-05T22:16:23Z lwsmd: [lsass] Error code 40286 occurred during attempt 0 of a ldap search. Retrying.
      2021-03-05T22:16:33Z lwsmd: [lsass] Clearing ldap DC connection list for domain 'domain.local' due to a network error.
      2021-03-05T22:21:23Z lwsmd: [lsass] Performing backup
      2021-03-06T02:18:05Z lwsmd: [lsass] Clearing ldap DC connection list for domain 'domain.local' due to a network error.
      2021-03-06T02:18:05Z lwsmd: [lsass] Delayed backup scheduled
      2021-03-06T02:23:05Z lwsmd: [lsass] Performing backup
      2021-03-06T05:58:00Z lwsmd: ldap_sasl_interactive_bind_s failed with error code -1
      2021-03-06T05:58:05Z lwsmd: [netlogon] Filtering list of 16 servers with list of 0 black listed servers
      2021-03-06T06:19:26Z lwsmd: [lsass] Clearing ldap DC connection list for domain 'domain.local' due to a network error.
  • In /var/run/log/vpxa.log you see the vMotion task fail after exactly 30 seconds:
    • 2021-03-10T03:12:28.525Z info vpxa[2100782] [Originator@6876 sub=vpxLro opID=khxovjgi-22095445-auto-d5kyl-h5:70702039-f4-01-c5] [VpxLRO] -- BEGIN task-1015 -- vmotionManager -- vim.host.VMotionManager.prepareDestinationEx -- 5244b343-e8d9-d5d6-c50d-4fdbc6e4a515
      
      2021-03-10T03:14:58.514Z info vpxa[2100782] [Originator@6876 sub=vpxLro opID=khxovjgi-22095445-auto-d5kyl-h5:70702039-f4-01-c5] [VpxLRO] -- FINISH task-1015
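To check a host for these symptoms, the two logs can be scanned with a short script. The following is a minimal sketch, not a supported tool. It assumes the log files have been copied to a workstation with Python 3, that syslog timestamps have the form shown above, and that vpxa task IDs have the form task-NNNN. The function names are hypothetical.

```python
import re
from datetime import datetime

# Matches the BEGIN/FINISH markers that vpxa writes for a task.
# Assumption: task IDs look like "task-1015", as in the excerpt above.
TASK_RE = re.compile(r"^(\S+)\s.*\[VpxLRO\] -- (BEGIN|FINISH) (task-\d+)")


def parse_ts(stamp):
    """Parse an ESXi UTC timestamp, with or without fractional seconds."""
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in stamp else "%Y-%m-%dT%H:%M:%SZ"
    return datetime.strptime(stamp, fmt)


def task_durations(lines):
    """Return {task_id: seconds between its BEGIN and FINISH lines}."""
    started = {}
    durations = {}
    for line in lines:
        match = TASK_RE.match(line.strip())
        if not match:
            continue
        stamp, event, task = match.groups()
        if event == "BEGIN":
            started[task] = parse_ts(stamp)
        elif task in started:
            elapsed = parse_ts(stamp) - started.pop(task)
            durations[task] = elapsed.total_seconds()
    return durations


def count_lwsmd_network_errors(lines):
    """Count lwsmd entries that report a network error."""
    return sum(
        1 for line in lines
        if "lwsmd:" in line and "due to a network error" in line
    )
```

Pass each function the lines of the matching log, for example `task_durations(open("vpxa.log", errors="replace"))`. Tasks that never log a FINISH line are omitted from the result.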
      
Environment

VMware ESXi 6.7.x

Cause

The lwsmd service takes 30 seconds or longer to respond to a user authentication request. hostd times out while waiting for that response from lwsmd.
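To get a rough idea of how slow the path to a domain controller is, you can time a TCP connection to its LDAP port. The following is a minimal sketch under several assumptions: it runs on a machine with Python 3 on the same network segment as the ESXi host, the domain controller listens on the standard LDAP port 389, and the function name is hypothetical. TCP connect time is only a proxy for the latency that lwsmd experiences during an LDAP search.

```python
import socket
import time


def tcp_connect_ms(host, port=389, timeout=5.0):
    """Return the milliseconds needed to open a TCP connection to host:port.

    The measurement includes name resolution, so slow DNS shows up as well.
    Returns None if the connection fails or does not complete in time.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # Only the connection setup is timed; no data is sent.
    except OSError:
        return None
    return (time.monotonic() - start) * 1000.0


def measure_servers(hosts, port=389):
    """Return {host: connect time in ms, or None if unreachable}."""
    return {host: tcp_connect_ms(host, port) for host in hosts}
```

For example, `measure_servers(["dc01.domain.local", "dc02.domain.local"])` returns one value per server, where the host names are placeholders for your own domain controllers. Repeating the measurement several times gives a better picture than a single sample.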

Resolution

This issue is resolved in ESXi 7.0 by removing the dependency on a response from the lwsmd service.

Workaround:
If you cannot upgrade, work around the issue by disabling AD authentication on all affected hosts. Remove each host from the domain using either of the following procedures:
  1. Leave the domain from the vSphere Client:
    1. Browse to a host in the vSphere Client inventory.
    2. Click Configure.
    3. Under System, select Authentication Services.
    4. Click Leave Domain.
  2. Leave the domain from the CLI:
    1. Log in to the host via SSH.
    2. Run the following command:
      • /opt/likewise/bin/domainjoin-cli leave
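When many hosts are affected, the CLI procedure can be scripted. The following is a minimal sketch, assuming SSH is enabled on each host, that you can log in as root, and that an `ssh` client is available on the machine running the script. The function names and host names are hypothetical. By default it only reports the commands it would run.

```python
import subprocess

# The leave command given in this article.
LEAVE_CMD = "/opt/likewise/bin/domainjoin-cli leave"


def build_leave_command(host, user="root"):
    """Return the argv that runs the leave command on one host over SSH."""
    return ["ssh", "{}@{}".format(user, host), LEAVE_CMD]


def leave_domain(hosts, dry_run=True):
    """Run the leave command on each host.

    With dry_run=True (the default) nothing is executed and the result maps
    each host to the command line that would be run. With dry_run=False the
    result maps each host to the exit code of the ssh process.
    """
    results = {}
    for host in hosts:
        argv = build_leave_command(host)
        if dry_run:
            results[host] = " ".join(argv)
        else:
            results[host] = subprocess.run(argv).returncode
    return results
```

Review the dry-run output of `leave_domain(["esx01.domain.local"])` first, then call it again with `dry_run=False` to execute the command on each host.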