High Latency Between ESXi Hosts and Active Directory Causes vMotions to Fail
Article ID: 316405
Products
VMware vSphere ESXi
Issue/Introduction
Symptoms:
vMotions fail, or get stuck at 19%, seemingly at random
One or more hosts are joined to AD for user authentication
There is high latency between the host and one or more AD or DNS servers
In the /var/run/log/syslog.log file you see entries similar to this:
2021-03-05T22:16:23Z lwsmd: [lsass] Error code 40286 occurred during attempt 0 of a ldap search. Retrying.
2021-03-05T22:16:33Z lwsmd: [lsass] Clearing ldap DC connection list for domain 'domain.local' due to a network error.
2021-03-05T22:21:23Z lwsmd: [lsass] Performing backup
2021-03-06T02:18:05Z lwsmd: [lsass] Clearing ldap DC connection list for domain 'domain.local' due to a network error.
2021-03-06T02:18:05Z lwsmd: [lsass] Delayed backup scheduled
2021-03-06T02:23:05Z lwsmd: [lsass] Performing backup
2021-03-06T05:58:00Z lwsmd: ldap_sasl_interactive_bind_s failed with error code -1
2021-03-06T05:58:05Z lwsmd: [netlogon] Filtering list of 16 servers with list of 0 black listed servers
2021-03-06T06:19:26Z lwsmd: [lsass] Clearing ldap DC connection list for domain 'domain.local' due to a network error.
In /var/run/log/vpxa.log you see the vMotion task fail. In the example below, the task finishes about 150 seconds after it begins (03:12:28 to 03:14:58):
2021-03-10T03:12:28.525Z info vpxa[2100782] [Originator@6876 sub=vpxLro opID=khxovjgi-22095445-auto-d5kyl-h5:70702039-f4-01-c5] [VpxLRO] -- BEGIN task-1015 -- vmotionManager -- vim.host.VMotionManager.prepareDestinationEx -- 5244b343-e8d9-d5d6-c50d-4fdbc6e4a515
2021-03-10T03:14:58.514Z info vpxa[2100782] [Originator@6876 sub=vpxLro opID=khxovjgi-22095445-auto-d5kyl-h5:70702039-f4-01-c5] [VpxLRO] -- FINISH task-1015
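The two log symptoms above can be checked together with a short script. The sketch below is a hypothetical triage aid, not a VMware tool: it counts lwsmd lines in syslog.log that match the example errors quoted in this article, and it computes how long each VpxLRO task ran between its BEGIN and FINISH lines in vpxa.log. The function names and the marker list are assumptions, and the markers cover only the messages shown here.

```python
import re
from datetime import datetime

# Hypothetical marker list, taken only from the example lwsmd entries in this
# article; other AD-related errors may exist that it does not cover.
LWSMD_ERROR_MARKERS = (
    "Error code 40286",
    "due to a network error",
    "ldap_sasl_interactive_bind_s failed",
)

# Matches vpxa.log lines such as:
#   2021-03-10T03:12:28.525Z info vpxa[...] ... [VpxLRO] -- BEGIN task-1015 -- ...
VPXA_TASK_RE = re.compile(
    r"^(?P<ts>\S+Z) .*\[VpxLRO\] -- (?P<event>BEGIN|FINISH) (?P<task>task-\d+)"
)
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def count_lwsmd_errors(lines):
    """Count syslog.log lines from lwsmd that contain one of the known markers."""
    return sum(
        1
        for line in lines
        if "lwsmd:" in line and any(m in line for m in LWSMD_ERROR_MARKERS)
    )


def vpxa_task_durations(lines):
    """Map each task ID to the seconds elapsed between its BEGIN and FINISH lines."""
    begins, durations = {}, {}
    for line in lines:
        match = VPXA_TASK_RE.match(line)
        if not match:
            continue
        ts = datetime.strptime(match["ts"], TS_FORMAT)
        task = match["task"]
        if match["event"] == "BEGIN":
            begins[task] = ts
        elif task in begins:
            # Only pair a FINISH with a BEGIN seen earlier in the same input.
            durations[task] = (ts - begins.pop(task)).total_seconds()
    return durations
```

For example, pass an open copy of syslog.log (for instance from a support bundle) to count_lwsmd_errors, and an open copy of vpxa.log to vpxa_task_durations. Against the sample entries in this article, vpxa_task_durations reports roughly 150 seconds for task-1015. A long task on its own is not diagnostic; look for it alongside lwsmd network errors logged around the same time.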
Environment
VMware ESXi 6.7.x
Cause
This issue occurs when the lwsmd service takes 30 seconds or longer to respond to a user authentication request. hostd times out while waiting for that response.
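The cause can be pictured as a caller with a bounded wait on a slow dependency. The toy model below assumes nothing about hostd's or lwsmd's internals: fake_lwsmd_authenticate and caller_waits are hypothetical names, and the delays are scaled down to fractions of a second in place of the real 30-second threshold.

```python
import concurrent.futures
import time


def fake_lwsmd_authenticate(delay_s):
    """Hypothetical stand-in for an AD lookup slowed by network latency."""
    time.sleep(delay_s)
    return "authenticated"


def caller_waits(delay_s, timeout_s):
    """Model a caller (hostd, in this article) that stops waiting after timeout_s.

    Values are scaled down for illustration; they are not real ESXi settings.
    """
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fake_lwsmd_authenticate, delay_s)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            # The caller gives up even though the lookup may still succeed later.
            return "timed out"
```

For example, caller_waits(0.01, 0.5) returns "authenticated", while caller_waits(0.5, 0.05) returns "timed out". Leaving the with block still waits for the worker thread to finish, so this sketch models only the caller's timeout decision, not how the real services clean up.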
Resolution
This issue is resolved in ESXi 7.0 by removing the dependency on a response from the lwsmd service.
Workaround: If you cannot upgrade, work around the issue by disabling AD authentication on all affected hosts using the following procedures: