ESXi 6.7 host becomes unresponsive and is disconnected from the vCenter server.

Article ID: 344746

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • The hostd service hangs, which causes vCenter Server to mark the host as disconnected.
  • Entries similar to the following appear in syslog.log:
    yyyy-mm-ddThh:mm:ssZ lwsmd: [lsass] Failed to run provider specific request (request code = 14, provider = 'lsa-activedirectory-provider') -> error = 40121, symbol = LW_ERROR_DOMAIN_IS_OFFLINE, client pid = 2172342
    yyyy-mm-ddThh:mm:ssZ lwsmd: [lsass] Failed to run provider specific request (request code = 14, provider = 'lsa-activedirectory-provider') -> error = 40121, symbol = LW_ERROR_DOMAIN_IS_OFFLINE, client pid = 2172802
    yyyy-mm-ddThh:mm:ssZ lwsmd: [lsass] Failed to run provider specific request (request code = 14, provider = 'lsa-activedirectory-provider') -> error = 40121, symbol = LW_ERROR_DOMAIN_IS_OFFLINE, client pid = 2172882
    yyyy-mm-ddThh:mm:ssZ lwsmd: [lsass] Failed to run provider specific request (request code = 14, provider = 'lsa-activedirectory-provider') -> error = 40121, symbol = LW_ERROR_DOMAIN_IS_OFFLINE, client pid = 2175106

    yyyy-mm-ddThh:mm:ssZ lwsmd: [netlogon] CLDAP timed out: Domain01.local
    yyyy-mm-ddThh:mm:ssZ lwsmd: [netlogon] CLDAP timed out: Domain02.local
    yyyy-mm-ddThh:mm:ssZ lwsmd: [lsass] Could not transition domain 'Domain.local' to ONLINE state. Error 2453
    2021-05-19T10:43:40Z lwsmd: [lsass] Found domain 'Domain.local' to be offline while resolving its objects.
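
To see quickly whether a host is logging these symptoms, the log can be searched for the two most distinctive strings. The following is a minimal sketch, not a VMware-provided tool: the function name count_domain_offline is hypothetical, and the default path /var/log/syslog.log is an assumption about where the log lives on the host.

```shell
#!/bin/sh
# Count Active Directory "domain offline" symptoms in a syslog file.
# count_domain_offline is a hypothetical helper name used for illustration.
count_domain_offline() {
    # Assumed default log location; pass another path as the first argument.
    log_file="${1:-/var/log/syslog.log}"
    # grep -c prints the number of lines matching either pattern (0 if none).
    grep -c -e 'LW_ERROR_DOMAIN_IS_OFFLINE' -e 'CLDAP timed out' "$log_file"
}

# Example call (run on the affected host):
#   count_domain_offline /var/log/syslog.log
```

A count that keeps rising over time suggests the host still cannot reach the domain controller.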

Environment

  • VMware ESXi 6.7.x

Cause

This issue occurs when the ESXi host cannot reach the domain controller, which leads to exhaustion of hostd memory.

Resolution

  • As a prerequisite, ports 88, 139, 389, and 445 must be accessible.
  • VMware Engineering is working to improve the behavior of the Likewise agent in vSphere 6.7 in this situation. The issue is already mitigated in vSphere 7.x.
  • Ensure that the following ports (both UDP and TCP) are open for communication between the ESXi host and Active Directory:
    • Port 88 - Kerberos authentication
    • Port 123 - NTP
    • Port 135 - RPC
    • Port 137 - NetBIOS Name Service
    • Port 139 - NetBIOS Session Service (SMB)
    • Port 389 - LDAP
    • Port 445 - Microsoft-DS Active Directory, Windows shares (SMB over TCP)
    • Port 464 - Kerberos password change
    • Port 3268 - Global Catalog search

Workaround:

  • Connect to the ESXi host over SSH or through the ESXi Shell.
  • Restart the hostd service to bring it back online.
    • /etc/init.d/hostd stop
    • /etc/init.d/hostd start
  • Verify that the domain controller is reachable from the host. If it is not, check the physical network for a firewall that blocks the required ports. Unblocking them resolves the issue permanently.
    • time nc -zv <DC_IP> 88
    • time nc -zv <DC_IP> 389
    • time nc -zv <DC_IP> 445
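
The individual nc checks above can be wrapped in a loop so that all ports are tested in one pass. This is a minimal sketch under stated assumptions: check_dc_ports is a hypothetical helper name, it tests TCP reachability only, and its default port list leaves out 123 and 137 because those services normally run over UDP, which nc -z cannot probe reliably.

```shell
#!/bin/sh
# check_dc_ports is a hypothetical helper; it tests TCP reachability only.
# Usage: check_dc_ports <DC_IP> [port ...]
check_dc_ports() {
    dc="$1"
    shift
    # With no ports given, fall back to the TCP ports listed in this article.
    [ "$#" -gt 0 ] || set -- 88 135 139 389 445 464 3268
    failed=0
    for port in "$@"; do
        # nc -z only attempts the connection and sends no data.
        if nc -z "$dc" "$port" >/dev/null 2>&1; then
            echo "open    $dc:$port"
        else
            echo "BLOCKED $dc:$port"
            failed=1
        fi
    done
    # Non-zero return status means at least one port was unreachable.
    return "$failed"
}

# Example call (replace <DC_IP> with the domain controller address):
#   check_dc_ports <DC_IP>
```

A port behind a firewall that silently drops traffic can make nc wait a long time before it gives up, which is why the original commands are prefixed with time. Any port reported as BLOCKED is a candidate for the firewall review.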

Additional Information

Impact/Risks:

  • There is no impact on virtual machines that are powered on and running on the ESXi host.
  • The only impact is that the ESXi host cannot be managed through vCenter Server or through a direct Host Client login.