ESXi hosts intermittently disjoin from the Active Directory domain and show as not responding in vCenter Server

Article ID: 318664

Updated On:

Products

  • VMware vSphere ESXi 6.0
  • VMware vSphere ESXi 7.0
  • VMware vSphere ESXi 8.0
  • VMware vSphere ESXi

Issue/Introduction

  • ESXi hosts randomly disjoin from the Active Directory domain or sometimes disconnect from vCenter Server.
  • The ESXi host may show as not responding in vCenter.
  • Navigating to the host UI may be slow.
  • Restarting the management agents on the ESXi host temporarily fixes the issue.
  • vCenter or Aria Operations raises an alert when the ESXi host creates an lwsmd-zdump file.
  • ESXi - /var/log/vmkernel.log

yyyy-mm-dd  cpu5:2924874)MemSched: 14642: uw.2924860 (14074759) extraMin/extraFromParent: 256/256, likewise (790) childEmin/eMinLimit: 99871/100096
yyyy-mm-dd cpu5:2924874)MemSched: 14635: Admission failure in path: likewise/lwsmd.2924860/uw.2924860
yyyy-mm-dd cpu5:2924874)MemSched: 14642: uw.2924860 (14074759) extraMin/extraFromParent: 256/256, likewise (790) childEmin/eMinLimit: 99871/100096
yyyy-mm-dd cpu5:2924874)MemSched: 14635: Admission failure in path: likewise/lwsmd.2924860/uw.2924860

  • ESXi - /var/log/vmkernel.log and /var/log/vmkwarning.log

yyyy-mm-dd cpu11:2100089)ALERT: hostd detected to be non-responsive

  • If the Likewise service runs out of memory, the following messages appear in ESXi - /var/log/vmkernel.log:

YYYY-MM-DDTHH:MM:SS.938Z Wa(180) vmkwarning: cpu52:2135899)WARNING: MemSchedAdmit: 1263: Group likewise: Requested memory limit 0 KB insufficient to support effective reservation 25192 KB
YYYY-MM-DDTHH:MM:SS.395Z In(182) vmkernel: cpu6:2135943)User: 3259: lwsmd: wantCoreDump:lwsmd signal:11 exitCode:0 coredump:enabled
YYYY-MM-DDTHH:MM:SS.526Z In(182) vmkernel: cpu6:2135943)UserDump: 3157: lwsmd: Dumping cartel 2135924 (from world 2135943) to file /var/core/lwsmd-zdump.000 ...

  • ESXi - /var/log/vobd.log file

YYYY-MM-DDTHH:MM:SS.810Z In(14) vobd[2097766]:  [UserWorldCorrelator] 10473605981us: [esx.problem.application.core.dumped] An application (/usr/lib/vmware/likewise/sbin/lwsmd) running on ESXi host has crashed (1 time(s) so far). A core file may have been created at /var/core/lwsmd-zdump.000

  • ESXi - /var/log/syslog.log

YYYY-MM-DDTHH:MM:SS.216Z Er(27) lwsmd[2100830]: [lwsm] Could not start bootstrap service: LW_ERROR_SERVICE_UNRESPONSIVE
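
The log signatures listed above can be checked in one pass. The following is a minimal sketch, assuming a POSIX shell with grep (as found in the ESXi shell). The function name scan_likewise_symptoms and its optional log-directory argument are illustrative only and are not part of any VMware tooling.

```shell
#!/bin/sh
# Hypothetical helper (not a VMware tool): counts occurrences of the log
# signatures quoted in this article. The log directory defaults to /var/log;
# pass another directory to scan logs extracted from a support bundle.
scan_likewise_symptoms() {
    logdir="${1:-/var/log}"
    # Each entry below is "<log file>|<fixed string to search for>".
    while IFS='|' read -r file pattern; do
        # Skip logs that are not present in the directory.
        [ -f "$logdir/$file" ] || continue
        # grep -c prints 0 but exits non-zero when nothing matches.
        count=$(grep -c -F -- "$pattern" "$logdir/$file" || true)
        printf '%s: %s match(es) for "%s"\n' "$file" "$count" "$pattern"
    done <<'EOF'
vmkernel.log|Admission failure in path: likewise/lwsmd
vmkernel.log|hostd detected to be non-responsive
vmkernel.log|lwsmd-zdump
vobd.log|likewise/sbin/lwsmd) running on ESXi host has crashed
syslog.log|LW_ERROR_SERVICE_UNRESPONSIVE
EOF
}
```

Calling scan_likewise_symptoms with no argument scans /var/log on the host. A non-zero count for several signatures is consistent with the symptoms described in this article, but it is not a diagnosis on its own.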

Environment

ESXi 7.x, 8.x

Cause

The issue is caused by exhaustion of the memory available to the Likewise service (lwsmd), which results from memory leaks in Active Directory operations and related libraries. Additional Likewise memory leaks are observed when smart card authentication is enabled and configured on the ESXi hosts.
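
The MemSched lines quoted in the Issue section include a pair of values for the likewise group (childEmin/eMinLimit: 99871/100096). As a rough illustration of how close the group is to its limit, the sketch below extracts that pair from log lines and prints the first value as a percentage of the second. The function name likewise_headroom_pct is hypothetical, and reading the pair as "current value versus limit" is an assumption based on the field names, not a documented definition.

```shell
#!/bin/sh
# Hypothetical helper: reads vmkernel.log lines on stdin and, for each
# MemSched line that reports the likewise group, prints childEmin as an
# integer percentage of eMinLimit (rounded down).
likewise_headroom_pct() {
    # Keep only the two numbers that follow "childEmin/eMinLimit:".
    sed -n 's|.*likewise ([0-9]*) childEmin/eMinLimit: \([0-9][0-9]*\)/\([0-9][0-9]*\).*|\1 \2|p' |
    while read -r used limit; do
        # Guard against division by zero.
        [ "$limit" -gt 0 ] || continue
        echo "$(( used * 100 / limit ))"
    done
}
```

For example, `grep MemSched /var/log/vmkernel.log | likewise_headroom_pct | tail -n 1` prints the most recent value. For the sample line in this article (99871/100096) the result is 99, which matches the admission failures logged alongside it.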

Resolution

This issue is resolved in VMware ESXi 8.0 Update 3e.

The fix adds a system service that monitors Likewise memory usage and restarts relevant services in case memory consumption approaches Likewise limits.

Reference: VMware ESXi 8.0 Update 3e Release Notes
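
To confirm whether a host already runs the fixed release, its build number can be compared with the build listed in the ESXi 8.0 Update 3e release notes. The sketch below assumes the `vmware -v` output format `VMware ESXi <version> build-<number>`. The function names are hypothetical, and the required build number is deliberately left as a parameter to be taken from the release notes.

```shell
#!/bin/sh
# Hypothetical helpers, not VMware tooling.

# Prints the build number found in a version string such as the one
# printed by `vmware -v`; prints nothing if no build number is present.
extract_build() {
    printf '%s\n' "$1" | sed -n 's/.*build-\([0-9][0-9]*\).*/\1/p'
}

# Succeeds (exit 0) when the version string is from the ESXi 8.0 line and
# its build number is at or above the required build ($2). Build numbers
# are only compared within 8.0, because the fix is documented for 8.0 U3e.
has_fix() {
    case "$1" in
        *"ESXi 8.0"*) ;;
        *) return 1 ;;
    esac
    host_build=$(extract_build "$1")
    [ -n "$host_build" ] && [ "$host_build" -ge "$2" ]
}
```

On a host, this could be used as `has_fix "$(vmware -v)" <build from release notes> && echo "fix present"`, where the placeholder is replaced with the 8.0 Update 3e build number.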

Workaround

Option 1: Non-persistent workaround

The script below monitors Likewise memory usage and writes Likewise memory statistics to a lwis-oom-stats file in the /tmp directory. When it detects that Likewise is running out of memory, it automatically remediates the issue.

Note: This workaround is only temporary and will not persist if the ESXi host is rebooted.

  1. Download the lwis-mem-check-2.zip file attached to this Knowledge Base article.
  2. Unzip lwis-mem-check-2.zip and copy the lwis-mem-check-2.sh script to the /tmp directory on the ESXi host.
  3. Verify that the script is executable. If it is not, run:

    chmod +x /tmp/lwis-mem-check-2.sh

  4. Run the script in the background, detached from the current session:

    setsid /tmp/lwis-mem-check-2.sh >/dev/null 2>&1 < /dev/null &
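
Because the script is started in the background with its output discarded, one simple way to see whether it is doing anything is to look for the lwis-oom-stats file it is described as generating in /tmp. The sketch below is illustrative only: the function name is hypothetical, the exact file name (including any suffix) is an assumption covered by a wildcard, and the file may not appear until the script has completed at least one check.

```shell
#!/bin/sh
# Hypothetical helper: reports whether a lwis-oom-stats file exists in the
# given directory (default /tmp). Returns 0 if found, 1 otherwise.
likewise_stats_present() {
    dir="${1:-/tmp}"
    # The wildcard allows for a possible suffix or extension on the file name.
    for f in "$dir"/lwis-oom-stats*; do
        [ -e "$f" ] && { echo "found: $f"; return 0; }
    done
    echo "no lwis-oom-stats file in $dir"
    return 1
}
```

Running likewise_stats_present a short while after step 4 gives a quick indication that the monitor has started writing statistics.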


Option 2: Persistent workaround

Important: Back up the ESXi host configuration before proceeding. Refer to the KB article "How to back up and restore the ESXi host configuration".
Note: This option is not available if Secure Boot is enabled.

If you are unfamiliar with the steps below, contact VMware support for assistance.

Note: A shared datastore path is required to hold the lwis-mem-check-2.sh script so that it can be retrieved during startup.

  1. Connect to the ESXi host over SSH.
  2. Back up the file:

    cp /etc/rc.local.d/local.sh /local.sh

  3. Edit the file:

    vi /etc/rc.local.d/local.sh

  4. Add the following lines before “exit 0”:

    cp /vmfs/volumes/<Datastore>/<script>/lwis-mem-check-2.sh /tmp/
    chmod +x /tmp/lwis-mem-check-2.sh
    setsid /tmp/lwis-mem-check-2.sh >/dev/null 2>&1 < /dev/null &

    Note: <Datastore>/<script> should be replaced with the correct datastore directory path where the script is located.

  5. Save the changes.
  6. To ensure the changes to local.sh persist across reboots, run:

    auto-backup.sh
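
Steps 3 to 5 can also be scripted, which avoids adding the lines twice when the procedure is repeated. The following is a minimal sketch, assuming awk and mktemp are available in the shell. The function name add_lwis_hook is hypothetical, and the second argument is a placeholder for the real datastore path of the script.

```shell
#!/bin/sh
# Hypothetical helper: inserts the three workaround lines before the first
# "exit 0" line of the given file ($1). $2 is the full datastore path of
# lwis-mem-check-2.sh. Does nothing if the file already references the
# script. If the file has no "exit 0" line, it is left unchanged.
add_lwis_hook() {
    target="$1"
    script_src="$2"
    # Idempotence check: skip if the hook is already present.
    grep -q -F 'lwis-mem-check-2.sh' "$target" && return 0
    tmp=$(mktemp)
    awk -v src="$script_src" '
        /^exit 0/ && !inserted {
            print "cp " src " /tmp/"
            print "chmod +x /tmp/lwis-mem-check-2.sh"
            print "setsid /tmp/lwis-mem-check-2.sh >/dev/null 2>&1 < /dev/null &"
            inserted = 1
        }
        { print }
    ' "$target" > "$tmp" &&
    # Write back with cat so the original file keeps its permissions.
    cat "$tmp" > "$target"
    rm -f "$tmp"
}
```

A call would look like `add_lwis_hook /etc/rc.local.d/local.sh /vmfs/volumes/<Datastore>/<script>/lwis-mem-check-2.sh`, with the placeholders replaced as in step 4. Running auto-backup.sh afterwards is still required, as in step 6.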

Notes:

  • This is a temporary workaround, not a permanent fix. Revert the above steps once the issue has been fully resolved, for example after upgrading to a release that contains the fix.
  • If smart card authentication is enabled, disabling it helps stop the Likewise memory leaks that smart card authentication causes.

Option 3

If the above options are not viable, remove the host from Active Directory and stop the Likewise service on the ESXi host:

 /etc/init.d/lwsmd stop

If the lwsmd service is enabled to start automatically, update the service policy to "Start and stop manually."

 

Fix / Permanent Workaround

The fix is available in ESXi 8.0 Update 3e.

See: VMware ESXi 8.0 Update 3e Release Notes

Additional Information

This issue is being checked by Diagnostics for VMware Cloud Foundation.

The check is as follows:

  • Product: VMware ESXi
  • Log File: vmkernel.log
  • Log Expression Check: "MemSched" AND "Admission failure in path: likewise/lwsmd"

Attachments

lwis-mem-check-2.sh