ESXi 6.0 / 6.5 host disconnects from vCenter Server due to a localcli process locking /etc/vmware/lunTimestamps.log

Article ID: 320017

Products

VMware vCenter Server VMware vSphere ESXi

Issue/Introduction

This article provides steps to work around a known issue in which ESXi 6.0 and 6.5 hosts disconnect from vCenter Server.

Symptoms:
  • ESXi hosts appear as Not Responding in vCenter Server.
  • The issue persists after completing the workaround in "ESXi 6.0 host is disconnected from vCenter Server" (2145106).
  • In the /var/run/log/hostd.log file, you see entries similar to:

    YYYY-MM-DD T10:09:00.493Z error hostd[39A80B70] [Originator@6876 sub=Hostsvc] Failed to fetch LUN data from VmkCtl: Error interacting with configuration file /etc/vmware/lunTimestamps.log: Timout while waiting for lock, /etc/vmware/lunTimestamps.log.LOCK, to be released. Another process has kept this file locked for more than 30 seconds. The process currently holding the lock is localcli(1186613). This is likely a temporary condition. Please try your operation again.
    YYYY-MM-DD T10:10:19.226Z info hostd[382C1B70] [Originator@6876 sub=ThreadPool] Thread enlisted
    YYYY-MM-DD T10:10:19.226Z info hostd[39E40B70] [Originator@6876 sub=ThreadPool] Thread enlisted
    YYYY-MM-DD T10:10:31.792Z error hostd[398C1B70] [Originator@6876 sub=Hostsvc] Failed to fetch LUN data from VmkCtl: Error interacting with configuration file /etc/vmware/lunTimestamps.log: Timout while waiting for lock, /etc/vmware/lunTimestamps.log.LOCK, to be released. Another process has kept this file locked for more than 30 seconds. The process currently holding the lock is localcli(1186613). This is likely a temporary condition. Please try your operation again.
     

    Note: This log excerpt is an example. Date, time, and environmental variables may vary depending on your environment.
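
    To confirm that localcli is the process holding the lock, the hostd log can be filtered for the lock message. The following is a minimal sketch, not a supported tool: the function name lun_lock_holders is hypothetical, and it assumes the message format matches the excerpt above.

    ```shell
    # Print each distinct process (name and PID) that hostd reported as holding
    # the lunTimestamps.log lock, preceded by the number of times it was reported.
    # $1: path to a hostd log file (defaults to /var/run/log/hostd.log).
    lun_lock_holders() {
        log_file="${1:-/var/run/log/hostd.log}"
        grep 'lunTimestamps.log.LOCK' "$log_file" \
            | sed -n 's/.*holding the lock is \([A-Za-z0-9_-]*\)(\([0-9]*\)).*/\1 \2/p' \
            | sort | uniq -c
    }
    ```

    Running lun_lock_holders with no argument reads /var/run/log/hostd.log. Each output line is a count followed by the process name and PID taken from the log message.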

     



Cause

This issue occurs when a localcli command issued from a cron job holds a lock on the /etc/vmware/lunTimestamps.log file.
 
When a LUN is unmapped from the storage array, the ESXi host retains its entry in the esx.conf file. The localcli storage core device purge command is run from a cron job to remove stale LUN entries from the esx.conf file.
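
To check whether the purge cron job is active on a host, the root crontab can be searched for an uncommented entry. This is a minimal sketch: the function name purge_job_enabled is hypothetical, and the crontab path and command text are the ones given in this article.

```shell
# Succeed (exit status 0) if the crontab contains an uncommented entry that
# runs the purge command; fail otherwise.
# $1: crontab path (defaults to /var/spool/cron/crontabs/root).
purge_job_enabled() {
    crontab_file="${1:-/var/spool/cron/crontabs/root}"
    grep -q '^[^#]*localcli storage core device purge' "$crontab_file"
}
```

For example, `purge_job_enabled && echo "purge cron job is active"` prints the message only while the entry is not commented out.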

Resolution

This is a known issue affecting ESXi 6.0 and 6.5. It is fixed in ESXi 6.0 Patch 5 and ESXi 6.5 Patch 1, and in later versions.

 

Caution: Disabling the cron job has no impact on the environment if LUNs are not created and unmapped frequently, because the cron job only removes stale (no longer used) LUN entries associated with the ESXi host. A large number of stale entries can cause delayed boot times.

Notes:

  • In some cases, if an ESXi host is already in an unresponsive state, the host must be rebooted before the workaround can be applied.
  • The workaround does not persist through ESXi host reboots.

To work around this issue, disable the cron job that runs the localcli storage core device purge command:

 

  1. Log in to the ESXi host through SSH using root credentials.

    Note: There will be no impact to the environment if you do not periodically create and unmap large numbers of LUNs.
     
  2. Navigate to the /var/spool/cron/crontabs/ folder.
  3. Back up the root file with this command:

    cp root ./root.bak
     
  4. Open the root file using a text editor:

    vi root
     
  5. Comment out the line:

    00 1 * * * localcli storage core device purge


    The line should look similar to:

    #00 1 * * * localcli storage core device purge
     
  6. Save and close the file.
  7. Restart the crond service by running the following commands:

    cat /var/run/crond.pid

    The value returned is the PID of the running crond process. Use it in the next command:

    kill <PID>

    /usr/lib/vmware/busybox/bin/busybox crond

 

The changes take effect immediately and the localcli command will be skipped at the next scheduled time.
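
The steps above can also be sketched as two shell functions, for example when the workaround has to be reapplied after a reboot. This is an illustration only and has not been validated on a live host: the function names disable_purge_job and restart_crond are hypothetical, and the sketch assumes the root user can overwrite the crontab file in place.

```shell
# Comment out the purge entry in a crontab file, keeping a backup next to it.
# Does nothing (and leaves any existing backup alone) if no active entry exists.
# $1: crontab path (defaults to /var/spool/cron/crontabs/root).
disable_purge_job() {
    crontab_file="${1:-/var/spool/cron/crontabs/root}"
    grep -q '^[^#]*localcli storage core device purge' "$crontab_file" || return 0
    cp "$crontab_file" "$crontab_file.bak" || return 1
    # Rewrite the file from the backup, prefixing the active purge line with '#'.
    sed '/^[^#]*localcli storage core device purge/s/^/#/' \
        "$crontab_file.bak" > "$crontab_file"
}

# Restart crond as described in step 7: stop the running daemon by PID, then
# start a new one through busybox. Only meaningful on an ESXi host.
restart_crond() {
    kill "$(cat /var/run/crond.pid)"
    /usr/lib/vmware/busybox/bin/busybox crond
}
```

On the host, the two would be run in sequence: `disable_purge_job && restart_crond`. Running disable_purge_job a second time leaves both the crontab and the backup unchanged, because no uncommented purge entry remains.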

 

Additional Information