Skyline Health alert "vSAN daemon liveness" reports status of EPD service as 'Abnormal' despite /scratch location being available to ESXi host

Article ID: 398843

Products

VMware vSAN, VMware vSAN 7.x, VMware vSAN 8.x

Issue/Introduction

This issue can occur when an epd-storeV2.db.lock directory is present in the /scratch location even though the EPD daemon is not running.
When it occurs, /var/run/log/epd.log shows messages similar to the following:

2025-05-25T17:56:16.130Z 2129851 warning epd EntryDBCheckLockFile: -- XXX: Lock file '/scratch/epd-storeV2.db.lock' exists.
2025-05-25T17:56:16.130Z 2129851 warning epd EntryDBCheckLockFile: -- XXX: Did the host or EPD recently crash?
2025-05-25T17:56:16.130Z 2129851 warning epd EntryDBCheckLockFile: -- XXX: Assuming it's OK. Unlinking lock file. 
2025-05-25T17:56:16.130Z 2129851 warning epd EntryDBCheckLockFile: -- XXX: Lock file is a directory.
2025-05-25T17:56:16.530Z 2129851 warning epd EntryDBCheckLockFile: Failed to delete lock file: Input/output error
2025-05-25T17:56:16.530Z 2129851 warning epd EPDStoreOpen: Failed to open db (/scratch/epd-storeV2.db): Failure
2025-05-25T17:56:16.530Z 2129851 warning epd EPDModuleInit: init for store-mgmt failed: Failure
2025-05-25T17:56:16.530Z 2129851 error epd main: initialization failed: Failure
2025-05-25T17:56:16.530Z 2129851 warning epd main: exiting..
2025-05-25T17:56:16.530Z 2129851 SRV: module exit: misc-init
2025-05-25T17:56:16.530Z 2129851 SRV: module exit: socket-init
2025-05-25T17:56:16.530Z 2129851 info epd EPDSockExit: Closing socket.
2025-05-25T17:56:16.530Z 2129851 SRV: module exit: log
2025-05-25T17:56:16.530Z 2129851 info epd EPDLogExit: exiting.
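To tell this condition apart from other EPD startup failures, it may help to search the log for the two messages that are specific to it. The following is a minimal sketch, not an official tool: the function name epd_log_has_stale_lock is hypothetical, and the default log path is the one named in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): succeeds only if the given EPD log
# contains both messages that characterise the leftover lock directory.
# The default path is the log location named in this article.
epd_log_has_stale_lock() {
    log_file="${1:-/var/run/log/epd.log}"
    # -F treats the patterns as fixed strings; -q suppresses output.
    grep -qF -e "EntryDBCheckLockFile: -- XXX: Lock file is a directory" "$log_file" &&
        grep -qF -e "EntryDBCheckLockFile: Failed to delete lock file" "$log_file"
}
```

Example use on the host: epd_log_has_stale_lock && echo "leftover lock signature found in epd.log"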

Environment

vSAN ESA / vSAN OSA

Cause

This issue is caused by a leftover epd-storeV2.db.lock directory, usually the result of the /scratch location being transiently unavailable and/or an unclean termination of ESXi services (for example, a datacenter power failure or an ESXi host power-off initiated via iLO/iDRAC).
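The log excerpt suggests that EPD tries to unlink the lock on startup and fails when the path turns out to be a directory. A quick way to confirm which case applies is to classify the lock path, as in the sketch below. The function name epd_lock_state and its optional argument are hypothetical, introduced only for illustration; the default location is /scratch as used in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): prints "directory", "file" or
# "absent" for the EPD lock path under the given scratch location.
epd_lock_state() {
    lock_path="${1:-/scratch}/epd-storeV2.db.lock"
    if [ -d "$lock_path" ]; then
        echo "directory"
    elif [ -e "$lock_path" ]; then
        echo "file"
    else
        echo "absent"
    fi
}
```

A result of "directory" while EPD is stopped matches the condition described in this article.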

Resolution

The issue can be resolved by manually removing the leftover lock directory. Log in to the ESXi host as root over SSH, then:

1. Navigate to /scratch location:

cd /scratch

2. Confirm that the location is not full (out of space):

df -h . 

Example output:

[root@hostname:/vmfs/volumes/654cdac0-########-####-############] df -h .
Filesystem   Size   Used Available Use% Mounted on
VFFS       119.8G  13.3G    106.4G  11% /vmfs/volumes/OSDATA-########-####-############
[root@hostname:/vmfs/volumes/654cdac0-########-####-############] 

3. Verify that the EPD service is stopped:

/etc/init.d/epd status

Output should display:

epd is not running

4. Create a temporary folder:

mkdir /scratch/backup_epd

5. Move epd-storeV2.db.lock to the newly created folder:

mv /scratch/epd-storeV2.db.lock /scratch/backup_epd

6. Start the service with:

/etc/init.d/epd start

7. Validate that the service has started and is running:

/etc/init.d/epd status

8. After confirming that everything is running as expected, you may delete the backup directory created earlier:

rm -rf /scratch/backup_epd
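Steps 3 through 7 can also be combined into a small guarded function, sketched below under stated assumptions. SCRATCH_DIR, EPD_INIT and move_stale_epd_lock are names introduced here for illustration; their defaults are the paths used in this article. The sketch relies only on the "epd is not running" status text shown in step 3, does not check free space (step 2), and leaves the backup directory in place so that step 8 remains a deliberate manual action.

```shell
#!/bin/sh
# Sketch only. SCRATCH_DIR and EPD_INIT are variables introduced here so the
# procedure can be rehearsed against a test directory; the defaults are the
# paths used in this article.
SCRATCH_DIR="${SCRATCH_DIR:-/scratch}"
EPD_INIT="${EPD_INIT:-/etc/init.d/epd}"

move_stale_epd_lock() {
    lock="$SCRATCH_DIR/epd-storeV2.db.lock"
    backup="$SCRATCH_DIR/backup_epd"

    # Step 3: do nothing unless EPD reports that it is stopped.
    if ! "$EPD_INIT" status 2>&1 | grep -q "epd is not running"; then
        echo "EPD is not reported as stopped; leaving the lock in place" >&2
        return 1
    fi

    # Nothing to move if the lock path does not exist.
    if [ ! -e "$lock" ]; then
        echo "No lock found at $lock" >&2
        return 1
    fi

    # Steps 4 and 5: create the backup folder and move the lock into it.
    mkdir -p "$backup" && mv "$lock" "$backup/" || return 1

    # Steps 6 and 7: start the service, then print its status for review.
    "$EPD_INIT" start
    "$EPD_INIT" status
}
```

After defining the function in an SSH session on the host, run move_stale_epd_lock and review the printed status before deleting /scratch/backup_epd.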