"vSAN daemon liveness" alert in Skyline Health

Article ID: 390966

Products

VMware vSAN

Issue/Introduction

Symptoms:

  • Skyline Health reports the error "vSAN daemon liveness".

  • When you click the Troubleshoot button, the EPD status is reported as 'Abnormal'.

  • On the host command line, the epd service shows as not running. This can be verified with the command below.

    [root@localhost:~] /etc/init.d/cmmdsd status && /etc/init.d/epd status && /etc/init.d/clomd status && /etc/init.d/cmmdsTimeMachine status && /etc/init.d/osfsd status
    cmmdsd is running
    epd is not running
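
Because the command above chains the status calls with `&&`, it stops at the first daemon that is not running, so the state of clomd, cmmdsTimeMachine and osfsd is never printed. The following is a minimal sketch that reports every daemon independently. The function name `check_vsan_daemons` and its optional directory argument are illustrative and not part of ESXi, and the sketch assumes each init script returns a non-zero exit code when its daemon is not running, as the truncated output above suggests.

```shell
# Hypothetical helper: query each vSAN daemon's init script on its own so that
# one stopped daemon does not hide the status of the others.
# $1 (optional) is the init script directory; it defaults to /etc/init.d.
check_vsan_daemons() {
    init_dir="${1:-/etc/init.d}"
    rc=0
    for svc in cmmdsd epd clomd cmmdsTimeMachine osfsd; do
        if [ -x "$init_dir/$svc" ]; then
            # Keep going on failure, but remember that something was not running.
            "$init_dir/$svc" status || rc=1
        else
            echo "$svc: init script not found in $init_dir"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_vsan_daemons` with no argument on the host prints one status line per daemon and returns non-zero if any of them is not running.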

Environment

VMware vSAN 7.x

Cause

  • When you start the EPD service manually on the host, it may fail and report that no persistent storage was found.

     [root@esxi:~] /etc/init.d/epd status
     epd is not running
     [root@esxi:~] /etc/init.d/epd restart
     epd is not running
     INIT: EPD uses a ramdisk for the db file
     INIT: Using /locker as persistent storage
     INIT: Using existing EPD ramdisk at /epd.
     INIT: EPD using Security domain:  ID:

     (or)

[root@localhost:~] /etc/init.d/cmmdsd restart && /etc/init.d/epd restart && /etc/init.d/clomd restart && /etc/init.d/cmmdsTimeMachine restart && /etc/init.d/osfsd restart
watchdog-cmmdsd[2182118]: Terminating watchdog process with PID 2098436
Waiting for process to terminate...
cmmdsd stopped
cmmdsd started
epd is not running
No such pool: epd
INIT: Failed to clear epd memory reservation.
INIT: EPD uses a ramdisk for the db file
INIT: No persistent storage found to back up the DB into.

         NOTE: EPD (Entry Persistence Daemon) is a user space daemon that runs on every host that is part of the vSAN cluster. The main job of EPD is to make sure there is no component leakage when objects are deleted.

This indicates that the ESXi host does not have persistent storage configured, which is a requirement.

The EPD log, /var/log/epd.log, also shows that the persistent storage is missing:

2025-03-12T18:36:06.943Z In(30) epd[2098765]: INIT: EPD uses a ramdisk for the db file
2025-03-12T18:36:06.952Z In(30) epd[2098768]: INIT: No persistent storage found to backup the DB into.
2025-03-13T09:03:37.933Z In(30) epd[2182159]: INIT: Failed to clear epd memory reservation.
2025-03-13T09:03:37.946Z In(30) epd[2182163]: INIT: EPD uses a ramdisk for the db file
2025-03-13T09:03:37.955Z In(30) epd[2182166]: INIT: No persistent storage found to backup the DB into.
2025-03-13T09:09:38.560Z In(30) epd[2182768]: INIT: Failed to clear epd memory reservation.
2025-03-13T09:09:38.570Z In(30) epd[2182772]: INIT: EPD uses a ramdisk for the db file
2025-03-13T09:09:38.576Z In(30) epd[2182775]: INIT: No persistent storage found to backup the DB into.

  • The following errors are observed in epd.log:
     
       2025-07-21T09:44:00.267Z No(13) epd[3024588]: EntryDBPrintDBFileInfo: Using db file '/scratch/epd-storeV2.db'
       2025-07-21T09:44:00.268Z No(13) epd[3024588]: EntryDB_Open: Failed to open db '/scratch/epd-storeV2.db' : unable to open database file (14)
       2025-07-21T09:44:00.268Z No(13) epd[3024588]: EPDStoreOpen: Failed to open db (/scratch/epd-storeV2.db): Failure
       2025-07-21T09:44:00.268Z No(13) epd[3024588]: EPDModuleInit: init for store-mgmt failed: Failure
       2025-07-21T09:44:00.268Z No(13) epd[3024588]: main: initialization failed: Failure
       2025-07-21T09:44:00.268Z No(13) epd[3024588]: main: exiting..
       2025-07-21T09:44:00.268Z No(13) epd[3024588]: SRV: module exit: misc-init
       2025-07-21T09:44:00.268Z No(13) epd[3024588]: SRV: module exit: socket-init
       2025-07-21T09:44:00.268Z No(13) epd[3024588]: EPDSockExit: Closing socket
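
Both log excerpts above can be located with a single search. The sketch below is illustrative only: the function name `find_epd_storage_errors` is hypothetical, the two patterns are copied from the log lines quoted in this article, and the default path is the /var/log/epd.log location given above.

```shell
# Hypothetical helper: print the epd.log lines that match the two failure
# signatures quoted in this article.
# $1 (optional) is the log file; it defaults to /var/log/epd.log.
# Returns non-zero (grep's exit code) when neither signature is present.
find_epd_storage_errors() {
    log_file="${1:-/var/log/epd.log}"
    grep -E 'No persistent storage found|Failed to open db' "$log_file"
}
```

If the function prints nothing and returns non-zero, neither signature is in the log and the liveness alert may have a different cause.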

Resolution

To resolve the issue, create a persistent scratch partition on the ESXi host, place the host in maintenance mode with the ensureAccessibility option, and reboot it.

To configure a persistent scratch location for the host, follow: Creating a persistent scratch location for ESXi 8.x/7.x/6.x

After fixing the persistent scratch partition (refer to the KB above) and rebooting the host, retest Skyline Health from vCenter.
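
The host-side part of this procedure can be sketched from the ESXi shell as follows. This is an outline under stated assumptions, not a verified procedure. The `run` wrapper, the `DRY_RUN` variable and both function names are illustrative. The esxcli value `ensureObjectAccessibility` is assumed to be the command-line equivalent of the ensureAccessibility option mentioned above. Leaving maintenance mode at the end is an added step that this article does not describe. The scratch location itself must still be configured by following the linked KB.

```shell
# Illustrative wrapper (not an ESXi tool): print each command and execute it
# only when DRY_RUN=0. Dry run is the default so nothing changes by accident.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Before the reboot: show the configured scratch location, then enter
# maintenance mode while keeping vSAN objects accessible.
prepare_host_for_reboot() {
    run vim-cmd hostsvc/advopt/view ScratchConfig.ConfiguredScratchLocation &&
    run esxcli system maintenanceMode set --enable true --vsanmode ensureObjectAccessibility
}

# After the reboot: confirm the scratch location in use and the EPD state.
# Maintenance mode is left only if the EPD status check succeeds.
verify_host_after_reboot() {
    run vim-cmd hostsvc/advopt/view ScratchConfig.CurrentScratchLocation &&
    run /etc/init.d/epd status &&
    run esxcli system maintenanceMode set --enable false
}
```

With the default dry run, each function only prints the commands it would execute, which allows them to be reviewed first. Setting `DRY_RUN=0` runs them for real.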

Additional Information