vSAN Daemon Liveness Alert with EPD Status Showing as Abnormal

Products

VMware vSAN

Issue/Introduction

A vSAN daemon liveness alert was triggered on an ESXi host, with the EPD (Entry Persistence Daemon) status showing as Abnormal.

NOTE: EPD (Entry Persistence Daemon) is a user space daemon that runs on every host that is part of the vSAN cluster. The main job of EPD is to make sure there is no component leakage when objects are deleted.

Changes:

The RAID controller and battery were replaced prior to the issue.

Observations:

Skyline Health indicates that the EPD service is not running.
Restarting the EPD service fails with the following messages:

[root@ESXI:~] /etc/init.d/epd restart

epd is not running
No such pool: epd
INIT: Failed to clear epd memory reservation.
INIT: EPD uses a ramdisk for the db file
INIT: No persistent storage found to backup the DB into.

Scratch Configuration:

vim-cmd hostsvc/advopt/view ScratchConfig.ConfiguredScratchLocation

Configured Scratch Location: /vmfs/volumes/...

vim-cmd hostsvc/advopt/view ScratchConfig.CurrentScratchLocation

Current Scratch Location: /tmp/_osdata...
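
The comparison between the two values above can be sketched as a small shell function. This is only an illustration: the function name scratch_state and its output strings are hypothetical, and it expects the two path strings to be passed in by hand rather than parsing the vim-cmd output, whose exact format is not reproduced here.

```shell
# Hypothetical helper: classify the scratch state from the two locations.
# $1 = configured scratch location, $2 = current scratch location.
scratch_state() {
    configured="$1"
    current="$2"
    case "$current" in
        /tmp/*)
            # Scratch currently lives under /tmp, not on a datastore.
            case "$configured" in
                /vmfs/volumes/*) echo "fallback: configured location not in use" ;;
                *)               echo "temporary: no datastore location configured" ;;
            esac
            ;;
        "$configured") echo "ok: current matches configured" ;;
        *)             echo "mismatch: current differs from configured" ;;
    esac
}
```

With a configured location under /vmfs/volumes/ and a current location under /tmp/, as in this case, the function prints the "fallback" line, which matches the symptom described in this article.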

The local datastore normally used for scratch and system storage is missing.

Disk Access Check:

Attempts to read or access the local disk with the following commands return “No such file or directory,” indicating that the system does not detect the local storage device.

hexdump -C /vmfs/devices/disks/naa.#####

partedUtil getptbl /vmfs/devices/disks/naa.#####
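
Before running the commands above, a quick existence test can show whether the device node is present at all. The following is a minimal sketch; the function name report_missing_devices is hypothetical, and the device paths must be supplied by the operator.

```shell
# Hypothetical helper: print every given device path that does not exist.
report_missing_devices() {
    for dev in "$@"; do
        # -e is true if the path exists, whatever its type.
        [ -e "$dev" ] || echo "missing: $dev"
    done
}

# Example call (placeholder device ID as used in this article):
#   report_missing_devices "/vmfs/devices/disks/naa.#####"
```

A path reported as missing is consistent with the “No such file or directory” result from hexdump and partedUtil.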

Rebooting the host did not resolve the issue.

Environment

VMware vSAN 8.x

VMware vSAN 7.x

Cause

The ESXi host has a scratch partition configured, but the local device used for persistent storage is not accessible.

After replacing the RAID controller, the controller’s NVRAM configuration (which stores RAID metadata) was lost. Although the physical drives retain the old metadata, the new controller does not automatically import or recognize it.

As a result, the logical drive that contains the ESXi OS and local datastore is detected as Unconfigured, making the datastore unavailable and preventing the EPD service from initializing properly.

Validation

Log entries from /var/log/epd.log confirm that the EPD service could not locate persistent storage for its database files:

INIT: /scratch is not yet mounted (attempt #1) .
INIT: EPD uses a ramdisk for the db file
INIT: No persistent storage found to backup the DB into.
INIT: Failed to clear epd memory reservation.
INIT: EPD uses a ramdisk for the db file
INIT: No persistent storage found to backup the DB into.
INIT: Failed to clear epd memory reservation.
INIT: EPD uses a ramdisk for the db file
INIT: No persistent storage found to backup the DB into.
INIT: Failed to clear epd memory reservation.
INIT: EPD uses a ramdisk for the db file
INIT: No persistent storage found to backup the DB into.
INIT: Failed to clear epd memory reservation.
INIT: EPD uses a ramdisk for the db file
INIT: No persistent storage found to backup the DB into.
INIT: EPD uses a ramdisk for the db file
INIT: No persistent storage found to backup the DB into.
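
To check a log for this pattern without reading it line by line, the occurrences of the key message can be counted. This is a sketch under assumptions: the function name count_no_persistent_storage is hypothetical, and the message text is copied from the excerpt above, so it may differ in other builds.

```shell
# Hypothetical helper: count log lines reporting that EPD found no
# persistent storage. $1 = path to the log file to inspect.
count_no_persistent_storage() {
    # -c prints the number of matching lines; -F matches a fixed string.
    grep -c -F "No persistent storage found to backup the DB into." "$1"
}

# Example call against the log named in this article:
#   count_no_persistent_storage /var/log/epd.log
```

A count that keeps growing across restart attempts suggests that the scratch location is still not backed by persistent storage.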

Additionally, HPE iLO reports the logical drive configuration as Unconfigured, indicating the local datastore device is not detected by the controller.

Resolution

Engage the hardware vendor to verify and restore the RAID configuration.

Once the RAID configuration is successfully restored, the local datastore becomes accessible again, allowing the EPD service to start normally.

Additional Information

The missing logical drive prevents ESXi from mounting the scratch partition, which the EPD service depends on for storing its database.

Restoring the RAID configuration resolves the persistent storage issue and restores normal vSAN service operation.