vSAN Health Service - Cluster Health – vSAN daemon liveness check

Article ID: 318410

Products

VMware vSAN

Issue/Introduction

This article explains the Cluster Health – vSAN daemon liveness check (formerly the CLOMD liveness check) in the vSAN Health Service and provides details on why it might report an error.

CLOMD (Cluster Level Object Manager Daemon) plays a key role in the operation of a vSAN cluster. It runs on every ESXi host and is responsible for creating new objects, initiating repair of existing objects after failures, performing all types of data moves and evacuations (for example, Enter Maintenance Mode and evacuating data when a disk is removed from vSAN), maintaining balance and thus triggering rebalancing, and implementing policy changes.

It does not actually participate in the data path, but it triggers data path operations and as such is a critical component during a number of management workflows and failure handling scenarios.

Two less obvious operations that require CLOMD are virtual machine power-on and Storage vMotion to vSAN. Both require the creation of a swap object, and object creation requires CLOMD.

Similarly, starting with vSAN 6.0, memory snapshots are maintained as objects, so taking a snapshot with a memory state also requires CLOMD.

EPD (Entry Persistence Daemon) is a user space daemon that runs on every host that is part of the vSAN cluster. The main job of EPD is to make sure there is no component leakage when objects are deleted.

CMMDSD (Cluster Monitoring, Membership, and Directory Service Daemon) is a daemon that persists the directory contents of CMMDS (Cluster Monitoring, Membership, and Directory Service). It loads the CMMDS user world process and provides an interface to CMMDS. CMMDS is responsible for monitoring the links to the cluster and acts as the primary distribution fabric for cluster metadata. It is also responsible for maintaining the state of cluster health and network links. Other modules use this information to determine which nodes are part of the cluster and which interfaces on those nodes are healthy.

As of version 8.0 U2, OSFSD and CMMDSTIMEMACHINED have been added to this health check. In versions prior to 8.0 U2, these two daemons do not appear in the health check.

OSFSD (Object Store File System Daemon) is a daemon running on the ESXi host that provides a distributed file system for storing and managing virtual machine data. It enables vSAN to offer file services in addition to its primary storage functionality.

CMMDSTIMEMACHINED (Cluster Monitoring, Membership, and Directory Service Time Machine Daemon) is a daemon running on the ESXi host that is responsible for maintaining historical metadata records for object versions. It enables the vSAN cluster to recover from metadata inconsistencies by providing a way to roll back metadata changes to a previous version.


Environment

VMware vSAN 8.0.x
VMware vSAN 6.x
VMware vSAN 7.0.x

Resolution

Q: What does the “Cluster health – vSAN daemon liveness check (formerly: vSAN CLOMD liveness check)” check do?

It checks if CLOMD, EPD, CMMDSD, OSFSD (added in 8.0 U2), and CMMDSTIMEMACHINED (added in 8.0 U2) are alive or not. For CLOMD, it does so by first checking that the service is running on all ESXi hosts, and then contacting the service to retrieve run-time statistics to verify that CLOMD can respond to inquiries. For EPD, CMMDSD, OSFSD, and CMMDSTIMEMACHINED, it checks whether the service is running properly on all ESXi hosts.

Note: This does not ensure that all of the functionality discussed above (for example, object creation and rebalancing) actually works, but it gives a first-level assessment of the health of the CLOMD, EPD, CMMDSD, OSFSD, and CMMDSTIMEMACHINED services.

Q: What does it mean when it is in an error state?

This test performs only a very basic check that the vSAN daemons are still running, so the daemons may still have issues even when the test passes. If the test reports an error, one or more of the CLOMD, EPD, CMMDSD, OSFSD, and CMMDSTIMEMACHINED services is not working as expected and needs to be checked on the relevant ESXi host.

A good way to further probe into CLOMD health is to perform a virtual machine creation test (Proactive tests), as this involves object creation that will exercise and test CLOMD thoroughly.
For more information about this issue, refer to the following article: CLOM Daemon Liveness Check

Q: How does one troubleshoot and fix the error state?

For standard clusters, all of the services (CLOMD, EPD, CMMDSD, OSFSD, and CMMDSTIMEMACHINED) should be running on all nodes in the cluster.

For stretched clusters and metadata clusters, the table below shows whether each service is expected to be running on the respective node:

Daemon              Stretched cluster   Stretched cluster   Metadata cluster   Metadata cluster
                    data node           witness node        data node          metadata node
CLOMD               Yes                 No                  Yes                Yes
EPD                 Yes                 No                  Yes                No
CMMDSD              Yes                 Yes                 Yes                Yes
OSFSD               Yes                 No                  Yes                No
cmmdsTimemachined   Yes                 No                  Yes                No
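
The expectations in the table can be captured as a small lookup, which may help when scripting checks across mixed node types. The following is a minimal sketch only: the function name and the role labels (`stretched-data`, `stretched-witness`, `metadata-cluster-data`, `metadata-cluster-metadata`) are hypothetical names chosen for this example and are not vSAN identifiers. Note that OSFSD and cmmdsTimemachined are only part of the health check from 8.0 U2 onward.

```shell
#!/bin/sh
# Sketch: map a node role to the daemons the table above expects on it.
# The function name and role labels are hypothetical, chosen for illustration.
expected_daemons() {
    case "$1" in
        stretched-data|metadata-cluster-data)
            # Data nodes: every daemon in the table is marked "Yes"
            echo "CLOMD EPD CMMDSD OSFSD cmmdsTimemachined" ;;
        stretched-witness)
            # Witness node: only CMMDSD is marked "Yes"
            echo "CMMDSD" ;;
        metadata-cluster-metadata)
            # Metadata node: CLOMD and CMMDSD are marked "Yes"
            echo "CLOMD CMMDSD" ;;
        *)
            echo "unknown role: $1" >&2
            return 1 ;;
    esac
}
```

For example, `expected_daemons stretched-witness` prints `CMMDSD`, reflecting that a witness node is not expected to run the other daemons.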

The status of a daemon that is not checked on an ESXi host is shown as “--”.

If any of the CLOMD, EPD, CMMDSD, OSFSD, and CMMDSTIMEMACHINED services is not running on a particular ESXi host, the status of that service on that host is Abnormal.

For this test to succeed, the health service needs to be installed on the ESXi host and the CLOMD, EPD, CMMDSD, OSFSD, and CMMDSTIMEMACHINED services need to be running. To get the status of these services on the ESXi host, run this command:


/etc/init.d/cmmdsd status && /etc/init.d/epd status && /etc/init.d/clomd status && /etc/init.d/cmmdsTimeMachine status && /etc/init.d/osfsd status
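
Because the command above chains the checks with `&&`, the chain stops at the first command that exits with a non-zero status. As an alternative, the sketch below reports each daemon separately. It rests on two assumptions that should be verified on your build: that each init script's `status` action exits non-zero when the daemon is not running, and that a missing init script can simply be skipped. The function name and the `INIT_DIR` variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch: report the status of each vSAN daemon independently.
# INIT_DIR is a hypothetical override for this example; it defaults to
# /etc/init.d, the path used by the command above.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

check_vsan_daemons() {
    failed=0
    for svc in cmmdsd epd clomd cmmdsTimeMachine osfsd; do
        script="$INIT_DIR/$svc"
        if [ ! -x "$script" ]; then
            # Script not present on this host: skip it rather than fail
            echo "$svc: init script not found, skipped"
            continue
        fi
        # ASSUMPTION: "status" exits 0 when running, non-zero otherwise
        if "$script" status >/dev/null 2>&1; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
            failed=1
        fi
    done
    # Non-zero return if at least one daemon was reported as not running
    return "$failed"
}
```

Calling `check_vsan_daemons` on the host prints one line per daemon and returns a non-zero status if any daemon was reported as not running.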


If the daemon is not running, try to run the restart command on the ESXi host:


/etc/init.d/cmmdsd restart && /etc/init.d/epd restart && /etc/init.d/clomd restart && /etc/init.d/cmmdsTimeMachine restart && /etc/init.d/osfsd restart
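
If restarting every daemon is more intrusive than desired, one possible variation is to restart only the daemons that report as not running and then check them again. This is a sketch under the same assumptions as the init scripts' exit codes (that `status` exits non-zero when the daemon is down), not the method prescribed by this article. The function name and the `INIT_DIR` variable are hypothetical.

```shell
#!/bin/sh
# Sketch: restart only the daemons that are not running, then re-check them.
# INIT_DIR is a hypothetical override for this example; it defaults to
# /etc/init.d, the path used by the command above.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

restart_stopped_daemons() {
    for svc in cmmdsd epd clomd cmmdsTimeMachine osfsd; do
        script="$INIT_DIR/$svc"
        # Skip daemons whose init script is not present on this host
        [ -x "$script" ] || continue
        # ASSUMPTION: "status" exits 0 when running, non-zero otherwise
        if "$script" status >/dev/null 2>&1; then
            echo "$svc: already running, left alone"
        elif "$script" restart >/dev/null 2>&1 && "$script" status >/dev/null 2>&1; then
            echo "$svc: restarted"
        else
            echo "$svc: still not running after restart"
        fi
    done
}
```

Any daemon reported as still not running after the restart is a candidate for the support request described below.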

If the vSAN daemon liveness check still fails after these steps, or if it continues to fail on a regular basis, open a support request with VMware Support. For more information, see How to file a Support Request in Customer Connect (2006985).

Additional Information

For more information on collecting VMware vSAN Logs, see Collecting vSAN support logs and uploading to VMware (2072796).

Also, see:
vSAN Health Service - Cluster Health - vSAN Health Service up-to-date
vSAN Health Service - Cluster Health - Advanced vSAN configuration in sync
vSAN Health Service - Network Health - Hosts disconnected from vCenter Server
vSAN Health Service - Network Health - Unexpected vSAN cluster members
vSAN Health Service - Network Health - vSAN Cluster Partition
vSAN Health Service - Network Health – Hosts with vSAN disabled
vSAN Health Service - Network Health - All hosts have a vSAN vmknic configured
vSAN Health Service - Network Health - All hosts have matching subnets
vSAN Health Service - Network Health - Hosts small ping test (connectivity check) and Hosts large ping test (MTU check)
vSAN Health Service - Network Health - Hosts with connectivity issues
vSAN Health Service - Data Health – vSAN Object Health
vSAN Health Service - Physical Disk Health - Metadata Health
vSAN Health Service - Physical Disk Health - Overall Disk Health
vSAN Health Service - Limits Health – Current Cluster Situation
vSAN Health Service - Limits Health – After one additional host failure
vSAN Health Service - Physical Disk Health - Disk Capacity
vSAN Health Service – Physical Disk Health – Component Metadata Health
vSAN Health Service - Physical Disk Health – Congestion
vSAN Health Service - Physical Disk Health – Memory pools
vSAN Health Service - vSAN HCL Health - Controller Release Support
vSAN Health Service – vSAN HCL Health – Controller Driver
vSAN Health Service - vSAN HCL Health – vSAN HCL DB up-to-date
vSAN Health Service - vSAN HCL Health – SCSI Controller on vSAN HCL
vSAN Health Check Information