This KB gives a brief overview of the VMFS heartbeat (HB), the timeouts associated with it, and how a host can use the HB to acquire a lock in certain situations.
Resolution
VMFS Heartbeat:
VMFS is a distributed file system that relies on on-disk locks to arbitrate access to shared storage among multiple ESX hosts.
VMFS also uses an on-disk heartbeat (HB) mechanism to indicate the liveness of the hosts using the file system. All hosts using the shared storage use ATS to update their HB in a given region on the disk to indicate that they are alive.
In reference to the above diagram:
Every host using a VMFS volume has its own heartbeat slot (1 sector in size), which it updates in the “Heartbeat Region” as shown above.
Among other things, each heartbeat slot holds the following important information:
HB state (Slot not-in-use, Slot in active use, Slot is being replayed)
Generation – monotonically increasing number. Changes when HB state changes
Timestamp – Updated on every periodic heartbeat IO
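The three fields above can be pictured with a small in-memory model. This is only an illustrative sketch: the class and method names (HBState, HeartbeatSlot, set_state, touch) are hypothetical, and the real on-disk layout of a heartbeat slot is not described in this KB.

```python
from dataclasses import dataclass
from enum import Enum


class HBState(Enum):
    """The three slot states listed above (names are illustrative)."""
    NOT_IN_USE = 0   # slot is free
    IN_USE = 1       # slot is actively heartbeated by its owner
    REPLAYING = 2    # slot is being replayed


@dataclass
class HeartbeatSlot:
    state: HBState = HBState.NOT_IN_USE
    generation: int = 0     # monotonically increasing; changes with HB state
    timestamp: float = 0.0  # refreshed on every periodic heartbeat I/O

    def set_state(self, new_state: HBState) -> None:
        # A state change is what bumps the generation number.
        if new_state != self.state:
            self.state = new_state
            self.generation += 1

    def touch(self, now: float) -> None:
        # A periodic heartbeat only refreshes the timestamp.
        self.timestamp = now
```

In this model a periodic update leaves the generation untouched, so an observer can tell "the owner is still heartbeating" (timestamp moves) apart from "the slot changed hands or state" (generation moves).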
The HB workflow for a given host goes through the following states:
Acquire HB Slot – The host chooses an HB slot (one sector in size) in the Heartbeat Region in which to heartbeat.
Periodic HB update – This is done primarily using ATS to show that the host is alive. If the HB slot is not updated for a considerable time (> 16 sec), the host is assumed to be dead and the reclaim/recovery process starts.
HB Reclaim/Recovery – Recovery is initiated on the HB if the timeout (16s) expires or an ATS miscompare is seen. During recovery, all outstanding commands on the device are halted. The rest of the recovery process depends on how the HB slot has changed, for example:
If no other host has cleared or used this HB slot, the host starts heartbeating normally again
If another host has cleared the HB and the HB generation has increased by 1, the host starts heartbeating normally again
If another host is in the process of replaying this HB slot, the host waits for the replay to complete
Clear HB Slot – The host clears the HB slot when it exits.
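The example recovery cases listed under HB Reclaim/Recovery can be sketched as a small decision function. This is a simplified illustration, not the actual VMFS logic: the names SlotState, RecoveryAction and recovery_action are hypothetical, and any case this KB does not describe is reported as NOT_COVERED rather than guessed at.

```python
from enum import Enum


class SlotState(Enum):
    NOT_IN_USE = "not-in-use"
    IN_USE = "in-use"
    REPLAYING = "replaying"


class RecoveryAction(Enum):
    RESUME_HEARTBEAT = "resume"
    WAIT_FOR_REPLAY = "wait"
    NOT_COVERED = "not-covered"   # outside the examples given in this KB


def recovery_action(last_known_gen: int,
                    disk_state: SlotState,
                    disk_gen: int) -> RecoveryAction:
    """Pick a recovery step from what the host now sees in its own slot."""
    # Another host is replaying this slot: wait for the replay to finish.
    if disk_state is SlotState.REPLAYING:
        return RecoveryAction.WAIT_FOR_REPLAY
    # Nobody cleared or used the slot: heartbeat normally again.
    if disk_gen == last_known_gen:
        return RecoveryAction.RESUME_HEARTBEAT
    # Another host cleared the slot and the generation went up by exactly 1.
    if disk_state is SlotState.NOT_IN_USE and disk_gen == last_known_gen + 1:
        return RecoveryAction.RESUME_HEARTBEAT
    return RecoveryAction.NOT_COVERED
```

For example, recovery_action(5, SlotState.REPLAYING, 6) selects the wait path, while recovery_action(5, SlotState.IN_USE, 5) resumes heartbeating.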
Timeouts associated with VMFS Heartbeat:
The various timeouts associated with the HB are:
Periodic heartbeat interval (3s) – An HB ATS is issued once every interval.
HB IO timeout (8s) – Timeout associated with the ATS command for periodic heartbeat update.
HB lease timeout (16s) – Disk locks are considered stale if the owning host's HB is not updated for this duration.
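To make the arithmetic behind these values concrete, the sketch below encodes the three timeouts as constants and checks a lease against them. The constant and function names are hypothetical, and treating the lease as expired only when strictly more than 16 seconds have passed is an assumption based on the "> 16 sec" wording earlier in this KB.

```python
HB_INTERVAL_S = 3        # periodic heartbeat interval
HB_IO_TIMEOUT_S = 8      # timeout on the heartbeat ATS command
HB_LEASE_TIMEOUT_S = 16  # locks are stale once the HB is this old


def lease_expired(last_update_s: float, now_s: float) -> bool:
    """True once the owner's heartbeat is older than the lease timeout."""
    # Assumption: strictly greater than 16 s counts as expired.
    return (now_s - last_update_s) > HB_LEASE_TIMEOUT_S


def intervals_elapsed(last_update_s: float, now_s: float) -> int:
    """Number of whole heartbeat intervals since the last update."""
    return int((now_s - last_update_s) // HB_INTERVAL_S)
```

With these values, five full heartbeat intervals (15s) fit inside one lease period, so a host has to miss several consecutive updates before its locks are treated as stale.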
ATS and Heartbeat:
We use SCSI ATS, a T10 command (opcode 0x89, Compare and Write), to update the HB. We strongly suggest that customers use ATS for the HB, as it is atomic and its usage can scale. The drawbacks of not using ATS are:
Without ATS, HB update is a simple write without reservation.
Delayed writes pose a problem when an HB IO completes long after the timeout (at the device) and potentially overwrites a newer HB update from another host.
Please check whether your back-end array supports ATS.
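The delayed-write problem can be illustrated with a toy in-memory sector. This is not SCSI code: the Sector class and delayed_update function are hypothetical, the byte strings are made-up slot contents, and the compare-and-write here is only "atomic" because the sketch is single-threaded, whereas a real array performs the compare and the write atomically at the device.

```python
class Sector:
    """In-memory stand-in for one on-disk sector (not a real device)."""

    def __init__(self, data: bytes) -> None:
        self.data = data

    def plain_write(self, new: bytes) -> None:
        # Unconditional write: whatever is on disk is overwritten.
        self.data = new

    def compare_and_write(self, expected: bytes, new: bytes) -> bool:
        # Write only if the sector still holds what the caller last saw.
        if self.data != expected:
            return False   # miscompare: the slot changed underneath us
        self.data = new
        return True


def delayed_update(use_ats: bool) -> bytes:
    """Replay a heartbeat update that lands after the slot was reused."""
    slot = Sector(b"hostA gen=1 ts=0")
    seen_by_a = slot.data              # Host A's last view of its slot
    pending = b"hostA gen=1 ts=3"      # Host A's update, stuck in flight

    # Meanwhile the slot is cleared and reused with newer contents.
    slot.plain_write(b"hostB gen=2 ts=20")

    # Host A's delayed update finally reaches the device.
    if use_ats:
        slot.compare_and_write(seen_by_a, pending)
    else:
        slot.plain_write(pending)
    return slot.data
```

In this model the plain write leaves Host A's stale update on disk, while the compare-and-write fails with a miscompare and the newer contents survive.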
Relation between on-disk locks and heartbeat slot in a VMFS volume:
As noted earlier, VMFS is a distributed file system that relies on on-disk locks to arbitrate access to shared storage among multiple ESX hosts. The on-disk locks are 1 sector in size and immediately precede the metadata they protect. Some of the key contents of an on-disk lock are:
Lock type – Identifies the type of metadata this lock is protecting
Lock mode – Identifies whether a lock is free or whether it is locked in exclusive, read-only, or another mode
Lock owner – UUID of the host which currently owns the lock
HB address – Address of the HB slot of the owner
Now let’s take an example to understand the relationship between an on-disk lock and the VMFS heartbeat:
Suppose a host (say Host-A) needs a particular lock, and that lock is currently not free (as per the lock mode field) and is held by another host (say Host-B, as per the lock owner field). The HB address field enables Host-A to observe the HB slot of Host-B for a period of time to determine whether Host-B is alive. If Host-A determines that Host-B is not alive (Host-B's HB slot has not changed for up to 16s), then Host-A can “clear” Host-B’s HB slot, break the lock it wanted, and go ahead with its operation.
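The Host-A and Host-B example can be sketched as follows. This is a simplified model under stated assumptions: DiskLock, owner_looks_dead and try_take_lock are hypothetical names, the mode strings are illustrative, the "strictly more than 16 seconds unchanged" boundary is an assumption, and the step of clearing the dead owner's HB slot is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

HB_LEASE_TIMEOUT_S = 16


@dataclass
class DiskLock:
    lock_type: str             # type of metadata the lock protects
    mode: str                  # "free", "exclusive", "read-only", ...
    owner: Optional[str]       # UUID of the owning host
    hb_address: Optional[int]  # address of the owner's HB slot


def owner_looks_dead(samples: List[Tuple[float, bytes]]) -> bool:
    """samples: (observation_time_s, slot_contents) pairs in time order."""
    if len(samples) < 2:
        return False
    first_time, first_contents = samples[0]
    unchanged = all(contents == first_contents for _, contents in samples)
    # Assumption: the slot must stay unchanged for more than the lease timeout.
    return unchanged and (samples[-1][0] - first_time) > HB_LEASE_TIMEOUT_S


def try_take_lock(lock: DiskLock, requester: str, requester_hb: int,
                  samples: List[Tuple[float, bytes]]) -> bool:
    """Take a free lock, or break one whose owner stopped heartbeating."""
    if lock.mode != "free" and not owner_looks_dead(samples):
        return False   # owner is (or may still be) alive: leave the lock
    # Clearing the dead owner's HB slot is omitted from this sketch.
    lock.mode = "exclusive"
    lock.owner = requester
    lock.hb_address = requester_hb
    return True
```

Here samples is what the requesting host read from the owner's HB slot (found through the lock's HB address) over the observation window; any change in the slot during that window means the owner is treated as alive.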