VMFS Heartbeat: VMFS, being a distributed file system, relies on on-disk locks to arbitrate access to shared storage among multiple ESX hosts. VMFS also uses an on-disk heartbeat (HB) mechanism to indicate the liveness of hosts using the file system. Every host using the shared storage uses ATS to update its HB in a given region on the disk to indicate that it is alive.
In reference to the above diagram:
- Every host using a VMFS volume has its own heartbeat slot (1 sector in size), which it updates in the “Heartbeat Region” as shown above.
- Among other things, each heartbeat slot holds the following important information:
  - HB state (slot not in use, slot in active use, slot being replayed)
  - Generation – A monotonically increasing number that changes when the HB state changes
  - Timestamp – Updated on every periodic heartbeat IO
- The HB workflow for a given host goes through the following states:
  - Acquire HB slot – The host chooses an HB slot (1 sector in size) in the Heartbeat Region to heartbeat in.
  - Periodic HB update – This is done using ATS, primarily to show that the host is alive. If the HB slot is not updated for a considerable time (> 16 sec), we assume the host to be dead and start the reclaim/recovery process.
  - HB reclaim/recovery – Recovery is initiated on the HB if the timeout (16s) expires or we see an ATS miscompare. During recovery we halt all outstanding commands on the device. The rest of the recovery process depends on how the HB slot has changed, for example:
    - If no other host has cleared or used this HB slot, we start HB normally again.
    - If another host has cleared the HB and the HB generation has increased by 1, we start HB normally again.
    - If another host is in the process of replaying this HB slot, we wait for the replay to complete.
  - Clear HB slot – The HB slot is cleared when the host exits.
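The slot fields and the recovery cases listed above can be sketched in a few lines of Python. This is only an illustrative model, not the VMFS on-disk format: the names `HBSlot`, `HBState`, `RecoveryAction` and `recovery_action` are hypothetical, the `owner` field is an assumption added for readability, and "cleared" is assumed to mean that the slot is back in the not-in-use state with the generation bumped by exactly 1.

```python
from dataclasses import dataclass
from enum import Enum


class HBState(Enum):
    NOT_IN_USE = 0   # slot is free
    ACTIVE = 1       # slot is in active use by its owner
    REPLAYING = 2    # another host is replaying this slot


@dataclass(frozen=True)
class HBSlot:
    state: HBState
    generation: int   # changes whenever the HB state changes
    timestamp: int    # updated on every periodic heartbeat IO
    owner: str = ""   # hypothetical field, not taken from the text


class RecoveryAction(Enum):
    RESUME = "start heartbeating normally again"
    WAIT_FOR_REPLAY = "wait for the replay to complete"
    UNHANDLED = "case not described in the text"


def recovery_action(last_written: HBSlot, on_disk: HBSlot) -> RecoveryAction:
    """Choose a recovery step by comparing our last write with the disk."""
    if on_disk.state is HBState.REPLAYING:
        return RecoveryAction.WAIT_FOR_REPLAY
    if on_disk == last_written:
        # No other host has cleared or used the slot.
        return RecoveryAction.RESUME
    if (on_disk.state is HBState.NOT_IN_USE
            and on_disk.generation == last_written.generation + 1):
        # Assumed meaning of "cleared": slot freed, generation bumped by 1.
        return RecoveryAction.RESUME
    return RecoveryAction.UNHANDLED
```

The function deliberately returns `UNHANDLED` for any slot change the list above does not cover, rather than guessing at behaviour the text does not describe.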
Timeouts associated with VMFS Heartbeat: The various timeouts associated with HB are:
- Periodic heartbeat interval (3s) – An HB ATS is issued every interval.
- HB IO timeout (8s) – Timeout associated with the ATS command for the periodic heartbeat update.
- HB lease timeout (16s) – Disk locks are considered stale if the owning host’s HB is not updated for this duration.
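To make the relationship between the three values concrete, the sketch below encodes them as constants and checks a lease against them. The constant and function names are hypothetical; only the 3s, 8s and 16s values and the "> 16 sec" comparison come from the text.

```python
HB_INTERVAL_S = 3        # periodic heartbeat interval
HB_IO_TIMEOUT_S = 8      # timeout of the ATS command for one HB update
HB_LEASE_TIMEOUT_S = 16  # locks are stale once the HB is this old


def lease_expired(last_update_s: float, now_s: float) -> bool:
    """True if the HB has not been updated for more than the lease timeout."""
    return now_s - last_update_s > HB_LEASE_TIMEOUT_S


def intervals_elapsed(last_update_s: float, now_s: float) -> int:
    """Number of whole heartbeat intervals since the last update."""
    return int((now_s - last_update_s) // HB_INTERVAL_S)
```

With these values, five full heartbeat intervals fit inside one lease period, so `intervals_elapsed(0, 16)` evaluates to 5 while `lease_expired(0, 16)` is still false.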
ATS and Heartbeat: We use SCSI ATS, a T10 command (opcode 0x89, Compare and Write), to update the HB. We strongly suggest that customers use ATS for HB because it is atomic and its usage can scale. The drawbacks of not using ATS are:
- Without ATS, an HB update is a simple write without reservation.
- Delayed writes pose a problem when an HB IO completes at the device long after the timeout and potentially overwrites a newer HB update from another host.
Check whether your back-end array supports ATS.
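The delayed-write problem can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `Sector`, `plain_write`, `compare_and_write` and `delayed_update` are hypothetical names, the byte strings are made-up slot contents, and a single Python object stands in for a sector whose compare-and-write would really be performed atomically by the array.

```python
class Sector:
    """One in-memory sector standing in for an HB slot on disk."""

    def __init__(self, data: bytes) -> None:
        self.data = data

    def plain_write(self, new: bytes) -> None:
        # Unconditional write: whatever arrives last wins.
        self.data = new

    def compare_and_write(self, expected: bytes, new: bytes) -> bool:
        # Models ATS semantics: write only if the contents still match.
        if self.data != expected:
            return False  # miscompare, nothing is written
        self.data = new
        return True


def delayed_update(slot: Sector, use_ats: bool) -> bool:
    """Host A's HB update reaches the device after another host's write."""
    seen_by_a = slot.data                 # contents host A based its update on
    update_a = b"hostA gen=1 ts=2"
    slot.plain_write(b"hostB gen=2 ts=1")  # newer update lands first
    if use_ats:
        return slot.compare_and_write(seen_by_a, update_a)
    slot.plain_write(update_a)            # stale write clobbers host B
    return True
```

Starting from a slot containing `b"hostA gen=1 ts=1"`, the plain-write path ends with host A's stale data on disk, while the compare-and-write path reports a miscompare and leaves host B's update intact.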
Relation between on-disk locks and heartbeat slot in a VMFS volume: As noted earlier, VMFS, being a distributed file system, relies on on-disk locks to arbitrate access to shared storage among multiple ESX hosts. The on-disk locks are 1 sector in size, and each immediately precedes the metadata it protects. Some of the key contents of an on-disk lock are:
- Lock type – Identifies the type of metadata this lock is protecting
- Lock mode – Identifies whether a lock is free or is locked in exclusive, read-only, or another mode
- Lock owner – UUID of the host which currently owns the lock
- HB address – Address of the HB slot of the owner
Now let’s take an example to understand the relationship between an on-disk lock and the VMFS heartbeat:
Suppose a host (say HostA) needs a particular lock that is currently not free (as per the lock mode field) and is held by another host (say HostB, as per the lock owner field).
The HB address field enables HostA to observe HostB’s HB slot for a period of time to determine whether HostB is alive.
If HostA determines that HostB is not alive (HostB’s HB slot has not changed for 16s), then HostA can “clear” HostB’s HB slot, break the lock it wanted, and go ahead with its operation.
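The example can be sketched as a short Python routine. It is a simplified model rather than the actual VMFS algorithm: `DiskLock`, `try_acquire`, the `read_hb` and `clear_hb` callbacks, the one-second polling interval and the mode strings are all assumptions made for illustration. Only the lock fields and the 16s observation window come from the text.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional

HB_LEASE_TIMEOUT_S = 16


@dataclass
class DiskLock:
    lock_type: str              # type of metadata the lock protects
    mode: str                   # "free", "exclusive", "read-only", ...
    owner: Optional[str]        # UUID of the host that owns the lock
    hb_address: Optional[int]   # address of the owner's HB slot


def try_acquire(lock: DiskLock, me: str, my_hb_address: int,
                read_hb: Callable[[int], Any],
                clear_hb: Callable[[int], None],
                sleep: Callable[[float], None] = time.sleep,
                poll_s: float = 1.0) -> bool:
    """Take the lock, breaking it only if the owner's HB slot stays unchanged."""
    if lock.mode != "free":
        first = read_hb(lock.hb_address)
        waited = 0.0
        while waited < HB_LEASE_TIMEOUT_S:
            sleep(poll_s)
            waited += poll_s
            if read_hb(lock.hb_address) != first:
                return False            # slot changed: owner is alive
        clear_hb(lock.hb_address)       # owner presumed dead: clear its slot
    lock.mode, lock.owner, lock.hb_address = "exclusive", me, my_hb_address
    return True
```

The `sleep` parameter is injected so that the routine can be exercised without actually waiting 16 seconds, for example by passing `lambda s: None`.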