VMFS Heartbeat:
VMFS, being a distributed file system, relies on on-disk locks to arbitrate access to shared storage among multiple ESX hosts. VMFS also uses an on-disk heartbeat mechanism to indicate the liveness of hosts using the shared storage.
Each host using the shared storage uses ATS to update its heartbeat (HB) in a given region on the disk to indicate that it is alive.
In reference to the above diagram:
Timeout associated with VMFS Heartbeat:
The various timeouts associated with the HB are:
ATS and Heartbeat:
We use SCSI ATS, a T10 command (opcode 0x89, COMPARE AND WRITE), to update the HB. We strongly suggest that customers use ATS for the HB because it is atomic and its usage scales. The drawbacks of not using ATS are:
To use ATS, check whether your back-end array supports it.
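The compare-and-write behaviour that ATS relies on can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: `SimulatedDisk`, `make_hb`, `update_heartbeat`, the 512-byte sector size, and the heartbeat layout (host id plus a generation counter) are all hypothetical names and choices made for illustration, not the actual VMFS on-disk format or any real SCSI API.

```python
import threading

SECTOR_SIZE = 512  # assumed sector size for this illustration


class SimulatedDisk:
    """In-memory stand-in for a shared LUN; not a real SCSI device."""

    def __init__(self, num_sectors):
        self._sectors = [bytes(SECTOR_SIZE) for _ in range(num_sectors)]
        # The mutex models the atomicity the array provides for the command.
        self._mutex = threading.Lock()

    def read(self, lba):
        with self._mutex:
            return self._sectors[lba]

    def compare_and_write(self, lba, expected, new):
        """Write `new` only if the sector still equals `expected`."""
        with self._mutex:
            if self._sectors[lba] != expected:
                return False  # miscompare: another host changed the sector
            self._sectors[lba] = new
            return True


def make_hb(host_id, generation):
    """Hypothetical HB layout: 4-byte host id, 8-byte generation, zero padded."""
    payload = host_id.to_bytes(4, "little") + generation.to_bytes(8, "little")
    return payload.ljust(SECTOR_SIZE, b"\x00")


def update_heartbeat(disk, lba, host_id):
    """Bump the generation counter in a host's HB slot via compare-and-write."""
    old = disk.read(lba)
    generation = int.from_bytes(old[4:12], "little")
    return disk.compare_and_write(lba, old, make_hb(host_id, generation + 1))
```

If another writer modifies the sector between the read and the write, the compare step fails and nothing is written, so the update either happens completely or not at all.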
Relation between on-disk locks and heartbeat slot in a VMFS volume:
As noted earlier, VMFS, being a distributed file system, relies on on-disk locks to arbitrate access to shared storage among multiple ESX hosts. The on-disk locks are one sector in size and immediately precede the metadata they protect. Some of the key contents of an on-disk lock are:
Now let's take an example to understand the relationship between an on-disk lock and the VMFS heartbeat:
Suppose a host (say Host-A) needs a particular lock that is currently not free (as per the lock mode field) and is held by another host (say Host-B, as per the lock owner field).
The HB address field enables Host-A to observe the HB slot of Host-B for a period of time to determine whether Host-B is alive.
If Host-A determines that Host-B is not alive (Host-B's HB slot has not changed for up to 16s), then Host-A can "clear" Host-B's HB slot, break the lock it wanted, and go ahead with its operation.
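The Host-A / Host-B example above can be sketched as a short simulation. Everything here is an assumption made for illustration: the `Lock` fields mirror the lock mode, lock owner, and HB address fields mentioned in the text, but their encoding, the list-based `hb_slots`, the one-second polling interval, and the meaning of "clear" (resetting the slot to 0) are hypothetical. Real updates would also go through ATS rather than plain assignments.

```python
import time
from dataclasses import dataclass
from typing import Optional

HB_TIMEOUT_S = 16  # figure quoted in the text, treated here as a constant


@dataclass
class Lock:
    mode: str             # "free" or "exclusive" (hypothetical encoding)
    owner: Optional[str]  # host currently holding the lock
    hb_address: int       # index of the owner's HB slot


def owner_is_alive(hb_slots, hb_address, timeout_s=HB_TIMEOUT_S,
                   poll_s=1, sleep=time.sleep):
    """Watch the owner's HB slot; return True as soon as its value changes."""
    first = hb_slots[hb_address]
    waited = 0
    while waited < timeout_s:
        sleep(poll_s)
        waited += poll_s
        if hb_slots[hb_address] != first:
            return True
    return False


def acquire(lock, requester, requester_hb, hb_slots, sleep=time.sleep):
    """Take the lock if it is free or if its current owner looks dead."""
    if lock.mode != "free":
        if owner_is_alive(hb_slots, lock.hb_address, sleep=sleep):
            return False  # owner is alive; the requester must retry later
        hb_slots[lock.hb_address] = 0  # "clear" the stale HB slot (assumed meaning)
    # Break or take the lock and point it at the requester's own HB slot.
    lock.mode, lock.owner, lock.hb_address = "exclusive", requester, requester_hb
    return True
```

The `sleep` parameter is injectable so the waiting period can be replaced in a test; with the default `time.sleep`, a call against a dead owner blocks for the full timeout before breaking the lock.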