ESXi Hosts Become Unresponsive with Repeated "Corrupt Heartbeat Detected" Events

Article ID: 420505

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • Recurring "Corrupt heartbeat detected" events are observed for a specific VMFS volume. These events can be seen under the Datastore > Monitor > Events tab.

  • All ESXi hosts on which the impacted datastore was mounted report a status of "Not Responding" in vCenter Server.

  • Under the Monitor > Events tab of the affected ESXi host, recurring "Corrupt heartbeat detected" alerts associated with the impacted datastore are observed.

  • An analysis of the vmkernel.log file (accessible via the Direct Console User Interface/DCUI) reveals repeated error messages indicating volume corruption (a sample search command is shown after this list):

     "Volume ########-########-####-############ may be damaged on disk. Corrupt heartbeat detected:"

  • After rebooting the affected ESXi hosts to restore connectivity, the original VMFS datastore that reported the errors is no longer mounted or visible.

  • Instead of the original datastore, a new VMFS datastore is mounted. This new datastore is backed by the same storage device that previously backed the original, missing volume.
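
To locate these messages from the ESXi Shell, the vmkernel log can be searched directly. This is a minimal sketch; the exact message text may vary between releases:

grep -i "Corrupt heartbeat" /var/run/log/vmkernel.log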

Environment

VMware ESXi 8.x
VMware ESX 9.x

Cause

The storage device hosting the volume was inadvertently formatted for an ESXi OS installation. This action overwrote the VMFS metadata and the active heartbeat region. The detection of this corrupted heartbeat effectively caused the ESXi host to lose connectivity with the storage subsystem, leading to the host becoming unresponsive in vCenter Server. Since the data has been overwritten, the original datastore is permanently missing after a reboot.

Cause Validation:

  • Execute the command esxcli storage vmfs extent list. The output will show that the original datastore that reported the errors is missing and that a new datastore is mounted on the same storage device (naa ID), as in the illustrative output below.
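
The output below is illustrative; the datastore name, UUID, and device identifier are placeholders and will differ in your environment:

esxcli storage vmfs extent list
Volume Name      VMFS UUID                            Extent Number  Device Name          Partition
---------------  -----------------------------------  -------------  -------------------  ---------
<new_datastore>  692e2d73-xxxxxxxx-xxxx-001122xxxxxx              0  naa.###############          3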

  • Verify the partition table of the affected device using the partedUtil command. The output shows a layout characteristic of an ESXi boot disk (containing systemPartition, linuxNative, and vmkDiagnostic partitions) rather than a standard VMFS volume layout.

partedUtil getptbl /vmfs/devices/disks/naa.###############
gpt
534698 255 63 8589934592
1 64 8191 C12A7328F81F11D2BA4B00A0C93EC93B systemPartition 128
5 8224 520191 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
6 520224 1032191 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
7 1032224 1257471 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0
8 1257504 1843199 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
9 1843200 7086079 9D27538040AD11DBBF97000C2911D1B8 vmkDiagnostic 0
2 7086080 15472639 EBD0A0A2B9E5443387C068B6B72699C7 linuxNative 0
3 15472640 8589934558 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
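
For contrast, a device that backs only a standard VMFS datastore typically shows a single vmfs partition. The output below is illustrative, with placeholder values:

partedUtil getptbl /vmfs/devices/disks/naa.###############
gpt
534698 255 63 8589934592
1 2048 8589934558 AA31E02A400F11DB9590000C2911D1B8 vmfs 0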

  • /var/run/log/vmkernel.log entries indicate the device was deleted while active, followed by "Address temporarily unmapped" errors. This confirms the partition was removed while I/O operations were pending.

2025-12-02T00:06:52.001Z cpu12:167545 opID=3ebb65e7)LVM: 13413: Deleting device <naa.###############:1>dev OpenCount: 1, postRescan: False
2025-12-02T00:06:56.942Z cpu9:68829)Vol3: 3210: Failed to get object 28 type 2 uuid 68f93f9f-xxxxxxxx-xxxx-xxxxxxxxxxxx FD 4 gen 1 :Address temporarily unmapped
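
These entries can be pulled from the live log with a simple search. This is a minimal sketch; message text may vary between releases:

grep -E "Deleting device|Address temporarily unmapped" /var/run/log/vmkernel.log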

  • Check the creation time of the new datastore found on the device using the command below:

vmkfstools -Ph -v10 /vmfs/volumes/<new_datastore>

VMFS-5.81 (Raw Major Version: 14) file system spanning 1 partitions.
File system label (if any): <new_datastore>
Mode: public ATS-only
Capacity 4389993447424 (4186624 file blocks * 1048576), 4388953260032 (4185632 blocks) avail, max supported file size 69201586814976
Volume Creation Time: Tue Dec  2 00:06:11 2025
Files (max/free): 130000/129992
Ptr Blocks (max/free): 64512/64496
Sub Blocks (max/free): 32000/32000
Secondary Ptr Blocks (max/free): 256/256
File Blocks (overcommit/used/overcommit %): 0/992/0
Ptr Blocks  (overcommit/used/overcommit %): 0/16/0
Sub Blocks  (overcommit/used/overcommit %): 0/0/0
Volume Metadata size: 825131008
UUID: 692e2d73-xxxxxxxx-xxxx-001122xxxxxx
Logical device: 692e2d6c-xxxxxxxx-xxxx-xxxxxxxxxxxx
Partitions spanned (on "lvm"):
        naa.###############:3
Is Native Snapshot Capable: YES
OBJLIB-LIB: ObjLib cleanup done.
WORKER: asyncOps=0 maxActiveOps=0 maxPending=0 maxCompleted=0

The Volume Creation Time of the new datastore will closely match the timestamp of the "Deleting device" entry in the vmkernel.log (in the example above, Tue Dec 2 00:06:11 versus 2025-12-02T00:06:52Z).
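
To extract just the creation timestamp without the full verbose output, the command can be filtered; a minimal sketch:

vmkfstools -Ph -v10 /vmfs/volumes/<new_datastore> | grep -i "Creation"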

  • To identify which specific host performed the formatting/installation, examine the UUID of the new volume from the above output.

The last segment of the UUID typically represents the MAC address of the physical network adapter on the host where the volume was created.

Match this MAC address (00:11:22:xx:xx:xx) against the inventory to identify the host where the accidental installation occurred. The sketch below shows how to list adapter MAC addresses on a candidate host.
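
The MAC addresses of a host's physical adapters can be listed with esxcli. The output below is illustrative and abbreviated; the adapter name and MAC address are placeholders:

esxcli network nic list
Name    PCI Device    Driver  Link Status  Speed  Duplex  MAC Address        MTU
vmnic0  0000:18:00.0  ixgben  Up           10000  Full    00:11:22:xx:xx:xx  1500

In this example, the final UUID segment 001122xxxxxx corresponds to the MAC address 00:11:22:xx:xx:xx of vmnic0.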

Resolution

As the storage device was formatted and the ESXi operating system was installed over the existing partition, the original data has been permanently overwritten. Consequently, recovery of the original VMFS volume from the device is not possible.

To resolve this issue, the affected virtual machines must be restored from a valid backup.