Re-occurring instances of random vSAN cluster hosts going into a "Not Responding" state




Article ID: 416608


Products

VMware vSphere ESXi

Issue/Introduction

The hostd service shows excessive latency on several cluster nodes at a time. The slow operations logged in hostd.log always reference a single namespace object UUID:

2025-09-10T06:32:30.419Z Wa(164) Hostd[20534933]: [Originator@6876 sub=IoTracker] In thread 20534943, unlink("/vmfs/volumes/vsan:############-####################/########-####-####-####-############/########-####-####-####-############/2024-10-17/vsantracesIODiag--2024-10-17T23h02m28s155--########-####-####-####-############.zst") took over 42464 sec.

Hosts also frequently fail to heartbeat against a single namespace object UUID, as logged in vmkernel.log:

2025-08-19T01:40:48.243Z In(182) vmkernel: cpu38:10082088 opID=22330eae)HBX: 6695: '########-####-####-####-############': HB at offset 3289088 - Skipping replay as HB is being replayed by another live host:

The path attribute (directory name) of this namespace object is .vsan.trace, which is the native vsantraces logging location.
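To check whether a host shows the two log signatures above, the entries can be counted with grep. The following is a minimal sketch, not an official diagnostic: the function names are hypothetical, and the default log paths (/var/run/log/hostd.log and /var/run/log/vmkernel.log) are assumptions that may need adjusting for your hosts.

```shell
# Sketch only: helper names and default log paths are assumptions,
# not part of the original article.

# Count hostd IoTracker warnings that involve vsantraces files.
# $1: hostd log file (default assumed to be /var/run/log/hostd.log)
count_slow_trace_io() {
    grep 'sub=IoTracker' "${1:-/var/run/log/hostd.log}" | grep -c 'vsantraces'
}

# Count vmkernel HBX messages that report a skipped heartbeat replay.
# $1: vmkernel log file (default assumed to be /var/run/log/vmkernel.log)
count_hb_replay_skips() {
    grep 'HBX:' "${1:-/var/run/log/vmkernel.log}" | grep -c 'Skipping replay as HB is being replayed'
}
```

Running count_slow_trace_io and count_hb_replay_skips without arguments on each host reads the assumed default log files. Non-zero counts on several hosts are consistent with the symptoms described here, but they do not confirm the cause on their own.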

Environment

8.0

Cause

The heartbeat region of the vsantraces namespace (DOM) object is corrupted, which leads to lock contention and heartbeat failures.

Resolution

Set a local vsantraces logging location, then restart the vsanmgmtd and vsantraced services on every node in the cluster:

# esxcli vsan trace set -p <path-to-local-datastore>
# /etc/init.d/vsanmgmtd restart 
# /etc/init.d/vsantraced restart
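The three commands above can be wrapped in a small helper so that they run in the same order on each host. This is a sketch under stated assumptions: the function name repoint_vsantraces and the RUN dry-run variable are hypothetical, and the check against the /vmfs/volumes/vsan: prefix is only a simple guard against pointing the traces back at a vSAN datastore.

```shell
# Sketch only: repoint_vsantraces and RUN are hypothetical names,
# not part of the original article.
# $1: target trace directory on a local (non-vSAN) datastore
# RUN: set to "echo" to print the commands instead of running them
repoint_vsantraces() {
    trace_dir="$1"
    if [ -z "$trace_dir" ]; then
        echo "usage: repoint_vsantraces <path-to-local-datastore>" >&2
        return 2
    fi
    # Refuse targets that are visibly on a vSAN datastore.
    case "$trace_dir" in
        /vmfs/volumes/vsan:*)
            echo "refusing: $trace_dir is on a vSAN datastore" >&2
            return 1
            ;;
    esac
    # Same commands, same order as in the resolution steps.
    $RUN esxcli vsan trace set -p "$trace_dir" &&
    $RUN /etc/init.d/vsanmgmtd restart &&
    $RUN /etc/init.d/vsantraced restart
}
```

Setting RUN=echo before calling the function prints the commands without running them, which is useful for a review first. The prefix check does not catch a vSAN datastore reached through its display-name symlink, so confirm the target path manually. After the change, esxcli vsan trace get should show the current trace configuration. The helper acts on one host only and must be run on every node.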

If the issue isn't resolved after repointing vsantraces logging, engage the vSAN support team to determine the scope and nature of the vsantraces namespace corruption and apply an appropriate workaround:

Contact Broadcom VMware Support
