"Lost access to volume" messages with vSAN

Products

VMware vSAN

Issue/Introduction

This article explains the meaning of "Lost access to volume" messages on vSAN clusters.

Symptoms:

In the /var/log/hostd.log file, you see entries similar to:

2015-07-02T02:00:11.675Z [4F1E1B70 info 'Vimsvc.ha-eventmgr'] Event 205 : Lost access to volume ########-########-####-####-####-############ (########-########-####-####-####-############) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
2015-07-02T02:00:37.055Z [4F480B70 info 'Vimsvc.ha-eventmgr'] Event 210 : Successfully restored access to volume ########-########-####-####-####-############ (########-########-####-####-####-############) following connectivity issues.

In vCenter Server, you see an event similar to:

Lost access to volume ########-########-####-####-####-############ (########-########-####-####-####-############) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
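
The "Lost access" and "Successfully restored access" entries in hostd.log come in pairs, so the time between them gives the length of each interruption. The following is a minimal Python sketch of that pairing, assuming the log lines follow the format of the two sample entries above. The names EVENT_RE, parse_ts and outage_durations are hypothetical helpers introduced for illustration and are not part of any VMware tool.

```python
import re
from datetime import datetime

# Matches the two hostd.log event messages shown in the samples above.
# Assumption: other builds may format these lines differently.
EVENT_RE = re.compile(
    r"^(?P<ts>\S+Z) .*?Event \d+ : "
    r"(?P<kind>Lost access to volume|Successfully restored access to volume) "
    r"(?P<vol>\S+)"
)


def parse_ts(ts):
    """Parse a hostd.log timestamp such as 2015-07-02T02:00:11.675Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def outage_durations(lines):
    """Yield (object_id, lost_at, seconds_until_restored) for each pair."""
    pending = {}  # object ID -> time of the first unmatched "Lost access"
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        when = parse_ts(m["ts"])
        if m["kind"].startswith("Lost"):
            # Keep the earliest loss if the event repeats before recovery.
            pending.setdefault(m["vol"], when)
        elif m["vol"] in pending:
            lost_at = pending.pop(m["vol"])
            yield m["vol"], lost_at, (when - lost_at).total_seconds()
```

Feeding the lines of a hostd.log file to outage_durations yields one tuple per recovered interruption. Entries that never show a matching recovery stay unreported in this sketch and would need separate handling.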

In this instance, the ID ########-########-####-####-####-############ shown in the message does not refer to a volume or datastore, but to the affected object on the vSAN datastore. These messages are similar to those logged for VMFS volumes when latency is too high or access to the data is interrupted.
The following events are seen in vsantracesUrgent.log, in the collected ESXi host logs, at the time of the "Lost access to volume" errors.

Use the vsanTraceReader command to read a vSAN trace file:

/usr/lib/vmware/vsan/bin/vsanTraceReader vsantracesUrgent*.gz

vsantracesUrgent--2025-05-15T06h09m########.zst__extracted-by-vsanTraceReader.log:2025-05-16T09:22:56.202055 [49060475] [cpu1] [c3e46b79 CLIENT readWriteAtsWithBlkAttr5 NSIO] DOMTraceOpTookTooLong:11385: {'op': 0x45ba5c428c00, 'objUuid': '6bd3f767-6abc-8d9c-####-###', 'offset-39': 3473408, 'length-25': 512, 'totalTimeMS': 12199, 'timeInThisPhaseMS': 12199, 'opPhase': 'Wait for RDT'}
vsantracesUrgent--2025-05-15T06h09m58s822--#####.zst__extracted-by-vsanTraceReader.log:2025-05-16T09:22:56.202078 [49060477] [cpu1] [c3e46b75 CLIENT readWriteAtsWithBlkAttr5 NSIO] DOMTraceOpTookTooLong:11385: {'op': 0x45ba5c4dfd80, 'objUuid': 'c1ec7c66-###-5c7b-###-1423f2a####', 'offset-39': 3473408, 'length-25': 512, 'totalTimeMS': 12199, 'timeInThisPhaseMS': 12199, 'opPhase': 'Wait for RDT'}
vsantracesUrgent--2025-05-15T06h09m########.zst__extracted-by-vsanTraceReader.log:2025-05-16T09:22:56.211579 [49060479] [cpu1] [c3e46b73 CLIENT readWriteAtsWithBlkAttr5 NSIO] DOMTraceOpTookTooLong:11385: {'op': 0x45ba5c5a9ec0, 'objUuid': 'cbcd7a65-####-06e4-###-1423f###', 'offset-39': 3473408, 'length-25': 512, 'totalTimeMS': 12208, 'timeInThisPhaseMS': 12208, 'opPhase': 'Wait for RDT'}
vsantracesUrgent--2025-05-15T06h09m######.zst__extracted-by-vsanTraceReader.log:2025-05-16T09:22:58.479701 [49060498] [cpu6] [c3e46b7c CLIENT readWriteAtsWithBlkAttr5 NSIO] DOMTraceOpTookTooLong:11385: {'op': 0x45ba5c4679c0, 'objUuid': '033b9e65-###-###-9bce-1423f###', 'offset-39': 3473408, 'length-25': 512, 'totalTimeMS': 14476, 'timeInThisPhaseMS': 14476, 'opPhase': 'Wait for RDT'}
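
To see which objects are affected and how long their operations stalled, the DOMTraceOpTookTooLong entries can be summarized per object. The following is a minimal Python sketch, assuming the extracted trace lines end in a dictionary-style payload exactly as in the samples above. The names TRACE_RE, slow_ops and worst_by_object are hypothetical helpers for illustration, not part of vsanTraceReader.

```python
import ast
import re
from collections import defaultdict

# Captures the trailing {...} payload of a DOMTraceOpTookTooLong entry.
# Assumption: the payload is a Python-style literal, as in the samples above.
TRACE_RE = re.compile(r"DOMTraceOpTookTooLong:\d+: (\{.*\})\s*$")


def slow_ops(lines):
    """Yield the payload of each DOMTraceOpTookTooLong line as a dict."""
    for line in lines:
        m = TRACE_RE.search(line)
        if not m:
            continue
        try:
            yield ast.literal_eval(m.group(1))
        except (ValueError, SyntaxError):
            # Skip payloads that do not parse as a plain literal.
            continue


def worst_by_object(lines):
    """Return {(objUuid, opPhase): highest totalTimeMS seen}."""
    worst = defaultdict(int)
    for op in slow_ops(lines):
        key = (op.get("objUuid"), op.get("opPhase"))
        worst[key] = max(worst[key], op.get("totalTimeMS", 0))
    return dict(worst)
```

Passing the lines of the extracted trace log to worst_by_object gives one entry per object and phase. Entries such as 'Wait for RDT' with a totalTimeMS above 12000 correspond to the sample events shown above.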

Environment

VMware vSAN 8.x
VMware NSX

Cause

Each ESXi host writes and processes periodic heartbeats to its VMFS filesystems. When heartbeat updates fail because of instability in the underlying vSAN storage, the heartbeats time out and "Lost access to volume" is logged in the vmkernel.log file.
On vSAN-enabled clusters, the virtual machine folders (called "VM Namespace") use a special form of VMFS filesystem to hold the files a virtual machine requires, such as the .vmx configuration file, .vmdk descriptor files, and vmware.log files.

This vSAN cluster instability can be caused by, among other things, a misbehaving disk anywhere in the vSAN environment (on any node), network latency (for example, NSX issues such as a BGP down condition or an unhealthy TEP state), instability of the vSAN network, or any sort of congestion.

Resolution

Such issues can have multiple causes. Validate the following:

Run the "vSAN health check" to verify the state of vSAN and identify any existing issues. Administrators should run and review it regularly.
Verify that all virtual machines are compliant with their applied storage policy.
Verify stable, performant network connectivity on the vSAN network.
Verify whether any disks in the vSAN environment are nearing the end of their lifespan (e.g. "Write Endurance" for any disks).

Note: Sub-second recovery events during disk group removal are non-impactful logical events.

If the issue persists, open a support case with Broadcom.

Additional Information

Understanding lost access to volume messages in ESXi 6.x/7.x

Attachments

POLICY_COMPLIANCE_CHECK_4.PNG

POLICY_COMPLIANCE_2.PNG

POLICY_COMPLIANCE_3.PNG

Article ID: 326400