"Lost access to volume" messages alert on vSAN clusters

Article ID: 409706

Updated On:

Products

VMware vSAN

Issue/Introduction

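"Lost access to volume" event messages appear on a 2-node vSAN cluster, followed shortly by a matching "Successfully restored access to volume" message. For example: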

2015-07-02T02:00:11.675Z [4F1E1B70 info 'Vimsvc.ha-eventmgr'] Event 205 : Lost access to volume ########-########-####-####-####-############ (########-########-####-####-####-############) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
2015-07-02T02:00:37.055Z [4F480B70 info 'Vimsvc.ha-eventmgr'] Event 210 : Successfully restored access to volume ########-########-####-####-####-############ (########-########-####-####-####-############) following connectivity issues.
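
To gauge how long each loss of access lasted, the "Lost access" and "Successfully restored access" events can be paired by volume and their timestamps subtracted. The following is a minimal Python sketch, not a VMware tool. It assumes the log lines follow the format shown above; the function name outage_durations and the regular expression are illustrative assumptions.

```python
import re
from datetime import datetime

# Matches the two event lines shown above. The pattern is an assumption
# based on those samples, not an official log format definition.
EVENT_RE = re.compile(
    r"^(?P<ts>\S+Z) .*Event \d+ : "
    r"(?P<kind>Lost access to|Successfully restored access to) "
    r"volume (?P<vol>\S+)"
)


def outage_durations(lines):
    """Pair each 'Lost access' event with the next 'restored' event
    for the same volume and return (volume, seconds) tuples."""
    pending = {}   # volume -> timestamp of the unmatched "Lost access" event
    outages = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
        vol = m["vol"]
        if m["kind"].startswith("Lost"):
            # Keep the earliest loss if several are logged before recovery.
            pending.setdefault(vol, ts)
        elif vol in pending:
            outages.append((vol, (ts - pending.pop(vol)).total_seconds()))
    return outages
```

Applied to the two sample lines above, the timestamps 02:00:11.675 and 02:00:37.055 are about 25.4 seconds apart. Short, self-recovering gaps like this are consistent with transient connectivity problems rather than a persistent link failure.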

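These messages are seen in environments with the following characteristics:
The data nodes are directly connected.
The witness is on the management network.
More than one 2-node cluster is affected.
No vmnic down events are seen.
Heartbeat issues are seen in the logs.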

The vmkernel logs show the witness experiencing high latency and rejoining the cluster:

YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)CMMDS: CMMDSStateMachineReceiveLoop:1654: #######-####-###-###-############: Error receiving from #######-####-###-###-############: Failure
--
YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)CMMDSNet: CMMDSNetGrpMsgFilter:2623: #######-####-###-###-############: Creating node #######-####-###-###-############ from host unicast channel.
--
YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)DOM: DOMOwnerSetupCreateProxyOwnerCompleteTask:25156: Task failed with status = 'Failure' for #######-####-###-###-############ retry #1
YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)CMMDS: LeaderUpdateMeanRTLatency:12423: Throttled: #######-####-###-###-############: High RT latency. Node #######-####-###-###-############ RT latency 8451(ms). Mean RT latency 2817(ms)
YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)DOM: DOMOwnerSetupCreateProxyOwnerCompleteTask:25156: Task failed with status = 'Failure' for #######-####-###-###-############ retry #1
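
To see which nodes report high round-trip latency, the "High RT latency" lines can be extracted from a saved copy of the vmkernel log. The following Python sketch assumes the message format shown above. The function name high_rt_latency and the 1000 ms default threshold are illustrative assumptions, not VMware-defined values.

```python
import re

# Matches the CMMDS "High RT latency" message shown above. The pattern is
# an assumption based on that sample line.
RT_RE = re.compile(
    r"High RT latency\. Node (?P<node>\S+) RT latency (?P<rt>\d+)\(ms\)\. "
    r"Mean RT latency (?P<mean>\d+)\(ms\)"
)


def high_rt_latency(lines, threshold_ms=1000):
    """Return (node, rt_ms, mean_ms) for every 'High RT latency' line
    whose node RT latency is at or above threshold_ms.

    threshold_ms is an arbitrary cut-off chosen for illustration.
    """
    hits = []
    for line in lines:
        m = RT_RE.search(line)
        if m and int(m["rt"]) >= threshold_ms:
            hits.append((m["node"], int(m["rt"]), int(m["mean"])))
    return hits
```

If the same node UUID appears in most of the results and it belongs to the witness, that points to the network path between the witness and the data nodes.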

Cause

This behavior is caused by physical network issues outside of ESXi.
Examples include congestion on a switch or routing issues between the witness and the data nodes, either of which results in slow RDT (Reliable Datagram Transport) response times.

Resolution

Engage your networking team to check the physical network for issues. If further VMware assistance is required, open a case with our Networking team.

Additional Information

Please see "Lost access to volume" messages with vSAN for more details.
