"Lost access to volume" messages alert on vSAN clusters
Article ID: 409706

Products

VMware vSAN

Issue/Introduction

  • "Lost access to volume" alerts are reported on a 2-node vSAN cluster, with events similar to the following:
    2015-07-02T02:00:11.675Z [4F1E1B70 info 'Vimsvc.ha-eventmgr'] Event 205 : Lost access to volume ########-########-####-####-####-############ (########-########-####-####-####-############) due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.
    2015-07-02T02:00:37.055Z [4F480B70 info 'Vimsvc.ha-eventmgr'] Event 210 : Successfully restored access to volume ########-########-####-####-####-############ (########-########-####-####-####-############) following connectivity issues.
    
  • The data nodes are directly connected.
  • Witness traffic runs over the management network.
  • More than one 2-node cluster is affected.
  • No vmnic link-down events are observed.
  • Heartbeat issues appear in the logs.

  • The logs show high latency to the witness, followed by the witness rejoining the cluster:
    YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)CMMDS: CMMDSStateMachineReceiveLoop:1654: #######-####-###-###-############: Error receiving from #######-####-###-###-############: Failure
    --
    YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)CMMDSNet: CMMDSNetGrpMsgFilter:2623: #######-####-###-###-############: Creating node #######-####-###-###-############ from host unicast channel.
    --
    YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)DOM: DOMOwnerSetupCreateProxyOwnerCompleteTask:25156: Task failed with status = 'Failure' for #######-####-###-###-############ retry #1
    YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)CMMDS: LeaderUpdateMeanRTLatency:12423: Throttled: #######-####-###-###-############: High RT latency. Node #######-####-###-###-############ RT latency 8451(ms). Mean RT latency 2817(ms)
    YYYY-MM-DDTHH:MM:SS.sssZ In(182) vmkernel: cpu#:#######)DOM: DOMOwnerSetupCreateProxyOwnerCompleteTask:25156: Task failed with status = 'Failure' for #######-####-###-###-############ retry #1
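The log excerpts above can be correlated with a small script before engaging the networking team. The following Python sketch is a hypothetical helper, not a VMware tool. It assumes the Event 205/210 lines (typically in hostd.log) and the CMMDS "High RT latency" lines (typically in vmkernel.log) have exactly the formats quoted above. It pairs each "Lost access" event with the next "restored" event for the same volume to measure how long access was lost, and lists latency lines at or above an assumed 1000 ms threshold.

```python
import re
import sys
from datetime import datetime

# Event 205 (lost access) / Event 210 (access restored) lines as quoted above.
# The volume ID is captured as an opaque token, so redacted IDs also work.
EVENT_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z .*?"
    r"Event (?P<event>205|210) : .*?access to volume (?P<vol>\S+)"
)

# CMMDS throttled latency line, e.g.
# "... Node <uuid> RT latency 8451(ms). Mean RT latency 2817(ms)"
RT_RE = re.compile(
    r"Node (?P<node>\S+) RT latency (?P<rt>\d+)\(ms\)\. "
    r"Mean RT latency (?P<mean>\d+)\(ms\)"
)


def parse_ts(ts):
    """Parse an ISO-like timestamp without the trailing 'Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")


def lost_access_windows(lines):
    """Pair each Event 205 with the next Event 210 for the same volume.

    Returns a list of (volume, seconds_without_access) tuples.
    """
    pending = {}  # volume -> timestamp of the first unmatched Event 205
    windows = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        ts = parse_ts(m.group("ts"))
        vol = m.group("vol")
        if m.group("event") == "205":
            pending.setdefault(vol, ts)
        elif vol in pending:
            windows.append((vol, (ts - pending.pop(vol)).total_seconds()))
    return windows


def high_rt_latency(lines, threshold_ms=1000):
    """Return (node, rt_ms, mean_ms) for latency lines at or above threshold_ms.

    threshold_ms is an assumed cut-off for illustration, not a vSAN default.
    """
    hits = []
    for line in lines:
        m = RT_RE.search(line)
        if m and int(m.group("rt")) >= threshold_ms:
            hits.append((m.group("node"), int(m.group("rt")), int(m.group("mean"))))
    return hits


def main(paths):
    """Read every log file given and print both reports."""
    lines = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            lines.extend(f)
    for vol, secs in lost_access_windows(lines):
        print(f"{vol}: no access for {secs:.1f} s")
    for node, rt, mean in high_rt_latency(lines):
        print(f"node {node}: RT latency {rt} ms (mean {mean} ms)")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, run it as `python3 vsan_lost_access.py hostd.log vmkernel.log` (the script name is hypothetical). For the two Event 205/210 lines quoted above, it would report about 25.4 s without access to the volume. Repeated short windows that line up with high latency to the witness support the physical-network cause described below.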

Cause

  • Physical network issues outside of ESXi.
    Examples include congestion on a switch, or routing issues between the witness and the data nodes that result in slow RDT (Reliable Datagram Transport) response times.

Resolution

Engage your networking team to check the physical network for issues such as switch congestion or routing problems between the witness and the data nodes. If further VMware assistance is required, open a case with our Networking team.

Additional Information

For more details, see: "Lost access to volume" messages with vSAN.