HA failover of VMs on an ESXi host with "Lost connectivity to datastore" warnings

Article ID: 407578

Products

VMware vSphere ESXi

Issue/Introduction

An HA failover is triggered for the VMs on an ESXi host.
The HA master node reports that it has lost access to the host.

For example, /var/log/fdm.log on the HA master node shows:
Fdm[2108683] [Originator@6876 sub=Cluster opID=clusterManager.cpp:980-601c5b31] Marking slave host-####### as unreachable
The impacted host stops logging for several minutes around the time of the failover. All logs under /var/log show the same gap.
Before logging stops, the ESXi host does not report a datastore heartbeat timeout.
When logging resumes, the ESXi host reports a datastore heartbeat timeout and then recovers the heartbeat within one to two seconds:

/var/log/vobd.log:
vobd[2098147] [vmfsCorrelator] 391182884031us: [vob.vmfs.heartbeat.timedout] <Datastore UUID> <Datastore Name>
vobd[2098147] [vmfsCorrelator] 391184193077us: [vob.vmfs.heartbeat.recovered] Reclaimed heartbeat for volume <Datastore UUID> (<Datastore Name>): [Timeout] [HB state abcdef02 offset 3837952 gen 81 stampUS 391184176867 uuid <UUID> jrnl <FB 25165830> drv 24.82]
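
To check how quickly the heartbeat was reclaimed, the microsecond counters on the two vobd.log lines can be subtracted. The following Python is a minimal sketch, not a VMware tool: it assumes the line layout matches the excerpt above, that the volume identifier contains no spaces, and the function name heartbeat_recovery_seconds is hypothetical.

```python
import re

# Patterns assume the vobd.log layout shown in the excerpt above.
TIMEDOUT = re.compile(
    r"(\d+)us: \[vob\.vmfs\.heartbeat\.timedout\] (\S+)"
)
RECOVERED = re.compile(
    r"(\d+)us: \[vob\.vmfs\.heartbeat\.recovered\] "
    r"Reclaimed heartbeat for volume (\S+)"
)


def heartbeat_recovery_seconds(lines):
    """Map each volume identifier to a list of recovery times in seconds.

    A recovery time is the gap between a 'timedout' event and the next
    'recovered' event logged for the same volume.
    """
    pending = {}    # volume -> counter (us) of the unmatched timeout
    recovered = {}  # volume -> list of recovery durations (s)
    for line in lines:
        match = TIMEDOUT.search(line)
        if match:
            pending[match.group(2)] = int(match.group(1))
            continue
        match = RECOVERED.search(line)
        if match and match.group(2) in pending:
            start_us = pending.pop(match.group(2))
            elapsed_s = (int(match.group(1)) - start_us) / 1_000_000
            recovered.setdefault(match.group(2), []).append(elapsed_s)
    return recovered


if __name__ == "__main__":
    # Path is the one named in this article; run against a copy of the log.
    with open("/var/log/vobd.log", errors="replace") as log:
        for volume, durations in heartbeat_recovery_seconds(log).items():
            for seconds in durations:
                print(f"{volume}: heartbeat reclaimed after {seconds:.3f} s")
```

For the two lines quoted above, the counters differ by 391184193077 - 391182884031 = 1309046 microseconds, or roughly 1.3 seconds, which matches the one-to-two-second recovery described in this article.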

Environment

VMware vSphere ESXi (all versions)

Cause

The ESXi host becomes unresponsive. The datastore heartbeat timeouts and the lost access to datastores are consequences of the host being unresponsive.

HA failover is triggered because the host becomes unresponsive. The datastore heartbeat timeouts, which are logged only after the host resumes logging, confirm this.

An underlying hardware issue is the likely cause.

Resolution

Engage the hardware vendor.
