"vSphere HA virtual machine failover failed" alarm may be triggered for a powered off VM when powering on/off VM in a short interval

Article ID: 420330

Products

VMware vCenter Server

Issue/Introduction

The "vSphere HA virtual machine failover failed" alarm is displayed for a powered-off VM.
The vCenter event log shows:

Insufficient resources to fail over {vm.name}. vSphere HA will retry the fail over when enough resources are available.
The affected VM was powered on and off within a short interval.
The /var/run/log/fdm.log file on the primary vSphere HA host contains entries similar to:

YYYY-mm-ddTHH:MM:SS.XXXZ warning fdm[<pid>] [Originator@6876 sub=Invt opID=<opID>] VM /vmfs/volumes/{datastore}/{vm}/{vm}.vmx isn't in compat dictionary, return empty compat set
YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Placement opID=<opID>] Vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx failed placement with fault [N3Vim5Fault21NoActiveHostInClusterE:0x000000XXXXXXXXXX]
YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Placement opID=<opID>] Setting insufficient resource timeout to 60 seconds and retrying placement for vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx
YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Placement opID=<opID>] Post resource request event to VC for Vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx (reason=vim.fault.NoActiveHostInCluster)
YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=FDM opID=<opID>] New event: Event=vim.event.NotEnoughResourcesToStartVmEvent vm=/vmfs/volumes/{datastore}/{vm}/{vm}.vmx host=host-X tag=host-X:XXXXXXXXXX:XXX
The fdm.log also shows that the VM's protection and unprotection events occur almost simultaneously:

YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Cluster opID=<opID>] Unprotecting vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx @ X
YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Cluster opID=<opID>] unprotected vm at index X
YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Invt opID=<opID>] vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx from __localhost__ changed inventory desireProtected=0
YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Invt opID=<opID>] vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx from __localhost__ changed inventory actualProtected=1 desireProtected=1
YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=PropertyProvider opID=<opID>] RecordOp ADD: protectedVm["/vmfs/volumes/{datastore}/{vm}/{vm}.vmx"], fdmService. Applied change to temp map.
YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Policy opID=<opID>] VM ID /vmfs/volumes/{datastore}/{vm}/{vm}.vmx: Transitioned to 'MonitorPower' state
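
The protection/unprotection pattern above can be searched for in a copy of fdm.log with a short script. The following is a minimal sketch, not a supported tool: it assumes the log lines have the same shape as the excerpts above, the function names (parse_ts, find_protection_races) are illustrative, and the 5-second window is an arbitrary threshold chosen for the example rather than a value documented for vSphere HA.

```python
import re
from datetime import datetime

# Timestamp as it appears at the start of each fdm.log line in the excerpts.
TS = r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)"

# "Unprotecting vm <vmx path> @ X" from sub=Cluster.
UNPROTECT_RE = re.compile(
    TS + r" .*sub=Cluster.*\] Unprotecting vm (?P<vmx>/.+?\.vmx) @"
)
# "vm <vmx path> from ... changed inventory actualProtected=1 desireProtected=1"
# from sub=Invt.
PROTECT_RE = re.compile(
    TS + r" .*sub=Invt.*\] vm (?P<vmx>/.+?\.vmx) from \S+ "
    r"changed inventory actualProtected=1 desireProtected=1"
)


def parse_ts(text):
    """Parse a timestamp such as 2024-01-01T00:00:00.100Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def find_protection_races(lines, window_seconds=5.0):
    """Return (vmx_path, gap_seconds) for each VM whose unprotect and
    protect events are logged within window_seconds of each other.

    window_seconds is a hypothetical threshold, not a vSphere HA setting.
    """
    last_seen = {}  # vmx path -> {"unprotect": datetime, "protect": datetime}
    hits = []
    for line in lines:
        for kind, other, pattern in (
            ("unprotect", "protect", UNPROTECT_RE),
            ("protect", "unprotect", PROTECT_RE),
        ):
            match = pattern.search(line)
            if not match:
                continue
            vmx, when = match["vmx"], parse_ts(match["ts"])
            events = last_seen.setdefault(vmx, {})
            if other in events:
                gap = abs((when - events[other]).total_seconds())
                if gap <= window_seconds:
                    hits.append((vmx, gap))
            events[kind] = when
            break
    return hits
```

To use it, copy fdm.log from the primary vSphere HA host and pass the opened file to find_protection_races, for example `find_protection_races(open("fdm.log", errors="replace"))`. A VM that appears in the result and also has a NoActiveHostInCluster placement failure in the same log is a likely match for this issue.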

Cause

This issue is caused by a race condition between the protection and unprotection workflows in vSphere HA, which can be triggered when a VM is powered on and off within a short interval.

Resolution

A fix is planned for a future release.

To clear the false vSphere HA alarm, disable and then re-enable vSphere HA on the cluster:

Select the cluster in the vSphere Client.
Navigate to Configure > Services > vSphere Availability.
Click Edit.
Toggle vSphere HA off and click OK.
Click Edit again.
Toggle vSphere HA on and click OK.
