"vSphere HA virtual machine failover failed" alarm may be triggered for a powered off VM when powering on/off VM in a short interval

Article ID: 420330

Products

VMware vCenter Server

Issue/Introduction

  • The "vSphere HA virtual machine failover failed" alarm is displayed for a powered off VM

  • The vCenter event log shows:

    Insufficient resources to fail over {vm.name}. vSphere HA will retry the fail over when enough resources are available.


  • The affected VM was powered on and off within a short interval

  • The /var/run/log/fdm.log file on the primary vSphere HA host contains entries similar to:

    YYYY-mm-ddTHH:MM:SS.XXXZ warning fdm[<pid>] [Originator@6876 sub=Invt opID=<opID>] VM /vmfs/volumes/{datastore}/{vm}/{vm}.vmx isn't in compat dictionary, return empty compat set
    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Placement opID=<opID>] Vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx failed placement with fault [N3Vim5Fault21NoActiveHostInClusterE:0x000000XXXXXXXXXX]
    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Placement opID=<opID>] Setting insufficient resource timeout to 60 seconds and retrying placement for vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx
    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Placement opID=<opID>] Post resource request event to VC for Vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx (reason=vim.fault.NoActiveHostInCluster)
    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=FDM opID=<opID>] New event: Event=vim.event.NotEnoughResourcesToStartVmEvent vm=/vmfs/volumes/{datastore}/{vm}/{vm}.vmx host=host-X tag=host-X:XXXXXXXXXX:XXX

  • The fdm.log also shows that the VM's protection and unprotection events occur almost simultaneously:


    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Cluster opID=<opID>] Unprotecting vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx @ X
    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Cluster opID=<opID>] unprotected vm at index X
    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Invt opID=<opID>] vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx from __localhost__ changed inventory  desireProtected=0
    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Invt opID=<opID>] vm /vmfs/volumes/{datastore}/{vm}/{vm}.vmx from __localhost__ changed inventory  actualProtected=1 desireProtected=1
    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=PropertyProvider opID=<opID>] RecordOp ADD: protectedVm["/vmfs/volumes/{datastore}/{vm}/{vm}.vmx"], fdmService. Applied change to temp map.
    YYYY-mm-ddTHH:MM:SS.XXXZ verbose fdm[<pid>] [Originator@6876 sub=Policy opID=<opID>] VM ID /vmfs/volumes/{datastore}/{vm}/{vm}.vmx: Transitioned to 'MonitorPower' state
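The log signature above can be checked with a short script. The following is a minimal sketch, not an official tool: it assumes a copy of fdm.log with real timestamps in the format shown above, and it flags a VM when an "Unprotecting vm" entry and an "actualProtected=1 desireProtected=1" entry fall within a few seconds of each other and the same VM also logged a NoActiveHostInCluster placement failure. The function name `find_race_candidates` and the 5-second window are assumptions made for illustration; the article only says the events occur almost simultaneously.

```python
import re
from datetime import datetime, timedelta

# fdm.log entries start with a UTC timestamp such as 2024-01-02T03:04:05.678Z
TS = r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)"
VMX = r"(?P<vmx>/vmfs/volumes/.+?\.vmx)"

# One pattern per log entry type quoted in this article.
PATTERNS = {
    "unprotect": re.compile(TS + r".*sub=Cluster.*Unprotecting vm " + VMX),
    "protect": re.compile(
        TS + r".*sub=Invt.*vm " + VMX
        + r" from \S+ changed inventory\s+actualProtected=1 desireProtected=1"
    ),
    "fail": re.compile(
        TS + r".*sub=Placement.*Vm " + VMX
        + r" failed placement with fault \[\S*NoActiveHostInCluster"
    ),
}


def find_race_candidates(lines, window=timedelta(seconds=5)):
    """Return sorted .vmx paths that match the signature described above.

    `window` is an assumed threshold for "almost simultaneously".
    """
    events = {}  # vmx path -> {"unprotect": [...], "protect": [...], "fail": [...]}
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                ts = datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
                per_vm = events.setdefault(match["vmx"], {k: [] for k in PATTERNS})
                per_vm[kind].append(ts)
                break

    hits = []
    for vmx, ev in events.items():
        # Protection and unprotection logged close together for the same VM
        close = any(
            abs(u - p) <= window for u in ev["unprotect"] for p in ev["protect"]
        )
        if close and ev["fail"]:
            hits.append(vmx)
    return sorted(hits)
```

To use it, open a copy of the log and pass the file object, for example `find_race_candidates(open("fdm.log"))`. An empty list means the signature was not found with the assumed window; it does not rule out the issue.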

Cause

This issue occurs due to a race condition between the protection and unprotection workflows in vSphere HA.

Resolution

A fix is planned for a future release.

To clear the false vSphere HA alarms, disable and re-enable vSphere HA:

  1. Select the cluster in the vSphere Client.
  2. Navigate to Configure > Services > vSphere Availability.
  3. Click Edit.
  4. Toggle vSphere HA off and click OK.
  5. Click Edit again.
  6. Toggle vSphere HA on and click OK.
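
The same disable and re-enable sequence could also be scripted. The sketch below is an illustration only, assuming the pyVmomi SDK: it sets `dasConfig.enabled` through `ReconfigureComputeResource_Task`, first to false and then to true. The function name `toggle_vsphere_ha` is hypothetical. The `vim` module and a task-wait helper are passed in as parameters (an assumption made here so the function can be exercised without a live vCenter). Obtaining the `cluster` object, for example through a pyVmomi connection and inventory lookup, is left out.

```python
def toggle_vsphere_ha(cluster, vim, wait_for_task):
    """Disable and then re-enable vSphere HA on `cluster`.

    Assumed arguments (not part of the original article):
      cluster        a pyVmomi vim.ClusterComputeResource object
      vim            the module from `from pyVmomi import vim`
      wait_for_task  a callable that blocks until a task finishes,
                     e.g. `WaitForTask` from `pyVim.task`
    """
    for enabled in (False, True):
        spec = vim.cluster.ConfigSpecEx()
        spec.dasConfig = vim.cluster.DasConfigInfo()
        spec.dasConfig.enabled = enabled
        # modify=True applies the spec incrementally, so only the
        # fields set here are changed.
        task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
        wait_for_task(task)


# Example call, assuming a connected session and a resolved cluster object:
#   from pyVmomi import vim
#   from pyVim.task import WaitForTask
#   toggle_vsphere_ha(cluster, vim, WaitForTask)
```

Waiting for the disable task to finish before re-enabling mirrors the two separate Edit > OK steps in the vSphere Client procedure.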