"vSphere HA virtual machine failed to failover" error in vCenter Server

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

This article provides information to:

Clear and reduce the occurrence of the error: vSphere HA virtual machine failed to failover

Symptoms:

In a cluster whose host isolation response is set to Leave powered on, the following error may be displayed when a host becomes isolated:

vSphere HA virtual machine failed to failover
The virtual machine continues to run without a problem.

Environment

VMware vCenter Server 6.x
VMware vCenter Server 7.x
VMware vCenter Server 8.x

Cause

This behavior can occur whenever a High Availability primary agent declares a host dead. However, the virtual machines continue to run without incident. This alarm does not mean that HA has failed or stopped working. It means that a host in the HA-protected cluster tried to power on one or more virtual machines and the attempt failed.

Possible reasons for this to happen:

The host is still running but has disconnected from the network. The cluster's host isolation response is set to Leave powered on:
- When a host becomes network isolated, the remaining hosts in the cluster do not know whether the host has crashed or is simply disconnected from the network. As a result, the remaining hosts attempt to power on the virtual machines that were last logged as running on the isolated host. With Leave powered on, the isolated host leaves its virtual machines running and does not attempt to power them off, so it keeps the locks on their files. Because the isolated host still holds the file locks, the remaining hosts fail to power on the virtual machines, and the alarm is triggered.
The host is still running but has disconnected from the network. The cluster's host isolation response is set to Shut down or Power off:
- With this host isolation response, a host attempts to power off its running virtual machines when it recognizes that it is isolated. Once a virtual machine is completely shut down and the isolated host no longer holds locks on the virtual machine's files, the remaining hosts in the cluster can obtain the locks they need to power on the virtual machine. If the virtual machine is not shut down successfully, or the locks are not released, the alarm is triggered.
The host has failed and the virtual machine storage is in a degraded state. The remaining hosts in the cluster cannot contact the storage device and fail to power up the virtual machines, resulting in the alarm.
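
The three scenarios above can be illustrated with a small toy model in Python. This is only a sketch of the reasoning, not vSphere code: the names VM, isolated_host_reacts, and failover_attempt, the host name esx01, and the isolation response strings are all hypothetical and do not correspond to any vSphere API. The model assumes that a failover attempt succeeds only when the storage is reachable and no other host holds the virtual machine's file locks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VM:
    name: str
    lock_holder: Optional[str]  # host holding the VM's file locks, or None


def isolated_host_reacts(vm: VM, isolation_response: str,
                         shutdown_succeeds: bool = True) -> None:
    """Apply the isolated host's isolation response to one of its VMs (toy model)."""
    if isolation_response == "leave_powered_on":
        return  # VM keeps running, so the isolated host keeps the locks
    # "shut_down" or "power_off": locks are released only if the VM really stops
    if shutdown_succeeds:
        vm.lock_holder = None


def failover_attempt(vm: VM, storage_reachable: bool = True) -> bool:
    """Return True if a remaining host can power on the VM.

    False corresponds to the 'failed to failover' alarm being triggered.
    """
    if not storage_reachable:
        return False  # degraded storage: remaining hosts cannot reach the VM files
    return vm.lock_holder is None  # locked files block the power-on


# Scenario 1: isolation response is Leave powered on
vm1 = VM("vm1", lock_holder="esx01")
isolated_host_reacts(vm1, "leave_powered_on")
print("leave powered on, failover ok:", failover_attempt(vm1))

# Scenario 2: Shut down, but the guest shutdown does not complete
vm2 = VM("vm2", lock_holder="esx01")
isolated_host_reacts(vm2, "shut_down", shutdown_succeeds=False)
print("shutdown failed, failover ok:", failover_attempt(vm2))

# Scenario 3: host has failed (no locks held), but storage is degraded
vm3 = VM("vm3", lock_holder=None)
print("degraded storage, failover ok:", failover_attempt(vm3, storage_reachable=False))
```

In this model, the only path that does not raise the alarm is an isolation response of Shut down or Power off that actually releases the locks while the storage remains reachable.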

Resolution

This is expected behavior. The virtual machines continue to run without incident. This error can be safely ignored.

Workaround 1:
To clear the alarm from the virtual machine:

Acknowledge the alarm in the Monitor tab.
1. Select an inventory object in the object navigator.
2. Click the Monitor tab.
3. Click Issues and Alarms, and click Triggered Alarms.
4. Select an alarm and select Acknowledge.

For more information on dealing with alerts, see:

vCenter Server 7.x: the section "Acknowledge Triggered Alarms in the vSphere Client" in the document Acknowledge Triggered Alarms

To reduce the likelihood of this issue occurring:

Use multiple management networks.
Ensure that the datastore heartbeats configured in vCenter Server are communicating properly, so that HA continues to run efficiently when management network problems occur.

Workaround 2:

1. Log in to the vCenter WebUI.
2. Select the Inventory view.
3. Select the Hosts and Clusters view.
4. Select the affected cluster object > Configure > Services > vSphere Availability.