Performing a "Reconfigure for VMware HA" operation on the primary ESXi node in an HA cluster generates false "unexpected virtual machine failover" alerts


Article ID: 318969


Products

VMware vCenter Server

Issue/Introduction

When a "Reconfigure for VMware HA" operation is performed on the primary node in an HA cluster, vSphere HA raises an unexpected virtual machine failover alert for the virtual machines running on that primary node.
The vCenter Server events tab displays messages similar to:

vCenter Server is disconnected from a master HA agent running on host <primary hostname> in HA_DRS_Cluster in Datacenter - vSphere HA agent on <primary hostname> in cluster HA_DRS_Cluster in Datacenter is disabled

The vSphere HA availability state of the host <primary hostname> in cluster HA_DRS_Cluster in Datacenter has changed to Uninitialized

The vSphere HA availability state of the host <secondary hostname> in cluster HA_DRS_Cluster in Datacenter has changed to Election

vSphere HA unsuccessfully failed over <virtual machine> on <secondary hostname> in cluster HA_DRS_Cluster in Datacenter. vSphere HA will retry if the maximum number of attempts has not been exceeded. Reason: The operation is not allowed in the current state.
Other descriptions of this issue include:

"We have a persistent vSphere HA failover notice of 'vSphere HA initiated a failover action on ## virtual machines in cluster HA_DRS_Cluster'"
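If you export event messages as text, the sequence above can be spotted with a simple scan: an HA host entering the Election state, followed by a failed failover with the reason "The operation is not allowed in the current state." The sketch below is a hypothetical helper (`suspected_false_failovers` is an illustrative name, not a VMware tool). Its regular expressions are modeled only on the sample messages in this article, so real event text may need adjusted patterns.

```python
import re

# Hypothetical patterns, modeled on the sample event messages in this article;
# they are not taken from an official vCenter event schema.
ELECTION = re.compile(r"availability state of the host .+ has changed to Election")
FAILED_FAILOVER = re.compile(
    r"vSphere HA unsuccessfully failed over (?P<vm>.+?) on .+"
    r"Reason: The operation is not allowed in the current state\."
)


def suspected_false_failovers(messages):
    """Return VM names whose failed failover follows an HA election.

    `messages` is a list of event message strings in chronological order.
    A failed failover seen after an Election state change matches the
    pattern described in this article.
    """
    election_seen = False
    suspects = []
    for msg in messages:
        if ELECTION.search(msg):
            election_seen = True
            continue
        match = FAILED_FAILOVER.search(msg)
        if election_seen and match:
            suspects.append(match.group("vm"))
    return suspects
```

A match only suggests this cosmetic scenario; it does not prove that the virtual machines stayed up, so confirm their power state in vCenter.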

Environment

VMware vCenter Server

Cause

Manually reconfiguring the primary HA host for HA causes the remaining secondary hosts to hold an election to choose a new primary host.

The newly elected primary host places the virtual machines running on the old primary host in an unknown power state and waits up to 10 seconds for a notification that the virtual machines on the old primary host are powered on and running.

If the old primary host does not become a secondary host within that 10-second interval, the new primary host assumes that the virtual machines are down and attempts to restart them. This produces a false failover event, and the restart task then fails because the virtual machines were never powered off. The virtual machines themselves remain unaffected.
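The decision the new primary host makes can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not the vSphere HA (FDM) agent's actual logic: `new_primary_decision` and its parameters are illustrative names, and treating a report that arrives exactly at the deadline as "in time" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Default wait described in this article's Cause section.
DEFAULT_MONITOR_PERIOD_S = 10


@dataclass
class Outcome:
    restart_attempted: bool
    reason: str


def new_primary_decision(old_primary_report_s: Optional[float],
                         monitor_period_s: float = DEFAULT_MONITOR_PERIOD_S) -> Outcome:
    """Simplified model of the new primary host's choice.

    old_primary_report_s: seconds until the old primary host becomes a
    secondary and reports its VMs as powered on (None if it never reports).
    """
    if old_primary_report_s is not None and old_primary_report_s <= monitor_period_s:
        return Outcome(False, "VM state confirmed within the monitor period")
    # No report in time: the VMs are assumed down and a restart is attempted.
    # If they are actually still running, the restart fails, which is the
    # false failover event described above.
    return Outcome(True, "monitor period expired; restart attempted")
```

With the default period, a report arriving after 15 seconds triggers a restart attempt. With a 30-second period, the same report arrives in time and no failover is attempted.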

Resolution

This behavior is expected and cosmetic. The alerts can be ignored and cleared after the HA configuration has finished resetting.

To prevent these alerts, you can increase the unknown-state monitor period, although this is usually unnecessary.

Note: The exact option name varies by vCenter version. Use the option name from the table below that matches your vCenter version.

To apply the setting and prevent the alerts, perform the following steps:

1. In vCenter, right-click the cluster and select Edit Settings.
2. Click vSphere HA, then Advanced Options.
3. If the option is not already present, add a new line with the option name and value from the table below; otherwise, change the existing option to the value shown:

vCenter version	Option	Value (seconds)
8.0 Update 2 onwards	`das.config.fdm.policy.unknownStateMonitorPeriod`	30
7.0 Update 1 to 8.0 Update 1	`das.config.fdm.unknownStateMonitorPeriod`	30
Pre-7.0 Update 1	`das.config.fdm.policy.unknownStateMonitorPeriod`	30

4. Disable and then re-enable vSphere HA on the cluster so that the new setting takes effect.
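If you script this change across several clusters, the version-to-option mapping in the table can be expressed as a small lookup. This is a hypothetical helper (`unknown_state_option` is an illustrative name). It encodes only the three table rows and treats lettered patch releases, such as 8.0 Update 1a, as their base update, which is an assumption.

```python
# Value from the table, in seconds.
RECOMMENDED_PERIOD_S = 30


def unknown_state_option(major: int, minor: int, update: int) -> str:
    """Return the HA advanced option name for a vCenter version.

    The version is given as (major, minor, update), for example
    8.0 Update 2 -> (8, 0, 2). Mirrors the table rows only.
    """
    version = (major, minor, update)
    if version >= (8, 0, 2):   # 8.0 Update 2 onwards
        return "das.config.fdm.policy.unknownStateMonitorPeriod"
    if version >= (7, 0, 1):   # 7.0 Update 1 to 8.0 Update 1
        return "das.config.fdm.unknownStateMonitorPeriod"
    return "das.config.fdm.policy.unknownStateMonitorPeriod"  # pre-7.0 Update 1
```

For example, `unknown_state_option(8, 0, 1)` gives `das.config.fdm.unknownStateMonitorPeriod`, which you would set to `RECOMMENDED_PERIOD_S`.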

Additional Information

Japanese title of this article: Performing a "Reconfigure for VMware HA" operation on the primary node causes unexpected virtual machine failovers
Impact/Risks:
Increasing the monitor period also delays the start of virtual machine failovers by the same amount when a primary node fails during an actual HA event (in this case, by 20 seconds, from 10 seconds to 30 seconds).
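Assuming the 10-second wait described under Cause is the default monitor period, the added failover delay is the difference between the configured and default values:

$$
\Delta t_{\text{failover}} = T_{\text{configured}} - T_{\text{default}} = 30\,\text{s} - 10\,\text{s} = 20\,\text{s}
$$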
