Stopping the vmware-fdm service on an ESXi 5.0 host that is an HA primary node creates VM failover attempts

Article ID: 340206

Products

VMware vCenter Server VMware vSphere ESXi

Issue/Introduction

Symptoms:
  • When you stop the fdm service on an ESXi 5.0 host that is an HA primary node as a verification test for vSphere HA, vCenter Server reports that the HA Agent is unreachable.

    Note: To stop the fdm service:
     
    1. Log in to the ESXi host locally or via SSH.
    2. Run this command:

      # /etc/init.d/vmware-fdm stop
       
  • You also see VM failover attempts for the virtual machines running on the original primary host.
  • In the fdm.log file, you see entries similar to:

    2012-01-31T12:43:56.028Z [FFA45400 info 'Policy'] [VmOperationsManager::PerformPlacements] Sending a list of 1 VMs to the placement manager for placement.
    2012-01-31T12:43:56.028Z [FFA45400 info 'Placement'] [PlacementManagerImpl::AddVmToPlace] 1 Vms added, 1 VmRecord created
    2012-01-31T12:43:56.028Z [FFA45400 verbose 'Placement'] [PlacementManagerImpl::ReevaluateAllVms] Reevaluate all to-be-placed Vms.
    2012-01-31T12:43:56.028Z [FFA45400 verbose 'Placement'] [PlacementManagerImpl::IssuePlacementStartCompleteEventLocked] Issue failover start event
    2012-01-31T12:43:56.028Z [FFCB0B90 verbose 'FDM' opID=SWI-af68df9e] [FdmService] New event: EventEx=com.vmware.vc.HA.ClusterFailoverActionInitiatedEvent vm= host= tag=host-2186:528254229:2
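
To check whether a test triggered this behavior, you can scan a copy of fdm.log for the failover-initiated event shown above. The following is a minimal sketch, not a VMware tool: the function name `find_failover_events` and the regular expression are assumptions based only on the sample entries in this article, and other log line layouts may need a different pattern.

```python
import re

# Assumed line layout, taken from the sample entries above:
#   <timestamp> [<thread id> <level> '<component>' ...] ...
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+\[(?P<thread>\S+)\s+(?P<level>\w+)\s+'(?P<component>[^']+)'"
)

# Event name as it appears in the sample log entry.
FAILOVER_EVENT = "com.vmware.vc.HA.ClusterFailoverActionInitiatedEvent"


def find_failover_events(lines):
    """Return (timestamp, component) for each line reporting a failover-initiated event."""
    hits = []
    for line in lines:
        if FAILOVER_EVENT not in line:
            continue
        match = LINE_RE.match(line.strip())
        if match:
            hits.append((match.group("ts"), match.group("component")))
    return hits


# Two of the sample entries from this article.
sample = [
    "2012-01-31T12:43:56.028Z [FFA45400 verbose 'Placement'] "
    "[PlacementManagerImpl::IssuePlacementStartCompleteEventLocked] Issue failover start event",
    "2012-01-31T12:43:56.028Z [FFCB0B90 verbose 'FDM' opID=SWI-af68df9e] [FdmService] "
    "New event: EventEx=com.vmware.vc.HA.ClusterFailoverActionInitiatedEvent "
    "vm= host= tag=host-2186:528254229:2",
]

for timestamp, component in find_failover_events(sample):
    print(timestamp, component)
```

To scan a real file, pass an open file object instead of `sample`. The function only reads lines and does not modify the log.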


Environment

VMware vSphere ESXi 5.0
VMware vCenter Server 5.0.x

Cause

This is expected behavior. When a new primary host is elected, it begins monitoring all protected virtual machines to determine whether they are still running or need to be restarted. As the secondary nodes connect to the new primary host, each one reports the virtual machines running on it, and the new primary host removes those virtual machines from its monitoring list. Ten seconds after its election, the new primary host checks the list and attempts to restart any virtual machines that remain on it.

Because the fdm service was stopped on the host that was previously the primary host, that host never becomes a secondary node, so the new primary host never receives a report of the virtual machines running on it. As a result, the new primary host attempts to fail over all protected virtual machines from the previous primary host.
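
The bookkeeping described above can be illustrated with a small model. This is a simplified sketch and not the FDM implementation: the function `vms_to_restart`, the host and virtual machine names, and the data structures are all hypothetical. The ten-second wait is not simulated; the function returns the state of the monitoring list at the moment the wait ends. The sketch also assumes the new primary host accounts for its own virtual machines in the same way as a secondary report.

```python
def vms_to_restart(protected_vms, host_reports):
    """Return the protected VMs still on the monitoring list when the wait ends.

    protected_vms: iterable of protected VM names (hypothetical names).
    host_reports:  dict mapping a host name to the set of VMs that host
                   reported as running before the wait ended.
    """
    # The new primary starts by monitoring every protected VM.
    monitored = set(protected_vms)

    # Each report removes that host's running VMs from the list.
    for running in host_reports.values():
        monitored -= set(running)

    # Whatever is left was never reported, so a restart is attempted.
    return sorted(monitored)


protected = ["vm-a", "vm-b", "vm-c", "vm-d"]

# Case 1: fdm was stopped on esx1 (the old primary), so esx1 never reports.
reports_fdm_stopped = {
    "esx2": {"vm-c"},  # new primary, assumed to account for its own VMs
    "esx3": {"vm-d"},  # secondary node
}
print(vms_to_restart(protected, reports_fdm_stopped))

# Case 2: esx1 rejoins as a secondary node and reports in time.
reports_all_hosts = dict(reports_fdm_stopped, esx1={"vm-a", "vm-b"})
print(vms_to_restart(protected, reports_all_hosts))
```

In the first case the virtual machines on esx1 are still running but are never reported, which is why failover attempts appear for them. In the second case every virtual machine is reported and nothing is left to restart.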

Notes:
  • If the fdm service is stopped on a secondary host, no new election for primary takes place and there is no new primary host. Therefore, the process of placing all the protected virtual machines in a list to monitor is not performed and there will be no attempt to fail over any virtual machines.
  • If the fdm service is restarted, there will be an election. However, if there already is a primary node, the host connects as a secondary node and there will be no attempt to fail over any virtual machines.
  • Normally, services running in ESXi 5.0 are monitored by a watchdog. If, during normal operation, the fdm process halts, the watchdog restarts the process and a new election begins. If this happens on a primary node, the restart should complete quickly enough that no failover attempts occur.

Resolution

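To run a valid verification test of vSphere HA in vSphere 5, halt the fdm process instead of stopping the fdm service. When the process halts, the watchdog restarts it, as described in the Notes in the Cause section.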