To determine which virtual machines are restarted during a vSphere HA failover and on which hosts they reside:
- Review vCenter Server events
Review vCenter Server events to confirm that a vSphere HA failover has occurred.
To review vCenter Server events:
- In vSphere Client, click the Tasks & Events tab.
- Click Events.
- Select Show cluster entries in the dropdown.
- Search for events with vSphere HA in the description.
In the event of a vSphere HA failover, you see messages similar to:
- vSphere HA initiated a failover action
- vSphere HA initiated a virtual machine failover action in cluster
In the event of vSphere HA restarting a virtual machine, you see messages similar to:
- vSphere HA restarted a virtual machine
- vSphere HA restarted virtual machine vm_name on host hostname
Note: This event is issued for each individual virtual machine that is restarted by vSphere HA.
- Review FDM logs on master and slave hosts
The FDM logs provide more detailed information about which virtual machines are restarted during a vSphere HA failover and on which hosts they reside.
The master host in a cluster has several responsibilities, including monitoring the state of slave hosts. If a slave host fails or becomes unreachable, the master host identifies which virtual machines need to be restarted.
To review fdm.log files on the affected master and slave hosts:
Note: You can check each host's vSphere HA role in vSphere Client. Select the vSphere HA cluster, then click the Host tab. The vSphere HA state column indicates whether the host is a master or a slave.
- Log in to the master and slave hosts as the root user.
- Navigate to the /var/log/ directory, which contains the fdm.log files.
- Review the fdm.log file on the master host.
- The FDM log on the master host (host-406) shows that three hosts have failed. You see output similar to:
T12:28:13.895Z [48CC2B90 info 'Invt' opID=SWI-2c257561] [HostStateChange::SaveToInventory] host host-1208 changed state: Dead
T12:28:13.897Z [48CC2B90 info 'Invt' opID=SWI-2c257561] [HostStateChange::SaveToInventory] host host-1214 changed state: Dead
T12:28:13.899Z [48CC2B90 info 'Invt' opID=SWI-2c257561] [HostStateChange::SaveToInventory] host host-409 changed state: Dead
Note: hostIds differ from hostnames you specify in your environment. For more information on mapping between the hostname and the hostId, see How to determine the mapping between hostname and hostId in a VMware HA cluster (2037000).
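When fdm.log is large, it can help to pull the failed hostIds out of the log programmatically. The following is a minimal Python sketch, assuming the log lines follow the format of the excerpt above. The function name dead_hosts and the regular expression are illustrative and are not part of any VMware tool.

```python
import re

# Matches the state-change text shown in the excerpt above, for example:
#   ... [HostStateChange::SaveToInventory] host host-1208 changed state: Dead
DEAD_HOST = re.compile(r"host (\S+) changed state: Dead")


def dead_hosts(lines):
    """Return the hostIds reported as Dead, in first-seen order, without duplicates."""
    seen = []
    for line in lines:
        match = DEAD_HOST.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "T12:28:13.895Z [48CC2B90 info 'Invt' opID=SWI-2c257561] [HostStateChange::SaveToInventory] host host-1208 changed state: Dead",
    "T12:28:13.897Z [48CC2B90 info 'Invt' opID=SWI-2c257561] [HostStateChange::SaveToInventory] host host-1214 changed state: Dead",
    "T12:28:13.899Z [48CC2B90 info 'Invt' opID=SWI-2c257561] [HostStateChange::SaveToInventory] host host-409 changed state: Dead",
]

print(dead_hosts(sample))
```

To run this against a real log, pass an open file object for a copy of fdm.log instead of the sample list.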
- Due to the Dead state, a failover process is initiated. You see output similar to:
T12:28:13.899Z [48A81B90 verbose 'Placement' opID=SWI-6e28e512] [PlacementManagerImpl::IssuePlacementStartCompleteEventLocked] Issue failover start event
T12:28:13.899Z [48ECAB90 verbose 'FDM' opID=SWI-f650f8e7] [FdmService] New event: EventEx=com.vmware.vc.HA.ClusterFailoverActionInitiatedEvent vm= host= tag=host-406:-1688405459:2
- Another responsibility of the master host is restarting virtual machines when a host in the HA cluster fails. The master sends the command to start the affected virtual machines on the remaining live hosts: host-1220, host-1201, and the master itself, which the log shows as __localhost__. You see output similar to:
T12:28:13.905Z [48D03B90 verbose 'Execution' opID=SWI-c2da38ba] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/vm001/vm001.vmx on host-1220 (cmd ID host-406:0)
T12:28:13.905Z [48D03B90 verbose 'Execution' opID=SWI-c2da38ba] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/vm002/vm002.vmx on __localhost__ (cmd ID host-406:1)
T12:28:13.911Z [48E89B90 verbose 'Execution' opID=SWI-298b2a17] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/vm003/vm003.vmx on host-1201 (cmd ID host-406:2)
T12:28:13.912Z [48E89B90 verbose 'Execution' opID=SWI-298b2a17] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/vm004/vm004.vmx on __localhost__ (cmd ID host-406:3)
T12:28:13.912Z [48E89B90 verbose 'Execution' opID=SWI-298b2a17] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/vm005/vm005.vmx on host-1201 (cmd ID host-406:2)
T12:28:13.912Z [48E89B90 verbose 'Execution' opID=SWI-298b2a17] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx on host-1220 (cmd ID host-406:4)
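The placement lines above can be summarized as a map from target host to the virtual machines placed on it. This is a hedged Python sketch that assumes the "Place ... on ... (cmd ID ...)" wording shown in the excerpt. The name placements_by_host is hypothetical, and the sample reuses the <datastore_uuid> placeholder from the excerpt rather than a real datastore path.

```python
import re
from collections import defaultdict

# Captures the .vmx path, the target host, and the command ID from a placement line.
PLACE = re.compile(r"Place (\S+) on (\S+) \(cmd ID ([^)]+)\)")


def placements_by_host(lines):
    """Group the .vmx paths in 'Place' lines by the host they were placed on."""
    by_host = defaultdict(list)
    for line in lines:
        match = PLACE.search(line)
        if match:
            vmx_path, host, _cmd_id = match.groups()
            by_host[host].append(vmx_path)
    return dict(by_host)


sample = [
    "T12:28:13.905Z [48D03B90 verbose 'Execution' opID=SWI-c2da38ba] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/vm001/vm001.vmx on host-1220 (cmd ID host-406:0)",
    "T12:28:13.905Z [48D03B90 verbose 'Execution' opID=SWI-c2da38ba] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/vm002/vm002.vmx on __localhost__ (cmd ID host-406:1)",
    "T12:28:13.912Z [48E89B90 verbose 'Execution' opID=SWI-298b2a17] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx on host-1220 (cmd ID host-406:4)",
]

for host, vms in placements_by_host(sample).items():
    print(host, vms)
```

A placement line records only that the master dispatched the command. Whether the restart succeeded is confirmed by the VmRestartedByHAEvent entries.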
- VmRestartedByHAEvent is logged by the master for virtual machines that the master restarts. This allows you to determine which virtual machines are restarted. You see output similar to:
T12:28:39.009Z [48A81B90 verbose 'FDM' opID=host-406:3-0-SWI-99e10af9] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm004/vm004.vmx host=host-406 tag=host-406:-1688405459:4
T12:28:39.142Z [48A81B90 verbose 'FDM' opID=host-406:1-0-SWI-8c2d5fa7] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm002/vm002.vmx host=host-406 tag=host-406:-1688405459:5
- VmRestartedByHAEvent messages received by the master for any virtual machines restarted by other hosts in the cluster are also visible. You see output similar to:
T12:28:54.917Z [48D85B90 verbose 'FDM'] [EventManagerImpl] Received event from host-1220 (10.13.26.221)
T12:28:54.917Z [48D85B90 verbose 'FDM'] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx host=host-1220 tag=host-1220:-511551180:3
T12:28:54.992Z [48C81B90 verbose 'FDM'] [EventManagerImpl] Received event from host-1220 (10.13.26.221)
T12:28:54.992Z [48C81B90 verbose 'FDM'] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm001/vm001.vmx host=host-1220 tag=host-1220:-511551180:4
T12:28:55.657Z [48D85B90 verbose 'FDM'] [EventManagerImpl] Received event from host-1201 (10.13.26.223)
T12:28:55.658Z [48D85B90 verbose 'FDM'] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm005/vm005.vmx host=host-1201 tag=host-1201:2017435893:3
T12:28:55.663Z [48E07B90 verbose 'FDM'] [EventManagerImpl] Received event from host-1201 (10.13.26.223)
T12:28:55.663Z [48E07B90 verbose 'FDM'] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm003/vm003.vmx host=host-1201 tag=host-1201:2017435893:4
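Because every restart is recorded as a VmRestartedByHAEvent on the master, the full list of restarted virtual machines and their hosts can be extracted from the master's fdm.log in one pass. The sketch below is an illustration in Python, assuming the "vm=... host=..." layout shown in the excerpts above. The name restarted_vms is hypothetical.

```python
import re

# Captures the .vmx path and the hostId from a VmRestartedByHAEvent line.
RESTARTED = re.compile(r"VmRestartedByHAEvent vm=(\S+) host=(\S+)")


def restarted_vms(lines):
    """Return (vmx_path, host_id) pairs for each VmRestartedByHAEvent line."""
    pairs = []
    for line in lines:
        match = RESTARTED.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


sample = [
    "T12:28:39.009Z [48A81B90 verbose 'FDM' opID=host-406:3-0-SWI-99e10af9] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm004/vm004.vmx host=host-406 tag=host-406:-1688405459:4",
    "T12:28:54.917Z [48D85B90 verbose 'FDM'] [EventManagerImpl] Received event from host-1220 (10.13.26.221)",
    "T12:28:54.917Z [48D85B90 verbose 'FDM'] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx host=host-1220 tag=host-1220:-511551180:3",
]

for vmx_path, host_id in restarted_vms(sample):
    print(f"{vmx_path} restarted on {host_id}")
```

The hostIds in the output still need to be mapped to hostnames, as described in the note earlier in this article.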
- Review the fdm.log on the slave host.
- This output is an example of the logging on a slave host (host-1220) after it receives the instruction from the master (host-406) to restart a virtual machine (vm006):
T12:28:13.902Z [FFA32400 verbose 'Execution'] [ActionScheduler::AddPendingActionInt] Added pending action opId = host-406:4-0, cfgPath = /vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx , type = VmFailover, priority = 32
T12:28:13.902Z [FFA32400 verbose 'Execution'] [VmPlacementActionScheduler::ExecuteActions] Execute action opId = host-406:4-0 for /vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx
T12:28:13.902Z [41A79B90 info 'Default' opID=host-406:4-0] [VpxLRO] -- BEGIN task-internal-30 -- -- CommandActionLRO --
T12:28:13.902Z [41A79B90 verbose 'Execution' opID=host-406:4-0] [FailoverAction::StartAsync] Failing over vm /vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx (isRegistered=false)
T12:28:13.902Z [41A79B90 verbose 'Execution' opID=host-406:4-0] [FailoverAction::StartAsync] Registering vm
T12:28:13.902Z [41A79B90 verbose 'Default' opID=host-406:4-0] [TaskInfoPublisher::IncListeners] host:[] Return after successful IncListeners (listeners = 2, _connectionGen = 0)
T12:28:13.902Z [41A79B90 verbose 'Default' opID=host-406:4-0] [TaskInfoListener::TaskInfoListener] constructed. Connection number = 0
T12:28:13.902Z [41A79B90 verbose 'Hal' opID=host-406:4-0] [FdmHalHost::RegisterVmAsync] Invoking registerVm on hostd
T12:28:13.906Z [41A79B90 verbose 'Default' opID=host-406:4-0] TaskInfoChannel created for haTask-ha-folder-vm-vim.Folder.registerVm-266957111
T12:28:13.906Z [41A79B90 verbose 'Default' opID=host-406:4-0] [TaskInfoPublisher::AddChannel] host:[] Channel (haTask-ha-folder-vm-vim.Folder.registerVm-266957111) added for task: haTask-ha-folder-vm-vim.Folder.registerVm-266957111
T12:28:13.906Z [41A79B90 verbose 'Default' opID=host-406:4-0] [TaskInfoChannel::GetTaskInfo] task: haTask-ha-folder-vm-vim.Folder.registerVm-266957111 setup for async notification
T12:28:54.442Z [FFF07B90 verbose 'Default' opID=host-406:4-0] [VpxLRO] Task task-internal-30 has been resumed
T12:28:54.442Z [FFF07B90 verbose 'Execution' opID=host-406:4-0] [FailoverAction::RegisterCompletionCallback] Registering vm done (vmid=/vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx , hostdVmId=63)
T12:28:54.442Z [FFF07B90 verbose 'Execution' opID=host-406:4-0] [FailoverAction::InitiateReconfigure] Not reconfiguring vm
T12:28:54.442Z [FFF07B90 verbose 'Execution' opID=host-406:4-0] [FailoverAction::ReconfigureCompletionCallback] Reconfiguring vm /vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx is done
T12:28:54.442Z [FFF07B90 verbose 'Execution' opID=host-406:4-0] [FailoverAction::ReconfigureCompletionCallback] Powering on vm
T12:28:54.442Z [FFF07B90 verbose 'Default' opID=host-406:4-0] [TaskInfoPublisher::IncListeners] host:[] Return after successful IncListeners (listeners = 2, _connectionGen = 0)
T12:28:54.442Z [FFF07B90 verbose 'Default' opID=host-406:4-0] [TaskInfoListener::TaskInfoListener] constructed. Connection number = 0
T12:28:54.444Z [FFF07B90 verbose 'Default' opID=host-406:4-0] TaskInfoChannel created for haTask-63-vim.VirtualMachine.powerOn-266957123
T12:28:54.444Z [FFF07B90 verbose 'Default' opID=host-406:4-0] [TaskInfoPublisher::AddChannel] host:[] Channel (haTask-63-vim.VirtualMachine.powerOn-266957123) added for task: haTask-63-vim.VirtualMachine.powerOn-266957123
T12:28:54.444Z [FFF07B90 verbose 'Default' opID=host-406:4-0] [TaskInfoChannel::GetTaskInfo] task: haTask-63-vim.VirtualMachine.powerOn-266957123 setup for async notification
T12:28:54.899Z [41A38B90 verbose 'Default' opID=host-406:4-0] [VpxLRO] Task task-internal-30 has been resumed
T12:28:54.899Z [41A38B90 verbose 'Execution' opID=host-406:4-0] [FailoverAction::PowerOnCompletionCallback] Power on vm /vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx done
T12:28:54.899Z [41A38B90 verbose 'Execution' opID=host-406:4-0] [ActionScheduler::RemoveAction] Action is removed: opId = host-406:4-0
T12:28:54.899Z [41A38B90 info 'Default' opID=host-406:4-0] [VpxLRO] -- FINISH task-internal-30 -- -- CommandActionLRO --
T12:28:54.900Z [FFE44B90 verbose 'FDM' opID=host-406:4-0-SWI-d2634fc5] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx host=host-1220 tag=host-1220:-511551180:3
T12:28:54.900Z [FFE44B90 verbose 'PropertyProvider' opID=host-406:4-0-SWI-d2634fc5] RecordOp ADD: event[3], fdmService
T12:28:54.900Z [FFE44B90 verbose 'PropertyProvider' opID=host-406:4-0-SWI-d2634fc5] RecordOp ASSIGN: serverTime, fdmService
T12:29:04.901Z [41AFBB90 verbose 'Execution' opID=host-406:4-0] [ExecutionManagerImpl::SendUpdateToMaster] Sent command update to master
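The opID ties every line of a single restart together on the slave host, so the elapsed time between "Failing over vm" and "Power on vm ... done" can be calculated from the log. The following Python sketch is one way to do this, under stated assumptions: it reads only the time-of-day portion of each timestamp, as shown in the excerpt above, and assumes both lines fall on the same day. The names time_of_day and failover_duration are illustrative.

```python
import re
from datetime import timedelta

# Time-of-day portion of the timestamps shown in the excerpt, for example T12:28:13.902Z.
STAMP = re.compile(r"T(\d{2}):(\d{2}):(\d{2})\.(\d{3})Z")


def time_of_day(line):
    """Return the line's timestamp as an offset from midnight, or None if absent."""
    match = STAMP.search(line)
    if not match:
        return None
    hours, minutes, seconds, millis = map(int, match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds, milliseconds=millis)


def failover_duration(lines, op_id):
    """Time from 'Failing over vm' to 'Power on vm ... done' for one opID."""
    tag = f"opID={op_id}]"  # trailing bracket excludes longer opIDs such as ...-SWI-xxxx
    start = end = None
    for line in lines:
        if tag not in line:
            continue
        if "Failing over vm" in line:
            start = time_of_day(line)
        elif "Power on vm" in line and line.rstrip().endswith("done"):
            end = time_of_day(line)
    if start is None or end is None:
        return None
    return end - start


sample = [
    "T12:28:13.902Z [41A79B90 verbose 'Execution' opID=host-406:4-0] [FailoverAction::StartAsync] Failing over vm /vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx (isRegistered=false)",
    "T12:28:54.899Z [41A38B90 verbose 'Execution' opID=host-406:4-0] [FailoverAction::PowerOnCompletionCallback] Power on vm /vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx done",
]

print(failover_duration(sample, "host-406:4-0"))
```

In the example above, most of the elapsed time falls between the registerVm call at 12:28:13 and its completion at 12:28:54, which suggests that registration, rather than power-on, accounts for most of the restart time for this virtual machine.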