To determine which virtual machines are restarted during a vSphere HA failover and on which hosts they reside:
- Review vCenter Server events
OR
- Review the FDM logs on the hosts
Review vCenter Server events to confirm that a vSphere HA failover has occurred.
To review vCenter Server events:
- In vSphere Client, click the Tasks & Events tab.
- Click Events.
- Select Show cluster entries in the dropdown.
- Search for events with vSphere HA in the description.
In the event of a vSphere HA failover, you see messages similar to:
- vSphere HA initiated a failover action
- vSphere HA initiated a virtual machine failover action in cluster
In the event of vSphere HA restarting a virtual machine, you see messages similar to:
- vSphere HA restarted a virtual machine
- vSphere HA restarted virtual machine vm_name on host hostname
Note: This event is issued for each individual virtual machine that is restarted by vSphere HA.
If tasks and events were not collected or cannot be viewed, an alternative is to review the journalctl logs on vCenter Server, which also capture the tasks and events.
In the event of vSphere HA restarting a virtual machine, you see messages similar to:
- vSphere HA restarted a virtual machine
- vSphere HA restarted virtual machine vm_name on host hostname
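When the events or journal output have been exported to a text file, the restart messages can be filtered out with a short script. The following is a minimal sketch, not an official tool: the function name restarted_vms is hypothetical, and the pattern assumes the message wording shown above. Real messages may carry additional trailing text, so the pattern is not anchored at the end of the line.

```python
import re

# Assumption: the message text matches the form shown in this article,
# "vSphere HA restarted virtual machine <vm_name> on host <hostname>".
RESTART_MSG = re.compile(
    r"vSphere HA restarted virtual machine (?P<vm>.+?) on host (?P<host>\S+)"
)


def restarted_vms(lines):
    """Return (vm_name, hostname) pairs found in an iterable of text lines."""
    pairs = []
    for line in lines:
        match = RESTART_MSG.search(line)
        if match:
            pairs.append((match.group("vm"), match.group("host")))
    return pairs


if __name__ == "__main__":
    # Sample lines copied from the messages quoted in this article.
    sample = [
        "vSphere HA initiated a virtual machine failover action in cluster",
        "vSphere HA restarted virtual machine vm_name on host hostname",
    ]
    for vm, host in restarted_vms(sample):
        print(f"{vm} restarted on {host}")
```

To scan a saved export, pass an open file object instead of the sample list, because the function accepts any iterable of lines.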
The FDM logs provide more detailed information about which virtual machines are restarted during a vSphere HA failover and on which hosts they reside.
The primary host in a cluster has a number of responsibilities, including monitoring the state of secondary hosts. If a secondary host fails or becomes unreachable, the primary host identifies which virtual machine(s) need to be restarted.
To review fdm.log files on the affected primary and secondary hosts:
Note: You can check the vSphere HA host configuration in vSphere Client. Select the vSphere HA cluster, then click the Host tab. The vSphere HA state column indicates whether the host is a primary or a secondary.
- Log in to the primary and secondary hosts as the root user.
- Navigate to the /var/log/ directory, which contains the fdm.log files.
- Review the fdm.log file on the primary host.
- The FDM logs on the primary host (host-XXXX) show that three hosts have failed. You see output similar to:
YYYY:MM:DDTHH:MM:SSZ [48CC2B90 info 'Invt' opID=<OPID>] [HostStateChange::SaveToInventory] host host-#### changed state: Dead
YYYY:MM:DDTHH:MM:SSZ [48CC2B90 info 'Invt' opID=<OPID>] [HostStateChange::SaveToInventory] host host-#### changed state: Dead
YYYY:MM:DDTHH:MM:SSZ [48CC2B90 info 'Invt' opID=<OPID>] [HostStateChange::SaveToInventory] host host-#### changed state: Dead
Note: hostIds differ from hostnames you specify in your environment. For more information on mapping between the hostname and the hostId, see How to determine the mapping between hostname and hostId in a VMware HA cluster (2037000).
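On a busy cluster the state-change lines can be hard to spot by eye. The sketch below collects the hostIds that the primary host marked as Dead. It assumes the line format shown above. The function name dead_hosts, the sample timestamps, and the host IDs host-101 and host-102 are hypothetical values used only for illustration.

```python
import re

# Assumption: state changes are logged as
# "host host-<id> changed state: Dead", as in the excerpt above.
DEAD_HOST = re.compile(r"host (?P<host_id>host-\S+) changed state: Dead")


def dead_hosts(lines):
    """Return the hostIds reported as Dead, in first-seen order, without repeats."""
    seen = []
    for line in lines:
        match = DEAD_HOST.search(line)
        if match and match.group("host_id") not in seen:
            seen.append(match.group("host_id"))
    return seen


if __name__ == "__main__":
    # Hypothetical sample lines in the same shape as the fdm.log excerpt.
    sample = [
        "2024-01-01T10:00:00Z [48CC2B90 info 'Invt' opID=SWI-1] "
        "[HostStateChange::SaveToInventory] host host-101 changed state: Dead",
        "2024-01-01T10:00:00Z [48CC2B90 info 'Invt' opID=SWI-2] "
        "[HostStateChange::SaveToInventory] host host-102 changed state: Dead",
    ]
    print(dead_hosts(sample))
```

The returned hostIds still need to be mapped to hostnames as described in the note above.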
- Due to the Dead state, a failover process is initiated. You see output similar to:
YYYY:MM:DDTHH:MM:SSZ [48A81B90 verbose 'Placement' opID=<OPID>] [PlacementManagerImpl::IssuePlacementStartCompleteEventLocked] Issue failover start event
YYYY:MM:DDTHH:MM:SSZ [48ECAB90 verbose 'FDM' opID=<OPID>] [FdmService] New event: EventEx=com.vmware.vc.HA.ClusterFailoverActionInitiatedEvent vm= host= tag=host-####-XXXXXXXX:X
- Another responsibility of the primary host is restarting virtual machines when a host in the HA cluster fails. The primary host sends the command to start the affected virtual machines on the other live hosts (host-XXXX, host-XXXX and localhost). You see output similar to:
YYYY:MM:DDTHH:MM:SSZ [48D03B90 verbose 'Execution' opID=<OPID>] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/testvm001/testvm001.vmx on host-####(cmd ID host-####:#)
YYYY:MM:DDTHH:MM:SSZ [48D03B90 verbose 'Execution' opID=<OPID>] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/testvm002/testvm002.vmx on __localhost__ (cmd ID host-####:#)
YYYY:MM:DDTHH:MM:SSZ [48E89B90 verbose 'Execution' opID=<OPID>] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/testvm003/testvm003.vmx on host-####(cmd ID host-####:#)
YYYY:MM:DDTHH:MM:SSZ [48E89B90 verbose 'Execution' opID=<OPID>] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/testvm004/testvm004.vmx on __localhost__ (cmd ID host-####:#)
YYYY:MM:DDTHH:MM:SSZ [48E89B90 verbose 'Execution' opID=<OPID>] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/testvm005/testvm005.vmx on host-####(cmd ID host-####:#)
YYYY:MM:DDTHH:MM:SSZ [48E89B90 verbose 'Execution' opID=<OPID>] [ExecutionManagerImpl::ConstructAndDispatchCommands] Place /vmfs/volumes/<datastore_uuid>/testvm006/testvm006.vmx on host-####(cmd ID host-####:#)
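The placement lines above show where each virtual machine was intended to restart. As a rough illustration, the following sketch groups the .vmx paths by target host. It assumes the "Place <path>.vmx on <target>" wording from the excerpt. The function name placements_by_host, the datastore name ds01, and the host IDs in the sample are hypothetical. The target __localhost__ is kept as logged, and in this excerpt it appears to refer to the host that wrote the log.

```python
import re
from collections import defaultdict

# Assumption: placement commands are logged as
# "Place <path>.vmx on <target>(cmd ID ...)", with or without a space
# before the opening parenthesis, as in the excerpt above.
PLACEMENT = re.compile(r"Place (?P<vmx>.+?\.vmx) on (?P<target>[^\s(]+)")


def placements_by_host(lines):
    """Map each placement target to the list of .vmx paths sent to it."""
    result = defaultdict(list)
    for line in lines:
        match = PLACEMENT.search(line)
        if match:
            result[match.group("target")].append(match.group("vmx"))
    return dict(result)


if __name__ == "__main__":
    # Hypothetical sample lines in the same shape as the fdm.log excerpt.
    prefix = (
        "2024-01-01T10:00:05Z [48D03B90 verbose 'Execution' opID=SWI-1] "
        "[ExecutionManagerImpl::ConstructAndDispatchCommands] "
    )
    sample = [
        prefix + "Place /vmfs/volumes/ds01/testvm001/testvm001.vmx on host-201(cmd ID host-101:1)",
        prefix + "Place /vmfs/volumes/ds01/testvm002/testvm002.vmx on __localhost__ (cmd ID host-101:2)",
    ]
    for target, vms in placements_by_host(sample).items():
        print(target, vms)
```

A placement command records only the intended target. Confirm the actual restart with the VmRestartedByHAEvent entries.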
- VmRestartedByHAEvent is logged by the primary host for each virtual machine that it restarts. This allows you to determine which virtual machines are restarted. You see output similar to:
YYYY:MM:DDTHH:MM:SSZ [48A81B90 verbose 'FDM' opID=host-###:3-0-SWI-99e10af9] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm004/vm004.vmx host=host-#### tag=host-####:-1688405459:4
YYYY:MM:DDTHH:MM:SSZ [48A81B90 verbose 'FDM' opID=host-###:#-0-SWI-8c2d5fa7] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm002/vm002.vmx host=host-#### tag=host-####:-XXXXXXXX:X
- VmRestartedByHAEvent messages that the primary host receives for virtual machines restarted by other hosts in the cluster are also visible. You see output similar to:
YYYY:MM:DDTHH:MM:SSZ [48D85B90 verbose 'FDM'] [EventManagerImpl] Received event from host-####(X.X.X.X)
YYYY:MM:DDTHH:MM:SSZ [48D85B90 verbose 'FDM'] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/testvm006/testvm006.vmx host=host-#### tag=host-####:-511551180:3
YYYY:MM:DDTHH:MM:SSZ [48C81B90 verbose 'FDM'] [EventManagerImpl] Received event from host-#### (X.X.X.X)
YYYY:MM:DDTHH:MM:SSZ [48C81B90 verbose 'FDM'] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/testvm001/testvm001.vmx host=host-#### tag=host-####:-511551180:4
YYYY:MM:DDTHH:MM:SSZ [48D85B90 verbose 'FDM'] [EventManagerImpl] Received event from host-#### (X.X.X.X)
YYYY:MM:DDTHH:MM:SSZ [48D85B90 verbose 'FDM'] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/testvm005/testvm005.vmx host=host-#### tag=host-####:XXXXXXX:X
YYYY:MM:DDTHH:MM:SSZ [48E07B90 verbose 'FDM'] [EventManagerImpl] Received event from host-#### (X.X.X.X)
YYYY:MM:DDTHH:MM:SSZ [48E07B90 verbose 'FDM'] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/testvm003/testvm003.vmx host=host-#### tag=host-####:XXXXXXXXX:X
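Because every restart produces a VmRestartedByHAEvent entry, these lines are a convenient single source for a list of restarted virtual machines and their hosts. The sketch below extracts both values under the assumption that the entries carry the vm= and host= fields in the order shown above. The function name restarted_by_ha and the sample values (ds01, host-201, 10.0.0.1) are hypothetical, and the virtual machine name is simply derived from the .vmx file name.

```python
import posixpath
import re

# Assumption: the event is logged as
# "...VmRestartedByHAEvent vm=<path>.vmx host=<hostId> tag=...".
RESTART_EVENT = re.compile(
    r"VmRestartedByHAEvent vm=(?P<vmx>.+?\.vmx) host=(?P<host>\S+)"
)


def restarted_by_ha(lines):
    """Return (vm_name, host_id) pairs, one per VmRestartedByHAEvent line."""
    restarts = []
    for line in lines:
        match = RESTART_EVENT.search(line)
        if match:
            vmx_file = posixpath.basename(match.group("vmx"))
            vm_name = posixpath.splitext(vmx_file)[0]
            restarts.append((vm_name, match.group("host")))
    return restarts


if __name__ == "__main__":
    # Hypothetical sample lines in the same shape as the fdm.log excerpt.
    sample = [
        "2024-01-01T10:00:10Z [48D85B90 verbose 'FDM'] [EventManagerImpl] "
        "Received event from host-201 (10.0.0.1)",
        "2024-01-01T10:00:10Z [48D85B90 verbose 'FDM'] [FdmService] New event: "
        "EventEx=com.vmware.vc.ha.VmRestartedByHAEvent "
        "vm=/vmfs/volumes/ds01/testvm006/testvm006.vmx host=host-201 "
        "tag=host-201:-511551180:3",
    ]
    for vm_name, host_id in restarted_by_ha(sample):
        print(f"{vm_name} restarted on {host_id}")
```

The same function works on the primary host's own events and on events received from other hosts, since both use the same event line.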
- Review the fdm.log on the secondary host.
- This output is an example of the logging seen on a secondary host (host-XXXX) after it receives the instruction from the primary host (host-XXXX) to restart the virtual machine (vm006):
YYYY:MM:DDTHH:MM:SSZ [FFA32400 verbose 'Execution'] [ActionScheduler::AddPendingActionInt] Added pending action opId = host-#### :#-#, cfgPath = /vmfs/volumes/<datastore_uuid>/testvm006/testvm006.vmx , type = VmFailover, priority = 32
YYYY:MM:DDTHH:MM:SSZ [FFA32400 verbose 'Execution'] [VmPlacementActionScheduler::ExecuteActions] Execute action opId = host-#### :4-0 for /vmfs/volumes/<datastore_uuid>/testvm006/testvm006.vmx
YYYY:MM:DDTHH:MM:SSZ [41A79B90 info 'Default' opID=host-#### :#-#] [VpxLRO] -- BEGIN task-internal-## -- -- CommandActionLRO --
YYYY:MM:DDTHH:MM:SSZ [41A79B90 verbose 'Execution' opID=host-####:#-#] [FailoverAction::StartAsync] Failing over vm /vmfs/volumes/<datastore_uuid>/testvm006/testvm006.vmx (isRegistered=false)
YYYY:MM:DDTHH:MM:SSZ [41A79B90 verbose 'Execution' opID=host-####:#-#] [FailoverAction::StartAsync] Registering vm
YYYY:MM:DDTHH:MM:SSZ [41A79B90 verbose 'Default' opID=host-####:#-#] [TaskInfoPublisher::IncListeners] host:[] Return after successful IncListeners (listeners = #, _connectionGen = #)
YYYY:MM:DDTHH:MM:SSZ [41A79B90 verbose 'Default' opID=host-####:#-#] [TaskInfoListener::TaskInfoListener] constructed. Connection number = 0
YYYY:MM:DDTHH:MM:SSZ [41A79B90 verbose 'Hal' opID=host-####:#-#] [FdmHalHost::RegisterVmAsync] Invoking registerVm on hostd
YYYY:MM:DDTHH:MM:SSZ [41A79B90 verbose 'Default' opID=host-####:#-#] TaskInfoChannel created for haTask-ha-folder-vm-vim.Folder.registerVm-XXXXXXX
YYYY:MM:DDTHH:MM:SSZ [41A79B90 verbose 'Default' opID=host-####:#-#] [TaskInfoPublisher::AddChannel] host:[] Channel (haTask-ha-folder-vm-vim.Folder.registerVm-XXXXXXX) added for task: haTask-ha-folder-vm-vim.Folder.registerVm-XXXXXXX
YYYY:MM:DDTHH:MM:SSZ [41A79B90 verbose 'Default' opID=host-####:#-#] [TaskInfoChannel::GetTaskInfo] task: haTask-ha-folder-vm-vim.Folder.registerVm-XXXXXX setup for async notification
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Default' opID=host-####:#-#] [VpxLRO] Task task-internal-30 has been resumed
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Execution' opID=host-####:#-#] [FailoverAction::RegisterCompletionCallback] Registering vm done (vmid=/vmfs/volumes/<datastore_uuid>/testvm006/testvm006.vmx , hostdVmId=XX)
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Execution' opID=host-####:#-#] [FailoverAction::InitiateReconfigure] Not reconfiguring vm
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Execution' opID=host-####:#-#] [FailoverAction::ReconfigureCompletionCallback] Reconfiguring vm /vmfs/volumes/<datastore_uuid>/testvm006/testvm006.vmx is done
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Execution' opID=host-####:#-#] [FailoverAction::ReconfigureCompletionCallback] Powering on vm
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Default' opID=host-####:#-#] [TaskInfoPublisher::IncListeners] host:[] Return after successful IncListeners (listeners = 2, _connectionGen = 0)
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Default' opID=host-####:#-#] [TaskInfoListener::TaskInfoListener] constructed. Connection number = 0
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Default' opID=host-####:#-#] TaskInfoChannel created for haTask-##-vim.VirtualMachine.powerOn-XXXXXXXX
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Default' opID=host-####:#-#] [TaskInfoPublisher::AddChannel] host:[] Channel (haTask-##-vim.VirtualMachine.powerOn-XXXXXXX) added for task: haTask-##-vim.VirtualMachine.powerOn-XXXXXXX
YYYY:MM:DDTHH:MM:SSZ [FFF07B90 verbose 'Default' opID=host-####:#-#] [TaskInfoChannel::GetTaskInfo] task: haTask-##-vim.VirtualMachine.powerOn-XXXXXXX setup for async notification
YYYY:MM:DDTHH:MM:SSZ [41A38B90 verbose 'Default' opID=host-####:#-#] [VpxLRO] Task task-internal-30 has been resumed
YYYY:MM:DDTHH:MM:SSZ [41A38B90 verbose 'Execution' opID=host-####:#-#] [FailoverAction::PowerOnCompletionCallback] Power on vm /vmfs/volumes/<datastore_uuid>/testvm006/testvm006.vmx done
YYYY:MM:DDTHH:MM:SSZ [41A38B90 verbose 'Execution' opID=host-####:#-#] [ActionScheduler::RemoveAction] Action is removed: opId = host-####:#-#
YYYY:MM:DDTHH:MM:SSZ [41A38B90 info 'Default' opID=host-####:#-#] [VpxLRO] -- FINISH task-internal-30 -- -- CommandActionLRO --
YYYY:MM:DDTHH:MM:SSZ [FFE44B90 verbose 'FDM' opID=host-####:#-#-SWI-d2634fc5] [FdmService] New event: EventEx=com.vmware.vc.ha.VmRestartedByHAEvent vm=/vmfs/volumes/<datastore_uuid>/vm006/vm006.vmx host=host-#### tag=host-####:-XXXXXXXX:X
YYYY:MM:DDTHH:MM:SSZ [FFE44B90 verbose 'PropertyProvider' opID=host-XXXX:X-X-SWI-d2634fc5] RecordOp ADD: event[3], fdmService
YYYY:MM:DDTHH:MM:SSZ [FFE44B90 verbose 'PropertyProvider' opID=host-XXXX:X-X-SWI-d2634fc5] RecordOp ASSIGN: serverTime, fdmService
YYYY:MM:DDTHH:MM:SSZ [41AFBB90 verbose 'Execution' opID=host-XXXX:X-X] [ExecutionManagerImpl::SendUpdateToMaster] Sent command update to primary
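The secondary host output above is verbose, but only a few lines mark the progress of a restart: the failover start, the completed registration, the completed power-on, and the restart event. The following sketch reduces a log excerpt to those milestones. It is an illustration under stated assumptions rather than a complete parser: the marker strings are copied from the excerpt above, while the function name failover_timeline, the stage labels, and the sample timestamps and IDs are hypothetical.

```python
# Assumption: these substrings, copied from the excerpt above, identify the
# milestones of a restart on a secondary host. The labels are illustrative.
STAGES = [
    ("[FailoverAction::StartAsync] Failing over vm", "failover started"),
    ("[FailoverAction::RegisterCompletionCallback] Registering vm done", "registered"),
    ("[FailoverAction::PowerOnCompletionCallback] Power on vm", "powered on"),
    ("VmRestartedByHAEvent", "restart event issued"),
]


def failover_timeline(lines):
    """Return (timestamp, stage_label) pairs for the milestone lines, in log order."""
    timeline = []
    for line in lines:
        for marker, label in STAGES:
            if marker in line:
                # The timestamp is the text before the first "[" of the line.
                timestamp = line.split("[", 1)[0].strip()
                timeline.append((timestamp, label))
                break
    return timeline


if __name__ == "__main__":
    # Hypothetical sample lines in the same shape as the fdm.log excerpt.
    vmx = "/vmfs/volumes/ds01/testvm006/testvm006.vmx"
    op = "opID=host-201:4-0"
    sample = [
        f"2024-01-01T10:00:01Z [41A79B90 verbose 'Execution' {op}] "
        f"[FailoverAction::StartAsync] Failing over vm {vmx} (isRegistered=false)",
        f"2024-01-01T10:00:01Z [41A79B90 verbose 'Execution' {op}] "
        "[FailoverAction::StartAsync] Registering vm",
        f"2024-01-01T10:00:03Z [FFF07B90 verbose 'Execution' {op}] "
        f"[FailoverAction::RegisterCompletionCallback] Registering vm done (vmid={vmx} , hostdVmId=12)",
        f"2024-01-01T10:00:09Z [41A38B90 verbose 'Execution' {op}] "
        f"[FailoverAction::PowerOnCompletionCallback] Power on vm {vmx} done",
    ]
    for timestamp, stage in failover_timeline(sample):
        print(timestamp, stage)
```

If several virtual machines are restarted on the same host, filter the lines by opID or by .vmx path before building the timeline so that the milestones of different restarts are not interleaved.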