vCenter presents "Cannot find vSphere HA master agent" message

Article ID: 313044

Updated On:

Products

VMware vCenter Server

Issue/Introduction

Symptoms:
The vCenter Server user interface shows the error "Cannot find vSphere HA master agent".
The user interface of an add-on vSphere product shows that vCenter reports "Cannot find vSphere HA master agent".
Each affected host you select shows the warning "vSphere HA agent for this host has an error: The vSphere HA agent is not reachable from vCenter Server".

Cause

vSphere High Availability is activated at the cluster level. It uses an agent program, called the Fault Domain Manager (FDM) agent, that runs on each of the protected ESXi hosts. If this is a newly protected cluster, or if the other agents detect that there is no longer a managing agent available, the agents communicate to elect another agent to be the manager.

The managing agent monitors the state of the other protected hosts in the cluster for failure. It also communicates to the vCenter Server as to the state of the objects protected by vSphere HA.

When vCenter and the managing ESXi host in a protected cluster cannot reach an ESXi host in the cluster through its FDM agent, vCenter shows the message "Cannot find vSphere HA master agent".

There are many possible reasons why this can happen, from network problems with the affected host, to FDM agent VIB problems.

Resolution

1. Check if vCenter Server was just installed or updated
If so, update the FDM agent on the affected ESXi hosts.

To reinstall the vSphere HA agent VIB, reconfigure HA at the cluster level:
  1. Browse to the cluster in the vSphere Web Client object navigator.
  2. Click the Manage tab and click Settings.
  3. Under Services, click Edit.
  4. Uncheck the Turn ON vSphere HA option.
  5. Click OK.
  6. Click Settings and select Turn ON vSphere HA.
  7. Click OK.
  8. If any host does not take the new VIB, restart the management services on that host. For more information, see Restarting the Management agents in ESXi.
  9. If issues persist after the management agents restart, disconnect the host from vCenter Server and reconnect it.
2. Check if the ESXi host was just reinstalled or updated
If so, check whether the issue matches the article "After update to ESXi 7.0 Update 3 vSphere HA fails to enable".
Otherwise, ensure that the FDM agent on the affected host(s) matches the vCenter version.

To remove the vSphere HA agent VIB:
  1. Put the host into maintenance mode.
  2. Follow How to run vSphere HA agent remove script in ESXi to remove the vSphere HA agent from the ESXi host.
  3. If the vSphere HA remove script fails, run this command to directly remove the FDM agent VIB from the ESXi host:
     esxcli software vib remove -n vmware-fdm
  4. Exit maintenance mode.
  5. Reinstall the VIB as described in step 1, above.
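
To confirm that the FDM agent version is consistent, one option is to capture the output of esxcli software vib list on each host and compare the vmware-fdm rows. The following Python is a minimal sketch of that comparison, not an official tool. It assumes the captured listing is whitespace-separated with the VIB name in the first column and the version in the second. The function names are hypothetical, and the expected version must be supplied by you from a known good host.

```python
from typing import Dict, Optional


def fdm_version(vib_listing: str) -> Optional[str]:
    """Return the version column of the vmware-fdm row, or None if absent.

    Assumption: columns are whitespace-separated, with the VIB name first
    and the version second, as captured from 'esxcli software vib list'.
    """
    for line in vib_listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "vmware-fdm":
            return fields[1]
    return None


def hosts_with_mismatch(
    listings: Dict[str, str], expected: str
) -> Dict[str, Optional[str]]:
    """Map each host whose FDM version differs from 'expected' to its version.

    A value of None means the vmware-fdm VIB was not found in that listing.
    """
    found = {host: fdm_version(text) for host, text in listings.items()}
    return {host: ver for host, ver in found.items() if ver != expected}
```

Pass a dictionary of host name to captured listing text, plus the version string taken from a known good host. An empty result means every listing reported the expected version.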

3. Check time settings:
  1. NTP settings and current time for vCenter
  2. NTP settings and current time for an affected ESXi host
  3. NTP settings and current time for a known good ESXi host
  4. If the times differ, fix the settings so that all systems match, and verify that they are synchronized.
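
One way to quantify a time discrepancy is to collect a UTC timestamp from each system at roughly the same moment and compute the offsets. The Python below is a minimal sketch under stated assumptions: timestamps are in the form YYYY-MM-DDTHH:MM:SSZ (for example, as printed by date -u +%Y-%m-%dT%H:%M:%SZ), the function names are hypothetical, and the 5-second tolerance is an arbitrary illustration rather than a documented vSphere HA limit. Collecting the timestamps by hand adds a few seconds of error of its own.

```python
from datetime import datetime, timezone
from typing import Dict, List

# Assumed timestamp layout, e.g. 2024-05-01T12:00:00Z
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_utc(stamp: str) -> datetime:
    """Parse a UTC timestamp string into a timezone-aware datetime."""
    parsed = datetime.strptime(stamp.strip(), STAMP_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def offsets_from(reference: str, readings: Dict[str, str]) -> Dict[str, float]:
    """Return each system's offset in seconds relative to the reference system."""
    ref_time = parse_utc(readings[reference])
    return {
        name: (parse_utc(stamp) - ref_time).total_seconds()
        for name, stamp in readings.items()
    }


def out_of_sync(
    reference: str, readings: Dict[str, str], tolerance_seconds: float = 5.0
) -> List[str]:
    """List systems whose offset exceeds the tolerance (illustrative default)."""
    offsets = offsets_from(reference, readings)
    return sorted(n for n, off in offsets.items() if abs(off) > tolerance_seconds)
```

Using vCenter as the reference, a large offset on only the affected host points to that host's NTP configuration, while similar offsets on every host point to vCenter's.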

4. Ensure network communication
  1. Check management network connectivity between vCenter and the affected ESXi host.
  2. Check management network connectivity between a good ESXi host and the affected ESXi host.
  3. If connectivity fails, troubleshoot the network issue. Ensure that the required ports are open and that the affected host's settings match those of a good host, including the distributed virtual switch (DVS) ports used.
vCenter to affected host, substituting the host management IP:
curl -v telnet://<ESXi host IP>:443
curl -v telnet://<ESXi host IP>:902

Good host in the cluster to affected host, substituting the management vmk and the host management IP:
vmkping -I <management vmk, usually vmk0> <affected host IP>

If the affected host is not set up on a vSphere Distributed Switch but the working hosts are, add the affected host to the DVS to match them, per Add Hosts to a vSphere Distributed Switch.
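
The TCP checks in this step can also be scripted when several hosts need testing. The Python below is a minimal sketch, assuming it runs on a system with Python 3 that sits on the same management network path as vCenter. The function names are hypothetical, and the default ports 443 and 902 are simply the ones used in the curl commands in this step. It tests TCP reachability only and does not replace vmkping, which tests from a specific VMkernel interface.

```python
import socket
from typing import Dict, Iterable


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports: Iterable[int] = (443, 902)) -> Dict[int, bool]:
    """Check each port in turn and report which ones accepted a connection."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling check_ports with the management IP of the affected host returns a dictionary keyed by port, with True where the connection succeeded. Comparing that result with the result for a known good host helps separate a host-specific problem from a firewall rule that affects the whole cluster.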

5. Ensure storage communication
 
If the cluster is a vSAN cluster:
  1. Ensure there are at least 3 ESXi hosts in the cluster.
  2. Ensure the host has vSAN vmkernel networking set up per How to configure vSAN VMkernel networking.
For any type of cluster:
Check that the host has reliable access to the heartbeat datastores.
 
6. Check whether the agent on the host has failed and the watchdog process is unable to restart it

7. Check whether all hosts in the cluster have a failed vSphere HA status

If so, the likely cause is a problem with the FDM VIB that is common to all hosts, such as a shared VIB configuration issue.
Follow the steps in Error: "vSphere HA agent cannot be correctly installed or configured".
 
8. If problems persist, follow the other resolution steps in Error: "vSphere HA agent cannot be correctly installed or configured".


Additional Information

For more information on the function of vSphere HA in event of a host failure, see: Determining if your VMware vSphere HA cluster has experienced a host failure
For more information about vSphere Availability, see https://docs.vmware.com/en/VMware-vSphere/7.0/vsphere-esxi-vcenter-server-70-availability-guide.pdf

Impact/Risks:
Virtual machines on an ESXi host that is not properly running the agent, or that cannot connect to the other hosts for vSphere High Availability, cannot be failed over if the host fails.

Some add-on software, such as VMware Cloud Foundation, might not install or update if the vCenter used for the installation has hosts affected by this error.