Error: "Cannot find vSphere HA master agent" within vCenter UI

Article ID: 313044

Updated On:

Products

VMware vCenter Server

Issue/Introduction

The vCenter UI presents the following error messages:

  • Cannot find vSphere HA master agent
  • vSphere HA agent for this host has an error: The vSphere HA agent is not reachable from vCenter Server
 

Environment

VMware vCenter Server

Cause

  • vSphere High Availability (HA) is activated at the cluster level. It relies on an agent, the Fault Domain Manager (FDM), that runs on each protected ESXi host.
  • If the cluster is newly protected, or the agents detect that no managing agent is available, they hold an election to choose one of them as the manager (the "master" named in the error message).
  • The managing agent monitors the other protected hosts in the cluster for failure and reports the state of the objects protected by vSphere HA to vCenter Server.
  • When vCenter Server and the managing ESXi host cannot reach a host in the cluster through its FDM agent, the following message appears:
    • "Cannot find vSphere HA master agent".
  • Possible causes range from network problems on the affected host to problems with the FDM agent VIB.

Resolution

  1. Check if the vCenter Server was just installed or updated. If so, update the FDM agent on the affected ESXi hosts.

    To reinstall the vSphere HA agent VIB, reconfigure HA at the cluster level:
    1. Browse to the cluster in the vSphere Client.
    2. Click the Manage tab and click Settings.
    3. Under Services, click Edit.
    4. Uncheck the Turn ON vSphere HA option.
    5. Click OK.
    6. Click Settings and select Turn ON vSphere HA.
    7. Click OK.
    8. If any host does not take the new VIB, restart the management services on that host. For more information, see "Restarting the Management agents in ESXi".
    9. If the issue persists after the management agents restart, disconnect the host from the vCenter Server and reconnect it.
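
The management-agent restart mentioned in the sub-steps above can be sketched as follows, assuming SSH access to the affected ESXi host and that restarting hostd and vpxa is what is wanted. The function name and the `RUN` dry-run switch are illustrative helpers, not part of the referenced article.

```shell
# Hypothetical helper: restart the ESXi management agents (hostd and vpxa).
# RUN is an illustrative dry-run switch; leave it empty to execute for real.
RUN="${RUN:-}"

restart_mgmt_agents() {
    for agent in hostd vpxa; do
        $RUN /etc/init.d/"$agent" restart
    done
}

# Preview the commands without running them:
#   RUN=echo; restart_mgmt_agents
```

Previewing with `RUN=echo` first makes it easy to confirm what will run before the host's management connection is briefly interrupted.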

  2. Check whether the ESXi host was just reinstalled or updated. If so, check whether the issue matches the article "After update to ESXi 7.0 Update 3 vSphere HA fails to enable". Otherwise, ensure that the FDM agent version on the affected host(s) matches the vCenter Server version.

    To remove the vSphere HA agent VIB:
    1. Put the host into maintenance mode.
    2. Follow "How to run vSphere HA agent remove script in ESXi" to remove the vSphere HA agent from the ESXi host.
    3. If the vSphere HA remove script fails, run this command to directly remove the FDM agent VIB from the ESXi host:

      esxcli software vib remove -n vmware-fdm

    4. Reinstall the VIB:
      1. Using WinSCP, connect to vCenter Server, browse to the path below, copy the VIB, and transfer it to the local machine.

        /etc/vmware-vpx/docRoot/vSphere-HA-depot/vib20/vmware-fdm/VMware_bootbank_vmware-fdm

      2. Then transfer the FDM VIB to the /tmp/ folder on an affected ESXi host and run the installation command:

        esxcli software vib install -f -v /tmp/VMware_bootbank_vmware-fdm

    5. Exit maintenance mode.
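
After the reinstall, it can help to confirm which FDM VIB version the host now reports. This is a minimal sketch that assumes the tabular output of `esxcli software vib list`, with the VIB name in the first column and the version in the second. The function name `fdm_version` and the sample row are made up for illustration.

```shell
# Hypothetical helper: read `esxcli software vib list` output on stdin and
# print the version column of the vmware-fdm row (prints nothing if absent).
fdm_version() {
    awk '$1 == "vmware-fdm" { print $2 }'
}

# On an ESXi host:   esxcli software vib list | fdm_version
# Made-up sample row, only to show the parsing:
printf '%s\n' 'vmware-fdm  1.2.3-456  VMware  VMwareCertified  2024-01-01' | fdm_version
```

Running the same check on a known good host gives a value to compare against. Empty output suggests that the VIB is not installed.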

  3. Check time settings:
    1. NTP settings and current time for vCenter
    2. NTP settings and current time for an affected ESXi host
    3. NTP settings and current time for a known good ESXi host
    4. If the times differ, correct the settings so that all systems match, and verify that they are synchronized.

      For vCenter Server, follow "Configuring vCenter Server to use a Network Time Protocol (NTP) server".
      For ESXi hosts, follow "Use NTP Servers for Time and Date Synchronization of a Host".
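
One rough way to compare the clocks is to collect `date +%s` (seconds since the epoch) on vCenter, the affected host, and a known good host, then compute the difference. The sketch below assumes those values were gathered by hand, for example over SSH. The function name and the timestamps are placeholders.

```shell
# Hypothetical helper: absolute difference, in seconds, between two epoch timestamps.
skew_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Placeholder values; replace with the output of `date +%s` from each system.
vc_time=1700000000
host_time=1700000042
echo "skew: $(skew_seconds "$vc_time" "$host_time")s"
```

Because the timestamps are collected one after another, a difference of a second or two may simply reflect the delay between the commands.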

  4. Ensure network communication:
    1. Management network connectivity between the vCenter and the affected ESXi host.
    2. Management network connectivity between a good ESXi host and the affected ESXi host.
    3. If connectivity fails, troubleshoot the network issue. Ensure that the required ports are open and that the affected host's settings match those of a good host, including the distributed virtual switch (DVS) ports used.

      From vCenter to the affected host (substitute the host management IP):

      curl -v telnet://<ESXi host IP>:443
      curl -v telnet://<ESXi host IP>:902

      From a good host in the cluster to the affected host (substitute the vmk interface and the affected host's management IP):

      vmkping -I <management vmk, usually vmk0> <affected host IP>

    4. If the affected host is not attached to a vSphere Distributed Switch (DVS) but the working hosts are, add the affected host to the DVS to match them, per "Add Hosts to a vSphere Distributed Switch".
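
The two curl checks from vCenter can be wrapped in a small loop so that each port yields a single result line. This sketch assumes that `curl -v` prints a "Connected to" line once the TCP connection is established. The function names `probe` and `check_ports` and the IP address are placeholders.

```shell
# Hypothetical helper: succeed if curl reports a TCP connection to host:port.
# Relies on the "Connected to" line printed by `curl -v`; --max-time stops
# curl from waiting indefinitely on the open telnet session.
probe() {
    curl -v --max-time 5 "telnet://$1:$2" </dev/null 2>&1 | grep -q 'Connected to'
}

# Print one result line per port for the given host.
check_ports() {
    host=$1
    shift
    for port in "$@"; do
        if probe "$host" "$port"; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
        fi
    done
}

# Usage from the vCenter shell (replace the placeholder IP):
#   check_ports 192.0.2.10 443 902
```

Matching on curl's verbose output rather than its exit status is a deliberate choice here, since the exit status of a `telnet://` session can depend on how the session ends.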

  5. Ensure storage communication:
    1. If the cluster is a vSAN cluster:
      1. Ensure there are at least 3 ESXi hosts in the cluster.
      2. Ensure the host has vSAN vmkernel networking set up per "How to configure vSAN VMkernel networking".
    2. If the cluster is not a vSAN cluster, check that the host has reliable access to the heartbeat datastores.

  6. Check whether the agent on the host has failed and the watchdog process is unable to restart it:
    1. Follow "Error "Operation timed out" while reconfiguring HA (FDM) on a cluster".

  7. Check whether all hosts in the cluster have a failed vSphere HA status:
    1. If so, the likely cause is a problem with the FDM VIB that all hosts share, such as a common VIB configuration issue.
      1. Follow "Error: "vSphere HA agent cannot be correctly installed or configured"".

  8. If problems persist, follow the other resolution steps in "Error: "vSphere HA agent cannot be correctly installed or configured"".

Additional Information

Impact/Risks:

  • If an ESXi host is not running the agent properly, or cannot connect to the other hosts for vSphere High Availability, its virtual machines cannot be failed over if the host fails.
  • Some add-on software, such as VMware Cloud Foundation, might not install or update if the vCenter Server used for the installation contains hosts affected by this error.