vSphere High Availability (HA) issues

Article ID: 417885

Products

VMware vSphere ESXi

Issue/Introduction

An error occurs when you attempt to configure, enable, or re-enable vSphere HA.

This article helps you identify the specific HA issue and points to the resolution for each one.

Environment

vSphere 7.x

vSphere 8.x

vSphere 9.x

Cause

HA failures can result from any of the following:

  • Host disconnected from vCenter
  • HA heartbeat datastore inaccessible
  • Failure of the primary host
  • Network partition
  • Host network isolation
  • Blocked port 8182
  • Configuration issue
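
To rule out the blocked-port cause, a quick reachability test from a machine on the management network can help. The following is a minimal Python sketch, not an official VMware tool: the function name is hypothetical, and it only shows whether something accepts TCP connections on port 8182 of the target. It does not exercise UDP traffic or prove that the HA agent itself is healthy.

```python
import socket


def tcp_port_open(host: str, port: int = 8182, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, tcp_port_open("192.0.2.11") returns False if the connection is refused or times out (the address is a placeholder, substitute the management IP of an ESXi host in the cluster). A False result for a host that has HA configured suggests checking firewall rules along the path.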

Resolution

General Troubleshooting: Troubleshooting VMware High Availability (HA) issues in VMware vCenter Server

General Symptoms:

Symptom: vSphere HA agent on a host enters the "Agent Unreachable" state for a minute or longer.
Causes:
  • Networking Problems: Network connectivity issues such as misconfigured VLANs, firewall rules blocking HA traffic, physical network failures, or network partitioning can prevent communication between the HA agent and vCenter Server or the primary host.
  • Cluster-Wide Failures: If all hosts in the cluster are experiencing issues, the HA agent may become unreachable due to widespread failure or misconfiguration.
  • HA Reactivation While Disconnected: If vSphere HA was deactivated and reactivated on the cluster while the vCenter Server lost communication with the host agent, it can cause the agent to become unreachable.
  • ESXi Host Agent Failure: In rare cases, the ESXi host agent managing the HA process may fail, and the internal watchdog process is unable to restart it, causing loss of communication.
Resolution: Troubleshooting vSphere HA Agent in Unreachable State

Symptoms:
  • vSphere HA will not configure on one or more ESXi hosts in a cluster
  • Datastores used for vSphere HA heartbeat have changed or not yet been configured
Causes:
  • Heartbeat datastore not available or deprecated.
  • Heartbeat datastore not configured.
Resolution: Troubleshoot ESXi heartbeat datastore problems affect vSphere HA

Symptoms:
  • The vSphere HA agent is unreachable
  • In the Summary tab of the affected ESXi host, the following error is seen:

    vSphere HA reports that an agent is in the Agent Unreachable state
Causes:
  • Network problem between vCenter and primary host.
  • Host agent failed and it is not restarted by Watchdog process.
  • SSL certificate does not match PNID of vCenter.
Resolution: vSphere HA reports that an agent is in the Agent Unreachable state

vCenter/ESXi Errors or Warnings:

Symptoms:
  • Cannot reconfigure HA on a cluster.
  • Reconfiguring HA on a cluster fails.
  • You see one of the following errors when there are no networking or other issues preventing host-to-host communication within the cluster:

    Operation timed out
    Cannot find vSphere HA master agent

  • Removing the ESXi host from the cluster and then adding it again to the cluster does not resolve the issue.
  • You see this error in the Task Details pane in vCenter Server:

    Status: Operation timed out.

  • Other HA-related errors.
Cause:
  • The vSphere High Availability Agent service on the ESXi host is stopped.
Resolution: "Operation timed out" while reconfiguring HA (FDM) on a cluster

Symptoms:
  • Reconfiguring vSphere HA fails for several hosts in the cluster, but some elect into primary or secondary status.
  • After upgrading to vCenter Server 8.0.3, HA-enabled clusters fail to configure, and only a few hosts elect properly.
  • Messages in fdm.log mention "SSL Async Handshake Timeout" when contacting other hosts.
  • fdm.log also contains messages similar to the following when attempting to contact the master FDM host:
    • SSL Async Handshake Timeout : Read timeout after approximately 25000ms. Closing stream SSL
    • Failed to SSL handshake;
Cause:
  • MTU mismatch on the management network. FDM supports jumbo frames, but the MTU setting must be consistent end to end on every device in the path.
Resolution: "SSL Async Handshake Timeout"

Symptoms:
  • Unable to install or update the vCenter Server vSphere High Availability (vSphere HA) agent service.
    • The vmware-fdm VIB is the package that runs this service on each ESXi host.
  • Powering on virtual machines fails with the error:

    The host is reporting errors in its attempts to provide vSphere HA support
  • In the VMware vCenter Server summary, the following vSphere HA service error may be observed:

    vSphere HA agent cannot be correctly installed or configured
  • The host disconnects intermittently from vCenter.
Causes:
  • ESXi host problem with third-party VIB (such as a compatibility issue)
  • ESXi host heartbeat datastore problems
  • ESXi host OS problems
  • vCenter Server OS problems
  • vCLS related issues
Resolution: vSphere HA agent cannot be correctly installed or configured
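
For the "SSL Async Handshake Timeout" symptom and its MTU-mismatch cause, the check can be sketched in two steps: look for the handshake messages in fdm.log, then build a don't-fragment ping sized for the MTU you expect the path to carry. The Python below is an illustrative sketch only. The function names are hypothetical, the search strings are the fragments quoted above, and vmk0 as the management VMkernel interface is an assumption. The payload size assumes IPv4, where the IP and ICMP headers together take 28 bytes.

```python
import re
from typing import Iterable, List

# Message fragments quoted in the symptoms above.
HANDSHAKE_PATTERNS = re.compile(
    r"SSL Async Handshake Timeout|Failed to SSL handshake"
)

# IPv4 header (20 bytes) plus ICMP header (8 bytes).
ICMP_IPV4_OVERHEAD = 28


def find_handshake_errors(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain one of the handshake messages."""
    return [ln.rstrip("\n") for ln in lines if HANDSHAKE_PATTERNS.search(ln)]


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    if mtu <= ICMP_IPV4_OVERHEAD:
        raise ValueError("MTU is too small to carry an ICMP payload")
    return mtu - ICMP_IPV4_OVERHEAD


def vmkping_command(target: str, mtu: int, vmk: str = "vmk0") -> str:
    """Build a vmkping command line with the don't-fragment bit set (-d)."""
    return f"vmkping -I {vmk} -d -s {max_ping_payload(mtu)} {target}"
```

As a usage sketch, pass the lines of a copy of fdm.log (commonly /var/log/fdm.log on the ESXi host) to find_handshake_errors. If it returns matches, run the string produced by vmkping_command("192.0.2.12", 9000) from the ESXi shell, replacing the placeholder address with another host in the cluster. If the jumbo-sized ping fails while the one built for an MTU of 1500 succeeds, some device in the path is probably not configured for the larger MTU.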

 

 

Additional Information

How vSphere HA Works

Creating a vSphere HA Cluster

Disabling and enabling VMware vSphere High Availability (vSphere HA)