After an upgrade or network change, vSphere High Availability (HA) fails to initialize or recover on one or more ESXi hosts. The following errors are observed in the vSphere Client:
vSphere HA agent for this host has an error: The vSphere HA agent is not reachable from vCenter Server
Cannot find vSphere HA master agent
vCenter Server is unable to find a master vSphere HA agent in cluster
On the ESXi host, /var/run/log/lifecycle.log shows repeated download timeouts against the vCenter Server:
DEBUG Downloading depot index.xml from http://[vCenter-FQDN]:9084/vum/repository/...
WARNING Download failed: <urlopen error timed out>, 9 retry left...
vCenter Server
vSphere ESXi
The ESXi host cannot establish a connection to the vCenter Server because of an incorrect static IP address entry for the vCenter FQDN in the host's /etc/hosts file. This prevents the host from correctly resolving the vCenter Server's current IP address, causing communication timeouts on management ports (such as 9084).
To resolve this issue, remove the incorrect static entries from the ESXi host and reconfigure HA.
1. Identify Incorrect Entries in /etc/hosts on ESXi host
Log in to each affected ESXi host via SSH and display the contents of the hosts file:
cat /etc/hosts
Look for entries matching your vCenter FQDN or short name that point to an incorrect or decommissioned IP address, then remove or correct them (for example, by editing the file with vi /etc/hosts).
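On hosts with a long /etc/hosts file, the search in step 1 can be narrowed to the lines that matter. The following is a minimal sketch, not a VMware-provided tool: find_host_entries is a hypothetical helper name, vcenter.example.com is a placeholder FQDN, and it assumes awk is available in the host's shell.

```shell
# Hypothetical helper: print active (non-comment) lines of a hosts file
# that list the given name as a hostname or alias.
# Usage: find_host_entries <hosts-file> <name>
find_host_entries() {
    hosts_file="$1"
    name="$2"
    awk -v n="$name" '
        # Skip lines that are entirely commented out.
        /^[[:space:]]*#/ { next }
        {
            # Field 1 is the IP address; fields 2..NF are names.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break            # stop at a trailing comment
                if (tolower($i) == tolower(n)) { print; break }
            }
        }
    ' "$hosts_file"
}
```

For example, find_host_entries /etc/hosts vcenter.example.com prints every active line that maps that name to an IP address. Compare the address in the first column with the vCenter Server's current IP address. The match is on whole names and ignores case, so a lookup for the short name does not match an unrelated host that merely contains it.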
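2. Verify Connectivity
From the ESXi host, confirm that the vCenter FQDN now resolves to the correct IP address and that port 9084 is reachable: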
ping vCenter-FQDN
nslookup vCenter-FQDN
nc -z vCenter-FQDN 9084
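When several hosts are affected, the three checks above can be wrapped so that each one reports a clear pass or fail. This is a sketch under assumptions: run_check and check_vcenter are hypothetical helper names, the FQDN passed in is a placeholder, and the ping count of 2 is an arbitrary choice.

```shell
# Run one command quietly and report whether it succeeded.
# Usage: run_check <label> <command> [args...]
run_check() {
    label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK:   $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Run the three connectivity checks from step 2 against one vCenter name.
# Returns non-zero if any check fails.
check_vcenter() {
    vc="$1"
    rc=0
    run_check "ping $vc" ping -c 2 "$vc" || rc=1
    run_check "DNS lookup of $vc" nslookup "$vc" || rc=1
    run_check "TCP port 9084 on $vc" nc -z "$vc" 9084 || rc=1
    return "$rc"
}
```

Calling check_vcenter vcenter.example.com prints one OK or FAIL line per check. A host where the DNS lookup passes but ping or the port check fails may still have a stale /etc/hosts entry, because entries in that file can override the DNS answer.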
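Once the incorrect entries have been removed and connectivity is confirmed, reconfigure vSphere HA on the impacted cluster: in the vSphere Client, right-click each affected host and select Reconfigure for vSphere HA.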