After replacing the default certificate on ESXi hosts, vSphere HA reports "Agent Unreachable state" or is stuck in Election State

Article ID: 402263

Updated On:

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

  • The issue appears only after the default certificate on the ESXi hosts is replaced.
  • One host in the cluster is configured for HA successfully, but the other hosts report "HA Agent Unreachable" or are stuck in the Election state.
  • On the primary ESXi host (the host on which HA was configured without issues), /var/run/log/fdm.log shows entries similar to:
    YYYY-MM-DDTHH:MM:SS.MSZ Db(167) Fdm[9705572]: [Originator@6876 sub=Cluster opID=WorkQueue-59d0aa06] (VMFS) host-#### @ <REDACTED_MAC_ADDRESSES> is ALIVE
    YYYY-MM-DDTHH:MM:SS.MSZ In(166) Fdm[9705577]: [Originator@6876 sub=Cluster opID=WorkQueue-60c9d31e] Trusted host not found. Failing to verify the host; host: (<REDACTED_IPS>:49516)
    YYYY-MM-DDTHH:MM:SS.MSZ Db(167) Fdm[9705577]: [Originator@6876 sub=Cluster opID=WorkQueue-60c9d31e] Blacklisting ip address <REDACTED_IPS> for 60 seconds
    YYYY-MM-DDTHH:MM:SS.MSZ Db(167) Fdm[9705577]: [Originator@6876 sub=Cluster opID=WorkQueue-60c9d31e] IP <REDACTED_IPS> marked bad for reason Invalid Credentials
    YYYY-MM-DDTHH:MM:SS.MSZ Wa(164) Fdm[9705577]: [Originator@6876 sub=Cluster opID=WorkQueue-60c9d31e] Failed to verify host  (<REDACTED_IPS>) - closing connection
    YYYY-MM-DDTHH:MM:SS.MSZ Db(167) Fdm[9705577]: [Originator@6876 sub=Message opID=WorkQueue-60c9d31e] Accept completion callback error N5Vmomi5Fault13SecurityError9ExceptionE(Fault cause:
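To check quickly whether a copy of fdm.log contains this signature, the messages above can be searched for programmatically. The following is a minimal sketch in Python, not a VMware tool: the function name `find_trust_failures` and the marker list are illustrative assumptions, with the marker strings copied from the log excerpt above. It also assumes the peers appear as IPv4 addresses.

```python
import re
from collections import Counter

# Message fragments copied from the fdm.log excerpt in this article.
# (Assumption: other builds may word these messages differently.)
TRUST_FAILURE_MARKERS = (
    "Trusted host not found",
    "marked bad for reason Invalid Credentials",
    "Failed to verify host",
)

# Simple IPv4 matcher; assumes the rejected peers are logged as IPv4 addresses.
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_trust_failures(lines):
    """Return (matching_lines, Counter of IPv4 addresses found in them).

    `lines` is any iterable of log lines, for example an open file object.
    """
    hits = []
    peers = Counter()
    for line in lines:
        if any(marker in line for marker in TRUST_FAILURE_MARKERS):
            hits.append(line.rstrip("\n"))
            # Count each address once per matching line.
            peers.update(set(_IPV4.findall(line)))
    return hits, peers
```

For example, opening a copy of the log with `open(path, errors="replace")` and passing the file object to `find_trust_failures` returns the matching lines together with a count of how often each peer address was rejected, which helps identify the hosts that the primary does not trust.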

Environment

  • VMware vCenter 8.x
  • VMware ESXi 8.x

Cause

vSphere HA uses the host certificate to establish trust before allowing a host to join an HA cluster.
If the certificate cannot be validated, the host is marked as not trusted and the HA configuration task fails.
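When investigating a trust failure of this kind, it can help to see which certificate each host currently presents. The following is a hedged sketch using only the Python standard library. The helper names `fingerprint_from_pem` and `fetch_host_fingerprint` are hypothetical, and both the port (443) and the hash algorithm (SHA-256) are assumptions rather than a statement of what vSphere HA compares internally.

```python
import hashlib
import ssl


def fingerprint_from_pem(pem_text: str, algorithm: str = "sha256") -> str:
    """Return a colon-separated, upper-case fingerprint of a PEM certificate."""
    # Convert the PEM text to DER bytes, then hash the DER encoding.
    der = ssl.PEM_cert_to_DER_cert(pem_text)
    digest = hashlib.new(algorithm, der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_host_fingerprint(host: str, port: int = 443) -> str:
    """Fetch the certificate a host presents and return its fingerprint.

    Assumption: the host serves its certificate on TCP port 443.
    The certificate is retrieved without being validated.
    """
    pem = ssl.get_server_certificate((host, port))
    return fingerprint_from_pem(pem)
```

Running `fetch_host_fingerprint` against each host in the cluster and comparing the results with the certificates that were intended to be installed shows whether a host is still presenting an old or unexpected certificate.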

Resolution

Additional Information