Troubleshooting and resolving vSphere HA configuration errors on ESXi hosts

Article ID: 332578

Products

VMware vSphere ESXi

Issue/Introduction

Symptoms:


  • The host summary shows a vSphere HA Host Status error.
  • Reconfiguring for HA does not resolve the problem.
  • The availability status on the host Summary tab shows a "configuration error"; when expanded, the following error appears:
"An error occurred when vCenter Server attempted to initialize the vSphere HA Agent running on the host. This condition is often cleared by reconfiguring vSphere HA for the host. The VMs on the host are not monitored by vSphere HA and thus will not be restarted after a failure."

 

You have already tried the following troubleshooting steps without success:

  • Disconnecting and then reconnecting the hosts to vCenter.
  • Disabling and then re-enabling HA.
  • Putting the cluster into retreat mode and then exiting retreat mode, which recreates the vCLS VMs.
  • Removing and then reinstalling the FDM agents on both hosts.

When you open the file /etc/opt/vmware/fdm/fdm.cfg on the host, you see two lines like the following (30 is an example value; it may vary depending on the situation):
<unknownStateMonitorPeriod>30</unknownStateMonitorPeriod>
<unknownStateMonitorPeriod >30</unknownStateMonitorPeriod >
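
As an alternative to opening the file in an editor, the matching lines can be listed from the host shell. This is a minimal sketch: the function name show_option_lines is hypothetical, and its defaults are simply the option name and file path used in this article.

```shell
# Hypothetical helper: print every line of the config file that mentions
# the given option, prefixed with its line number.
show_option_lines() {
    option="${1:-unknownStateMonitorPeriod}"   # example option from this article
    cfg="${2:-/etc/opt/vmware/fdm/fdm.cfg}"    # path from this article
    # -F treats the option as a literal string, -n adds line numbers.
    grep -nF -- "$option" "$cfg"
}
```

Calling show_option_lines with no arguments on an affected host should list both lines shown above, which makes the duplicate entry easy to spot.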

 

Environment

7.0.X

Cause

This happens when an advanced configuration option for vSphere HA was previously set with an accidental extra space at the end of the option name, which leaves a malformed entry in the fdm.cfg configuration file.

For example, setting das.config.fdm.policy.unknownStateMonitorPeriod with an extra space at the end of the name results in a line like the second one below being created in the fdm.cfg file:

<unknownStateMonitorPeriod>30</unknownStateMonitorPeriod>
<unknownStateMonitorPeriod >30</unknownStateMonitorPeriod >

(If you examine it closely, you'll notice that the second line has an extra space between "unknownStateMonitorPeriod" and the closing ">" of each tag. This extra space is what causes the issue.)
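
Because the extra space is hard to see by eye, a pattern search can point it out. The sketch below is an illustration only: find_padded_tags is a hypothetical helper, the regular expression is an assumption about what a padded tag looks like, and it presumes the host's grep supports -E and POSIX character classes.

```shell
# Hypothetical helper: list lines where an element name is followed by
# whitespace directly before the closing '>' (opening or closing tag).
find_padded_tags() {
    cfg="${1:-/etc/opt/vmware/fdm/fdm.cfg}"    # path from this article
    # Matches e.g. "<name >" and "</name >", but not "<name>" or "<name />".
    grep -nE '</?[A-Za-z_][A-Za-z0-9_.-]*[[:space:]]+>' "$cfg"
}
```

On a file containing the two example lines, only the second one (the one with the extra space) is reported, together with its line number.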

Resolution

To resolve this issue, perform the following steps (the fdm.cfg changes must be made on every host experiencing this problem):

 

  • Create an offline snapshot of the vCenter.

Note: If the vCenter Server is part of a Linked Mode replication group, remember that backups or offline snapshots must be created for each member of the Linked Mode group.

  • Create a backup of the fdm.cfg file using the command below:
cp /etc/opt/vmware/fdm/fdm.cfg /etc/opt/vmware/fdm/fdm.cfgbk
  • Modify the fdm.cfg file:
vi /etc/opt/vmware/fdm/fdm.cfg
  • Remove the line that contains the extra space (the second line below) and keep the first line intact:
<unknownStateMonitorPeriod>30</unknownStateMonitorPeriod>
<unknownStateMonitorPeriod >30</unknownStateMonitorPeriod >
  • Save the file and exit.
Press ESC
:wq!
  • Reconfigure for HA on all the hosts.
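
For several hosts, the backup and edit steps above could be scripted instead of done by hand in vi. The following is a sketch under stated assumptions, not a VMware-provided tool: the function name remove_padded_tag_lines and the regular expression are hypothetical, and the function deletes every single-line element whose tag name is followed by whitespace before ">", so review the file first to confirm that only the malformed entry matches.

```shell
# Hypothetical helper: back up the config file, then delete single-line
# elements whose tag name has whitespace before the closing '>'.
remove_padded_tag_lines() {
    cfg="${1:-/etc/opt/vmware/fdm/fdm.cfg}"    # path from this article
    backup="${cfg}bk"                          # same backup name as the cp step

    # Back up first; stop if the copy fails.
    cp "$cfg" "$backup" || return 1

    # Filter the backup into a scratch file, then write it back in place
    # so the original file keeps its ownership and permissions.
    sed '/^[[:space:]]*<[A-Za-z_][A-Za-z0-9_.-]*[[:space:]]\{1,\}>[^<]*<\/[^>]*>[[:space:]]*$/d' \
        "$backup" > "${cfg}.new" || return 1
    cat "${cfg}.new" > "$cfg" && rm -f "${cfg}.new"
}
```

After running it on a host, compare fdm.cfg with fdm.cfgbk to confirm that only the intended line was removed. Reconfigure for HA still has to be performed on the hosts afterwards, as in the last step.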

Additional Information

Impact/Risks:


vSphere HA is ineffective on any host whose fdm.cfg file contains the malformed entry: the VMs on that host are not monitored by vSphere HA and will not be restarted after a failure.