Configuring vSphere HA fails with "Initialization Error"

Article ID: 322856

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

This article explains how to resolve the vSphere HA initialization error.

Symptoms:

  • vCenter reports "unknown error initializing HA" for the host, but the Fault Domain Manager (FDM) log shows that the host successfully becomes an HA primary or secondary node.
  • After exiting maintenance mode, vCenter reports "unknown error initializing HA".
  • Enabling HA times out in vCenter, but succeeds on the host(s).
  • The Reconfigure for vSphere HA task fails with a "Timed out Exception" in vCenter, but the Fault Domain Manager log shows that the host successfully becomes an HA primary or secondary node.
  • A newly powered-on virtual machine shows as unprotected by HA in vCenter, but the Fault Domain Manager log shows that the virtual machine is protected.
  • vCenter and the ESXi hosts are not in the same subnet; there is a gateway or firewall between them.
  • Disabling and re-enabling HA on the cluster resolves the issue temporarily, but then shows the error "HA Initialization Error - Waiting to complete cluster election".
  • Reconfiguring HA at the host level fails with the error "Operation timed-out - state Uninitialized - Initialization Error - vSphere HA agent cannot be correctly installed or configured".
  • In the vCenter vpxd.log, entries similar to the following appear:

2015-01-15T06:13:59.366+03:00 [06212 info 'commonvpxLro' opID=########-######28-84-5] [VpxLRO] -- BEGIN task-internal-27741 --  -- DasConfig.ConfigureHost --
2015-01-15T06:13:59.366+03:00 [06212 info 'vpxdvpxdMoHost' opID=########-######28-84-5] [HostMo::UpdateDasState] VC state for host host-386 (HA disabled -> uninitialized), FDM state (UNKNOWN_FDM_HSTATE -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
2015-01-15T06:18:08.497+03:00 [06212 error 'DAS' opID=########-######28-84-5] [VpxdDasConfigLRO::Config] Timed out waiting for election to complete or for host to join existing master
2015-01-15T06:18:08.514+03:00 [06212 error 'DAS' opID=########-######28-84-5] [VpxdDasConfigLRO::Config] EnableDAS failed on host [vim.HostSystem:host-386,<hostname>]: class Vim::Fault::Timedout::Exception(vim.fault.Timedout)

  • In fdm.log on the ESXi host, entries similar to the following appear:

fdm.log: [FFC8AB70 info 'Election' opID=SWI-6058ed8] Slave timed out
fdm.log: [FFB7AB70 verbose 'Cluster' opID=SWI-56f32f43] Marking slave host-349 as unreachable


Note: The preceding log excerpts are only examples. Dates, times, and environment-specific values may vary depending on your environment.
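
To check a copy of vpxd.log or fdm.log for these entries, a short script can search for the message fragments shown in the excerpts above. The following is a minimal sketch, not a VMware tool: the function name, the signature labels, and the choice of fragments to match are assumptions made for illustration, and the exact wording of log messages may differ between versions.

```python
import re

# Message fragments copied from the example log excerpts in this article.
# The dictionary keys are illustrative labels, not VMware terminology.
SIGNATURES = {
    "vpxd_election_timeout": re.compile(r"Timed out waiting for election to complete"),
    "vpxd_enable_das_failed": re.compile(r"EnableDAS failed on host"),
    "fdm_slave_timeout": re.compile(r"Slave timed out"),
    "fdm_slave_unreachable": re.compile(r"Marking slave \S+\s+as unreachable"),
}


def scan_log_lines(lines):
    """Return a dict mapping each signature label to the list of matching lines."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits
```

The function accepts any iterable of lines, so an open file object can be passed to it directly, for example `scan_log_lines(open("vpxd.log", errors="replace"))`. A match only indicates that the log contains entries like those above; it does not by itself confirm the cause.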

Environment

VMware vCenter Server 6.7.x
VMware vSphere ESXi 6.0
VMware vCenter Server Appliance 6.7.x
VMware vSphere ESXi 6.5
VMware vSphere ESXi 6.7
VMware vCenter Server 5.5.x
VMware vCenter Server Appliance 5.5.x
VMware vCenter Server 6.0.x
VMware vCenter Server Appliance 6.5.x
VMware vSphere ESXi 5.5
VMware vCenter Server 6.5.x
VMware vCenter Server Appliance 6.0.x
VMware vCenter Server 5.1.x
VMware vCenter Server Appliance 5.1.x

Cause

This issue occurs when there is a firewall in the environment that is dropping the HA traffic between the ESXi hosts and vCenter Server.
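
As a rough illustration of why timeouts matter here, the sketch below compares a firewall idle timeout with the vCenter Server wait interval (the "config.vpxd.das.fdmWaitForUpdatesTimeoutSec" setting named in the Resolution section). It assumes a simplified model in which the firewall drops a connection that stays idle longer than its idle timeout, so traffic must recur at a shorter interval. The function name and the sample numbers are hypothetical, and real firewall behavior depends on the device.

```python
def wait_interval_fits(firewall_idle_timeout_sec, wait_for_updates_timeout_sec):
    """Return True if the vCenter wait interval is shorter than the firewall
    idle timeout, under the simplified model described above."""
    if firewall_idle_timeout_sec <= 0 or wait_for_updates_timeout_sec <= 0:
        raise ValueError("timeouts must be positive")
    return wait_for_updates_timeout_sec < firewall_idle_timeout_sec


# Hypothetical values: a 30-second wait interval against two firewall settings.
print(wait_interval_fits(10800, 30))  # long firewall idle timeout
print(wait_interval_fits(20, 30))     # firewall idle timeout shorter than the interval
```

Under this model, either raising the firewall idle timeout or lowering the vCenter Server setting moves the comparison in the right direction.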

Resolution

To resolve this issue:
  1. Turn off the firewall or increase its idle timeout. Alternatively, reduce the value of the vCenter Server advanced setting "config.vpxd.das.fdmWaitForUpdatesTimeoutSec" (for example, to 30) until the issue no longer occurs. Reducing this value increases the frequency of traffic between vCenter Server and the host, which reduces the chance that the firewall drops the connection.
    Note: This option is available from 5.5 U3. In earlier versions, the only option is to increase the timeout setting on the firewall (or turn off the firewall) to confirm that the issue is caused by the network.
    1. In the vSphere Web Client, navigate to the vCenter Server instance.
    2. Select the Manage tab.
    3. Select Advanced settings.
    4. Click Edit.
    5. In the Key field, type the key: config.vpxd.das.fdmWaitForUpdatesTimeoutSec
    6. In the Value field, type the value for the specified key (for example, 30).
    7. Click Add.
    8. Click OK.
       
  2. Change the firewall settings to increase the relevant "connection timeout" values. Consult the device vendor for specific steps.
    For example:
    1. Timeout: 10800
    2. TCP Timeout: 10800
    3. TCP Half Closed: 360
    4. TCP Time Wait: 120
       
  3. Restart HA on the cluster.
    1. Browse to the cluster in the vSphere Web Client object navigator.
    2. Click the Manage tab and click Settings.
    3. Under Services, click Edit.
    4. Deselect Turn ON vSphere HA.
    5. Click OK.
    6. Click Settings again and select Turn ON vSphere HA.
    7. Click OK.
       
  4. Reconfigure HA at the host level.
    1. In the vSphere Web Client, select the ESXi host.
    2. Right-click the ESXi host.
    3. Select All vCenter Actions > Reconfigure for vSphere HA.
    Note: In vSphere 6.5 and later, the steps are similar when using the HTML5-based vSphere Client.

  5. Move vCenter Server and the ESXi hosts to the same subnet.
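
To check whether vCenter Server and an ESXi host already share a subnet, the addresses can be compared with Python's standard `ipaddress` module. This is a minimal sketch: the function name is hypothetical, and the addresses in the example come from reserved documentation ranges rather than any real environment.

```python
import ipaddress


def same_subnet(vcenter_cidr, host_ip):
    """Return True if host_ip falls inside the network of vcenter_cidr.

    vcenter_cidr is the vCenter Server address with its prefix length,
    for example "192.0.2.10/24" (a placeholder address).
    """
    network = ipaddress.ip_interface(vcenter_cidr).network
    return ipaddress.ip_address(host_ip) in network


# Placeholder addresses for illustration only.
print(same_subnet("192.0.2.10/24", "192.0.2.55"))
print(same_subnet("192.0.2.10/24", "198.51.100.7"))
```

If the function returns False for a host, traffic between that host and vCenter Server crosses a gateway, and possibly a firewall, as described in the Symptoms section.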