Troubleshooting VMware High Availability (HA) in VMware vSphere 4.x

Article ID: 342222

Products

VMware vCenter Server VMware vSphere ESXi

Issue/Introduction

  • VMware High Availability (HA) failover errors:

    HA agent on server in cluster cluster in datacenter has an error

    Insufficient resources to satisfy HA failover level on cluster


  • HA agent configuration errors on ESX hosts:

    • Failed to connect to host
    • Failed to install the VirtualCenter agent
    • cmd addnode failed for primary node: Internal AAM Error - agent could not start
    • cmd addnode failed for primary node:/opt/vmware/aam/bin/ft_startup failed

  • Configuration of hosts IP address is inconsistent on host hostname address resolved to IP and IP
  • Port errors:

    Ports not freed after stop_ftbb

  • The first node in the HA cluster enables correctly but the second node fails to configure HA just after 90%
  • The network settings and HA configuration are all correct. DNS and ping tests are all successful
  • Disabling and re-enabling HA on the cluster does not resolve the issue
  • VMware Infrastructure (VI) Client displays the error:

    Internal AAM Errors - agent could not start

  • In the aam_config_util_addnode.log file on the ESX, you see entries similar to:

    [myexit ] Failure location:
    [myexit ] function main::myexit called from line 2199
    [myexit ] function main::start_agent called from line 1168
    [myexit ] function main::add_aam_node called from line 171
    [myexit ] VMwareresult=failure


  • Adding a host to the cluster fails with the error:

    Cannot complete the configuration of the HA agent on the host. Other HA configuration error.
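The aam_config_util_addnode.log excerpt above has a regular shape, so a short script can confirm the failure signature and pull out the call chain. The following is a minimal sketch in Python. It assumes the log lines follow exactly the format shown in the excerpt. The function name summarize_addnode_log is hypothetical and is not part of any VMware tool.

```python
import re

# Matches lines such as:
#   [myexit ] function main::start_agent called from line 1168
# The pattern is an assumption based only on the excerpt in this article.
CALL_RE = re.compile(r"function\s+(\S+)\s+called from line\s+(\d+)")


def summarize_addnode_log(text):
    """Return (failed, call_chain) for the text of an addnode log.

    failed     -- True if the log contains "VMwareresult=failure"
    call_chain -- list of (function_name, line_number) tuples, in log order
    """
    failed = "VMwareresult=failure" in text
    chain = [(m.group(1), int(m.group(2))) for m in CALL_RE.finditer(text)]
    return failed, chain
```

To use it, read the log file from the host into a string and pass it to summarize_addnode_log. A result whose chain includes main::start_agent corresponds to the "agent could not start" symptom listed above.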


Environment

VMware ESXi 4.0.x Installable
VMware ESX Server 3.0.x
VMware vCenter Server 4.1.x
VMware ESX Server 3.5.x
VMware ESXi 4.1.x Embedded
VMware ESX 4.0.x
VMware ESXi 4.0.x Embedded
VMware ESX 4.1.x
VMware ESXi 4.1.x Installable
VMware VirtualCenter 2.0.x
VMware VirtualCenter 2.5.x
VMware vCenter Server 4.0.x

Resolution

This article guides you through the process of troubleshooting a VMware HA cluster. The article identifies common configuration problems and also confirms the availability of required resources on your ESXi host.

Note: For troubleshooting steps specific to VMware vCenter Server 5.x HA/FDM, see Troubleshooting VMware High Availability (HA) issues in VMware vCenter Server 5.x and 6.0 (2004429).

Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document to help you eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Do not skip a step.

Most issues can also be solved by disabling HA and then re-enabling it. Be sure to try this before proceeding with these steps.

Note: If you perform a corrective action in any of these steps, attempt to re-enable VMware HA afterward.
  1. Check the release notes for current releases to see if the problem has been resolved in a bug fix. For more information, see VMware vSphere 4 Documentation or VMware Infrastructure 3 Documentation.

  2. Verify that there are enough licenses to configure VMware HA. For more information, see Verifying that a feature is licensed (1003692).

  3. Verify that name resolution is correctly configured on the ESX Server. For more information, see Identifying issues with and setting up name resolution on ESXi/ESX Server (1003735).

  4. Verify that name resolution is correctly configured on the vCenter Server. For more information, see Configuring name resolution for VMware vCenter Server (1003713).

  5. Verify that the time is correct on all ESX Servers with the date command. For more information on setting up time synchronization with ESX Server, see Installing and Configuring NTP on an ESX host (1339).

  6. Verify that network connectivity exists from the VirtualCenter Server to the ESX Server. For more information, see Testing network connectivity with the ping command (1003486).

  7. Verify that network connectivity exists from the ESX Server to the isolation response address. For more information, see Testing network connectivity with the ping command (1003486).

  8. Verify that all of the required network ports are open. For more information, see Testing port connectivity with Telnet (1003487).

    Notes:
    • HA uses these ports:

      Incoming port: TCP/UDP 8042-8045
      Outgoing port: TCP/UDP 2050-2250

    • Ensure that AAM (Automated Availability Manager) is enabled in the ESX Security Profile. If these ports are not open on the ESX firewall, HA cannot be configured.

  9. If configured with Advanced Settings, confirm that the configuration is valid. For more information, see Advanced Configuration options for VMware High Availability (1006421).

  10. Verify that the correct version of the VirtualCenter agent service is installed. For more information on determining agent versions and how to manually uninstall and reinstall the HA agents on an ESX host, see Verifying and reinstalling the correct version of the vCenter Server agents (1003714).

  11. Verify the VirtualCenter Server Service has been restarted. To restart the VirtualCenter Server Service, see Stopping, starting, or restarting vCenter services (1003895).

  12. Verify that VMware HA is only attempting to configure on one Service Console. For more information, see VMware High Availability configuration issues when an iSCSI Service Console is on the same network (1003789).

  13. Verify that the VMware HA cluster is not corrupted. To do this, create another cluster as a test. For more information, see Recreating a VMware High Availability Cluster in vSphere (1003715).

  14. Verify that UDP 8043 packets used for the HA backbone communications are not dropped among the ESX hosts. For more information, see HA fails to configure at 90% completion with the error: Internal AAM Error - agent could not start (1018217).

  15. Ensure that the ESXi host userworld swap option is enabled. For more information, see ESXi hosts without swap enabled cannot be added to a VMware High Availability Cluster (1004177).
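
The port check in step 8 can be scripted instead of being run by hand with Telnet. The following is a minimal sketch in Python, using only the standard library. It tests TCP only, because a plain connect cannot confirm UDP reachability. A refused connection may also mean that the HA agent is not listening yet, not that a firewall is blocking the port. The names tcp_port_open and closed_tcp_ports are hypothetical helpers, and the port ranges are the ones listed in step 8.

```python
import socket

# Port ranges listed in step 8 of this article (upper bounds are inclusive).
HA_INCOMING_PORTS = range(8042, 8046)   # 8042-8045
HA_OUTGOING_PORTS = range(2050, 2251)   # 2050-2250


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def closed_tcp_ports(host, ports=HA_INCOMING_PORTS, timeout=2.0):
    """Return the ports in `ports` that did not accept a TCP connection."""
    return [p for p in ports if not tcp_port_open(host, p, timeout)]
```

For example, closed_tcp_ports("esx01.example.com") returns the incoming HA ports that could not be reached over TCP on that host. The host name here is a placeholder. An empty list means that every port in the range accepted a connection.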

Note: If your problem still exists after trying the steps in this article:

Additional Information

Verify the contents of the /etc/hosts file on each ESXi/ESX host and ensure that the IP address of the host in DNS is the same as the IP address in the hosts file. Also ensure that the IP address, FQDN, and short name of the host are present in the ESXi/ESX hosts file.
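
The hosts file check described above can be expressed as a small script. The following is a minimal sketch in Python. It assumes the standard hosts file layout of an IP address followed by one or more names. The function names parse_hosts and check_host_entry are hypothetical. If dns_ip is not supplied, the sketch falls back to socket.gethostbyname, which uses the system resolver and may itself consult the local hosts file. For a strict comparison, pass the address reported by your DNS server.

```python
import socket


def parse_hosts(text):
    """Map each name in hosts-file text to its IP address (first entry wins)."""
    names = {}
    for line in text.splitlines():
        # Drop trailing comments, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment-only line, or IP without names
        ip, *aliases = fields
        for name in aliases:
            names.setdefault(name.lower(), ip)
    return names


def check_host_entry(hosts_text, fqdn, dns_ip=None):
    """Return a list of problems for `fqdn`; an empty list means consistent."""
    names = parse_hosts(hosts_text)
    short = fqdn.split(".", 1)[0].lower()
    if dns_ip is None:
        # Fallback only: the system resolver may read the hosts file as well.
        dns_ip = socket.gethostbyname(fqdn)
    problems = []
    for name in (fqdn.lower(), short):
        ip = names.get(name)
        if ip is None:
            problems.append(f"{name}: missing from hosts file")
        elif ip != dns_ip:
            problems.append(f"{name}: hosts file has {ip}, DNS has {dns_ip}")
    return problems
```

Pass the contents of /etc/hosts as hosts_text, along with the FQDN of the host and the address that DNS returns for it. Each string in the result names one inconsistency to correct.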