ESXi host disconnects from vCenter Server after adding or connecting it to the inventory

Article ID: 323612

Updated On:

Products

VMware vSphere ESXi, VMware vCenter Server

Issue/Introduction

This article helps identify heartbeat traffic issues between vCenter Server and an ESXi host that cause the host to disconnect and enter a "not responding" state.

The steps provided in this article and the included Knowledge Base article links help determine if heartbeat packets are being dropped or lost.

Symptoms:
  • An ESXi host disconnects from vCenter Server.
  • After adding or reconnecting an ESXi host to the vCenter Server inventory, the host disconnects 30 to 90 seconds after the task completes.
  • Changing the uplink switch port VLAN information for the new IP of the ESXi host before changing the IP of the ESXi host results in the host showing as disconnected in vCenter Server.
  • Changing the IP address of the ESXi host using the DCUI without first removing the host from the vCenter Server inventory results in the host showing up as disconnected in vCenter Server.

Cause

This issue occurs when heartbeat packets are dropped, blocked, or otherwise lost between the vCenter Server and the ESXi host.

The default heartbeat port is UDP 902. vCenter Server must regularly receive these packets from each ESXi host in its inventory for the host to stay connected.

Once vCenter Server misses 6 consecutive heartbeat packets (the default threshold) from a specific host, that host is marked as disconnected and vCenter Server cannot manage it directly until connectivity is restored.

By default, heartbeat packets are sent once every 10 seconds, so 6 missed heartbeats equate to 60 seconds. If a host disconnects roughly every 60 seconds, it is reasonable to conclude that the heartbeats are not being received.
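
The timing described here can be sketched as simple shell arithmetic. The variable names below are illustrative only and are not vCenter Server settings; the values are the defaults stated in this article.

```shell
# Defaults described in this article (variable names are illustrative only).
heartbeat_interval_s=10   # one heartbeat every 10 seconds
missed_threshold=6        # consecutive missed heartbeats before disconnect

# Time without heartbeats before the host is marked as disconnected.
disconnect_after_s=$((heartbeat_interval_s * missed_threshold))
echo "Host marked disconnected after ${disconnect_after_s} seconds without heartbeats"
```

If either default has been changed in the environment, the expected disconnect interval changes accordingly.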
 
The most common reason for missed heartbeats is a firewall blocking the UDP 902 packets from being delivered.

Resolution

To troubleshoot this issue, ensure that heartbeat communications from the host to vCenter are functioning correctly and are being received by the vCenter Server.

The default port for this communication is UDP 902, but be sure to verify the configured port in the vpxa.cfg file on the host. This file also defines the IP address of the vCenter Server that manages the host.
 

Confirm vCenter Server managed IP address

Confirm that the vCenter Server managed IP address is consistent throughout the environment.
 

  1. Determine the managed IP address of the vCenter Server:
     
    1. Connect to vCenter Server with the vSphere Client.
    2. Click Administration > vCenter Server Settings > Advanced Settings.
    3. Make a note of the IP address in the ManagedIP row.
       
  2. Determine the IP address configured for vCenter Server:

    For vCenter Server installed on a Windows Server:
     
    1. From a console or RDP session to the vCenter Server desktop, open a command prompt.
    2. Run the command:

      ipconfig
       
    3. Make a note of the IP address and ensure that it matches the managed IP address found in step 1.

    For vCenter Server Appliance:
     
    1. From a console or SSH session to the vCenter Server Appliance, open a shell prompt. For more information, see Opening a command or shell prompt (1003892).

      Note: From the console of the vCenter Server Appliance, press enter on Login.
       
    2. Run the command:

      ifconfig
       
    3. Make a note of the IP address next to inet addr: and ensure that it matches the managed IP address found in step 1.
       
  3. Determine the IP address and port that the ESXi host is using for heartbeat traffic:
     
    1. Connect to the affected ESXi host using SSH.
    2. Check the vpxa.cfg file for the heartbeat traffic port by running the command:
       
      • On ESXi 6.x:

        grep -i serverport /etc/vmware/vpxa/vpxa.cfg

      • On ESXi 7.0U3+:

        configstorecli config current get -c esx -g services -k vpxa_solution_user_config | grep -i server_port

    3. Ensure that the port number matches the default heartbeat port of 902.
    4. Check the vpxa.cfg file for the managed IP address by running the command:
       
      • On ESXi 6.x:

        grep -i serverIp /etc/vmware/vpxa/vpxa.cfg

      • On ESXi 7.0U3+:

        configstorecli config current get -c esx -g services -k vpxa_solution_user_config | grep -i server_ip

    5. Ensure that the IP address matches the managed IP address found in Step 1.

      Note: If the IP address is not the same as the one noted in Step 1, see vCenter Server IP address change causes ESX hosts to disconnect (1001493).
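
The checks in step 3 can be sketched as a small shell helper for ESXi 6.x hosts. This is a minimal sketch, not a VMware-provided tool: the function names vpxa_value and check_heartbeat_config are hypothetical, and it assumes vpxa.cfg stores the values as single-line XML elements such as <serverIp>...</serverIp> and <serverPort>...</serverPort>.

```shell
# Print the text of a simple XML element such as <serverIp>...</serverIp>.
# Assumption: the element opens and closes on a single line in vpxa.cfg.
vpxa_value() {
    # $1 = element name, $2 = path to vpxa.cfg
    sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" "$2" | head -n 1
}

# Compare the values configured on the host with the expected ones.
check_heartbeat_config() {
    # $1 = path to vpxa.cfg, $2 = managed IP noted in Step 1,
    # $3 = expected heartbeat port (defaults to 902)
    cfg=$1
    expected_ip=$2
    expected_port=${3:-902}
    ip=$(vpxa_value serverIp "$cfg")
    port=$(vpxa_value serverPort "$cfg")
    status=0
    if [ "$ip" != "$expected_ip" ]; then
        echo "serverIp mismatch: host has '$ip', expected '$expected_ip'"
        status=1
    fi
    if [ "$port" != "$expected_port" ]; then
        echo "serverPort mismatch: host has '$port', expected '$expected_port'"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "vpxa.cfg matches: $ip:$port"
    fi
    return "$status"
}
```

On the host, a call might look like check_heartbeat_config /etc/vmware/vpxa/vpxa.cfg 192.0.2.10, where 192.0.2.10 is a placeholder for the managed IP address noted in Step 1. A non-zero return value indicates that the IP address, the port, or both differ from the expected values.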

Connectivity

Test connectivity between vCenter Server and the ESXi host through the heartbeat network.

Because the heartbeat packets are sent to a UDP port, port connectivity cannot be checked using netcat: a test with the UDP flag ("-u") will always succeed.

Instead, determine whether the vCenter Server is receiving the heartbeat packets by running a packet capture on the vCenter Server itself.

To do so, open an SSH session to the vCenter Server Appliance, or connect to it using a remote console, type "shell" to launch the Bash shell prompt, and run the following command:

    tcpdump src xxx.xxx.xxx.xxx and udp port 902 -nn

where xxx.xxx.xxx.xxx is the management IP of the host that is disconnecting.

Because the packets are sent only once every 10 seconds, let the capture run for at least 10 seconds to determine whether they are being received correctly. Once the desired information has been gathered, stop the capture with "Ctrl+c".

Note: Heartbeats are sent only from the host to vCenter Server, over UDP port 902. Checking connectivity from the host to vCenter Server over TCP 902 using netcat or a similar command is expected to fail, because that port is not needed for connectivity in that direction (though TCP 902 from vCenter Server to the host is).
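
The capture described in this section can also be bounded so that it stops on its own. The following is a minimal sketch under stated assumptions: the function names heartbeat_filter and capture_heartbeats are hypothetical, the coreutils timeout command is assumed to be available in the appliance's Bash shell, and the packet count of 3 and the 35-second limit are illustrative values chosen to span a few 10-second heartbeat intervals.

```shell
# Build the capture filter used in this article for one ESXi host.
heartbeat_filter() {
    # $1 = management IP of the ESXi host, $2 = heartbeat port (defaults to 902)
    echo "src $1 and udp port ${2:-902}"
}

# Capture up to $2 heartbeat packets from host $1, giving up after $3 seconds.
# Assumption: tcpdump and the coreutils 'timeout' command are on the PATH.
capture_heartbeats() {
    host_ip=$1
    count=${2:-3}
    max_wait_s=${3:-35}
    timeout "$max_wait_s" tcpdump -nn -c "$count" "$(heartbeat_filter "$host_ip")"
}
```

For example, capture_heartbeats 192.0.2.20 (a placeholder address) exits as soon as three matching packets have been captured. If the time limit expires first, fewer than three heartbeats arrived in that window, which suggests that packets from that host are being dropped or blocked.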

Congestion

Test network congestion:

Other troubleshooting areas



Additional Information