ESXi host disconnects intermittently from vCenter Server

Article ID: 318647

Products

VMware vCenter Server

VMware vSphere ESXi

Issue/Introduction

  • ESXi hosts disconnect frequently from vCenter Server
  • vCenter Server shows ESXi host(s) as not responding
  • vCenter Server intermittently fails to receive ESXi heartbeats
  • On vCenter Server, /var/log/vmware/vpxd/vpxd.log contains entries similar to:

    [<YYYY-MM-DD>T<time> verbose 'App'] [VpxdIntHost] Missed 2 heartbeats for host esx.example.com
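To gauge how often each host misses heartbeats, the matching log entries can be tallied per host. The following is a minimal sketch, assuming the entries follow the format shown above, with the host name as the last field on the line. The function name count_missed_heartbeats is hypothetical and is not part of any VMware tooling.

```shell
# Hypothetical helper: tally "Missed N heartbeats" entries per host.
# Assumes the log line format shown above, where the host name is the
# last whitespace-separated field on the line.
count_missed_heartbeats() {
    # $1: path to a vpxd log file
    grep -E 'Missed [0-9]+ heartbeats for host' "$1" |
        awk '{ count[$NF]++ } END { for (h in count) print h, count[h] }' |
        sort
}
```

On the vCenter Server Appliance this could be run as count_missed_heartbeats /var/log/vmware/vpxd/vpxd.log. Each output line holds a host name followed by the number of matching entries.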

Environment

VMware vCenter Server 8.x

VMware vCenter Server 7.x

VMware vSphere ESXi 8.x

VMware vSphere ESXi 7.x

Cause

This issue occurs when the UDP heartbeat message sent by an ESXi host is not received by vCenter Server. If vCenter Server does not receive the UDP heartbeat message, it treats the host as not responding. An ESXi host sends heartbeats every 10 seconds and vCenter Server has a window of 60 seconds to receive the heartbeats. This behavior can be an indication of a congested network between the ESXi host and vCenter Server.
 
Note: If the host disconnects every 60 seconds, a firewall is likely blocking the UDP port 902 heartbeats sent from the ESXi host to vCenter Server.

Resolution

Confirming Packet Continuity 

  1. To confirm that the ESXi host is sending heartbeat packets to vCenter Server every 10 seconds, run the following command from an SSH session to the ESXi host.

    esxi# tcpdump-uw dst host <vcenter_ip_address> and udp port 902

  2. To confirm that the heartbeats are reaching vCenter Server on UDP port 902 every 10 seconds, run the following command from an SSH session to the vCenter Server Appliance.

    vcsa# tcpdump src host <esxi_host_ip_address> and udp port 902
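Spotting an irregular interval by eye in a long capture is error-prone, so the check can be scripted. The following is a minimal sketch, assuming the capture is taken with the standard tcpdump option -tt so that each line starts with an epoch timestamp. The function name find_heartbeat_gaps and the default threshold of 15 seconds (the 10-second interval plus a margin) are assumptions made for illustration.

```shell
# Hypothetical helper: report gaps between consecutive heartbeat packets.
# Reads tcpdump output captured with -tt (epoch timestamp in field 1)
# on standard input.
# $1: largest tolerated gap in seconds (assumed default: 15)
find_heartbeat_gaps() {
    awk -v max_gap="${1:-15}" '
        # Only consider lines whose first field is a numeric timestamp.
        $1 ~ /^[0-9]+(\.[0-9]+)?$/ {
            if (seen && $1 - prev > max_gap)
                printf "gap of %.1f s ending at %s\n", $1 - prev, $1
            prev = $1
            seen = 1
        }
    '
}
```

A possible invocation on the vCenter Server Appliance is tcpdump -tt -n -l src host <esxi_host_ip_address> and udp port 902 | find_heartbeat_gaps 15. No output means every packet arrived within the threshold of the previous one.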


Next steps
  • If heartbeats are being sent by the ESXi host, but not reaching the vCenter, the network between the two machines needs to be investigated further for firewalls or other connection limiting mechanisms.
  • If heartbeats are not being sent by the host, investigate the ESXi services and log files for a possible cause.
  • If heartbeats are both being sent by ESXi and being received by vCenter, the problem is not related to a network block of heartbeat traffic. Investigate the vCenter Server for reasons why the hosts are intermittently disconnecting. 
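The three outcomes above can be summarized as a small decision helper. This is only an illustrative sketch; the function name next_step and its "yes"/"no" arguments are hypothetical.

```shell
# Hypothetical helper mapping the two capture results to the next step.
# $1: "yes" if the ESXi host is sending heartbeats, otherwise "no"
# $2: "yes" if vCenter Server is receiving them, otherwise "no"
next_step() {
    if [ "$1" != "yes" ]; then
        echo "investigate ESXi services and logs"
    elif [ "$2" != "yes" ]; then
        echo "investigate the network path for firewalls or connection limits"
    else
        echo "investigate vCenter Server"
    fi
}
```

For example, next_step yes no corresponds to the first case: heartbeats leave the host but never arrive, so the network path is the place to look.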
Note: UDP port 902 connectivity between vCenter Server and the ESXi host can also be checked with the following commands:
 
On vCenter:
 
vcsa# tcpdump -ni eth0 host <esxi_host_ip_address> and udp port 902
 
The expected output is one heartbeat packet from the ESXi host on UDP port 902, received by vCenter Server every 10 seconds.
 
On ESXi host:
 
esxi# pktcap-uw --vmk vmk0 --dstudpport 902 --dir 0 -o - | tcpdump-uw -enr - 
 
TCP port 902 connectivity from vCenter Server to the ESXi host can also be checked. This validates TCP connectivity only; it does not confirm that UDP heartbeats are getting through.
 
vcsa# curl -v telnet://<esxi_host_ip_address>:902
 
Important: Ensure that the firewall between vCenter Server and the ESXi host allows UDP port 902 traffic in both directions.
 
Workaround
As a temporary workaround, increase the timeout limit in vCenter Server by editing or creating the Advanced Setting: config.vpxd.heartbeat.notRespondingTimeout

Note: Increasing the timeout is a short-term solution until any network issues can be resolved.
 
vSphere Client:
 
To increase the timeout limit to 120 seconds (vary as needed):
  1. Open the vSphere Client in a web browser and log in.
  2. Select the vCenter object from the inventory under Hosts and Clusters.
  3. Select the Manage or Configure tab.
  4. Select Settings > Advanced Settings.
  5. Click Edit.
  6. In the Key field, type:

    config.vpxd.heartbeat.notRespondingTimeout
     
  7. In the Value field, type:

    120
     
  8. Click Add.
  9. Click OK.
  10. Restart the vCenter Server service.

    vcsa# service-control --stop vmware-vpxd && service-control --start vmware-vpxd