Verify ESXi host heartbeat to vCenter using packet capture utilities

Article ID: 307364

Products

VMware vSphere ESXi, VMware vCenter Server

Issue/Introduction

This article explains how to trace the UDP heartbeats between ESXi hosts and the vCenter Server Appliance with the built-in pktcap-uw and tcpdump utilities when the network connection between vCenter Server and ESXi is suspected. Symptoms include:
  • An ESXi/ESX host is successfully added to the vCenter Server Inventory but enters a Not Responding or Disconnected state after one minute.
  • Interacting with the ESX/ESXi host causes it to go into a Not Responding or Disconnected state.
  • Hosts reconnect shortly after disconnecting.
  • You can use the vSphere Client to successfully connect to the ESXi/ESX host directly.
  • In the vpxd.log file, you see entries similar to:

    2012-04-02T13:07:49.579+02:00 [02068 info 'Default'] [VpxLRO] -- BEGIN task-internal-253 -- host-94 -- VpxdInvtHostSyncHostLRO.Synchronize --
    2012-04-02T13:07:49.579+02:00 [02068 warning 'Default'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-94
    2012-04-02T13:07:49.579+02:00 [02068 warning 'Default'] [VpxdInvtHost::FixNotRespondingHost] Returning false since host is already fixed!
    2012-04-02T13:07:49.579+02:00 [02068 warning 'Default'] [VpxdInvtHostSyncHostLRO] Failed to fix not responding host host-94
    2012-04-02T13:07:49.579+02:00 [02068 warning 'Default'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-94
    2012-04-02T13:07:49.579+02:00 [02068 error 'Default'] [VpxdInvtHostSyncHostLRO] FixNotRespondingHost failed for host host-94, marking host as notResponding
    2012-04-02T13:07:49.579+02:00 [02068 warning 'Default'] [VpxdMoHost] host connection state changed to [NO_RESPONSE] for host-94
    2012-04-02T13:07:49.610+02:00 [02248 info 'Default' opID=66183d64] [VpxLRO] -- FINISH task-internal-252 -- -- vim.SessionManager.acquireSessionTicket -- 52fa8682-####-####-####-6192cb2c22f9(5298e245-####-####-####-dedfbe369255)
    2012-04-02T13:07:49.719+02:00 [02068 info 'Default'] [VpxdMoHost::SetComputeCompatibilityDirty] Marked host-94 as dirty.

     

  • In the vpxd.log file, you also see entries similar to:
    2021-04-18T11:10:38.308Z info vpxd[7FCF54E1C700] [Originator@6876 sub=HostCnx opID=CheckforMissingHeartbeats-7ea0ff43] [VpxdHostCnx] No heartbeats received from host; cnx: 52f79073-####-####-####-610c96663963, h: host-10046, time since last heartbeat: 1546793ms
    2021-04-18T11:10:38.308Z info vpxd[7FCF54E1C700] [Originator@6876 sub=HostCnx opID=CheckforMissingHeartbeats-7ea0ff43] [VpxdHostCnx] No heartbeats received from host; cnx: 5244f035-####-####-####-8130b157e58d, h: host-883, time since last heartbeat: 1546779ms
    2021-04-18T11:10:38.308Z info vpxd[7FCF54E1C700] [Originator@6876 sub=HostCnx opID=CheckforMissingHeartbeats-7ea0ff43] [VpxdHostCnx] No heartbeats received from host; cnx: 52019074-####-####-####-db2b82e5bfe6, h: host-3822, time since last heartbeat: 1546757ms
    2021-04-18T11:10:38.308Z info vpxd[7FCF54E1C700] [Originator@6876 sub=HostCnx opID=CheckforMissingHeartbeats-7ea0ff43] [VpxdHostCnx] No heartbeats received from host; cnx: 52e4032c-####-####-####-4274de93e136, h: host-7549, time since last heartbeat: 1546751ms

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.
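On a busy vCenter Server, the heartbeat messages shown above can be buried among thousands of other vpxd.log entries. The following is a minimal POSIX sh and awk sketch for summarizing them. It assumes the log line layout shown in the excerpts, and the function name vpxd_heartbeat_summary is hypothetical, not a VMware-provided tool. It prints the affected host ID together with the time since the last heartbeat in whole seconds, or marks the host as notResponding.

```shell
#!/bin/sh
# Hypothetical helper: summarize heartbeat-related vpxd.log entries from stdin.
# Assumes the message layout shown in the log excerpts in this article.
# Output, one line per matching entry:
#   <host-id> missed <whole seconds since last heartbeat>
#   <host-id> notResponding
vpxd_heartbeat_summary() {
  awk '
    /No heartbeats received from host/ {
      host = ""; ms = ""
      for (i = 1; i < NF; i++) {
        # "h: host-10046," -> host-10046
        if ($i == "h:") { host = $(i + 1); sub(/,$/, "", host) }
        # "heartbeat: 1546793ms" -> 1546793
        if ($i == "heartbeat:") { ms = $(i + 1); sub(/ms$/, "", ms) }
      }
      if (host != "" && ms != "") printf "%s missed %d\n", host, ms / 1000
    }
    /marking host as notResponding/ {
      for (i = 1; i < NF; i++) {
        # "... failed for host host-94, marking ..." -> host-94
        if ($i == "host" && $(i + 1) ~ /^host-/) {
          host = $(i + 1); sub(/,$/, "", host)
          print host " notResponding"
        }
      }
    }
  '
}
```

For example, on the vCenter Server Appliance you could run vpxd_heartbeat_summary < /var/log/vmware/vpxd/vpxd.log. Message wording can change between releases, so an empty result means the patterns should be rechecked, not that heartbeats are arriving.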



Environment

VMware vSphere 6.x
VMware vSphere 7.x
VMware vSphere 8.x

Cause

By default, an ESXi host sends a UDP heartbeat to vCenter Server on port 902 every 10 seconds, and vCenter Server has a 60-second window in which to receive it. Congestion on the physical network or a firewall configuration can delay or drop the heartbeat packets. If vCenter Server does not receive a heartbeat within that window, it marks the host as Not Responding.

If the host reconnects to vCenter Server shortly after disconnecting, the cause is most likely a network interruption that interferes with the heartbeat packets sent from the host to vpxd.
If the host does not connect to vCenter Server at all, the cause may be persistent network interference or service instability.
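The timing rule described above can be expressed as a small classifier for the gap between two consecutive heartbeats. This is a simplified sketch, not vpxd's actual implementation: it only compares one gap against the 10-second interval and 60-second window stated in this article, and the function name classify_heartbeat_gap is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: classify the gap (in seconds, decimals allowed)
# between two consecutive heartbeats from the same host.
# Defaults follow this article: 10 s send interval, 60 s vCenter window.
# Usage: classify_heartbeat_gap GAP [INTERVAL] [WINDOW]
classify_heartbeat_gap() {
  awk -v gap="$1" -v interval="${2:-10}" -v window="${3:-60}" 'BEGIN {
    if (gap + 0 <= interval + 0)    print "on-time"             # heartbeat arrived on schedule
    else if (gap + 0 <= window + 0) print "delayed-or-dropped"  # one or more heartbeats late or lost
    else                            print "not-responding"      # window exceeded; host is marked Not Responding
  }'
}
```

Normal jitter can push gaps slightly above 10 seconds, so passing a slightly larger interval, for example classify_heartbeat_gap 10.4 12 (which prints on-time), avoids flagging ordinary variation as a network problem.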

Resolution

  • On the ESXi host, run the following command to start the capture, replacing vmnicXX with the uplink that carries the host's management traffic:

    pktcap-uw --uplink vmnicXX --capture UplinkSndKernel -o - | tcpdump-uw -r - -nn udp port 902

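    Note: In the output, look for UDP packets from the host's management IP address to the vCenter Server IP address on port 902, sent roughly every 10 seconds.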



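  • On the vCenter Server Appliance, run the following command from the Bash shell, replacing the placeholder with the IP address or FQDN of the ESXi host: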

    tcpdump -nn src <host IP or Host FQDN> and udp port 902

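    Note: In the output, look for UDP packets from the ESXi host to port 902 on the vCenter Server Appliance, arriving roughly every 10 seconds. If the ESXi capture shows heartbeats being sent but they are missing or arrive with long gaps here, they are being delayed or dropped in the network between the two.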



  • When you have finished, press Ctrl+C to stop tcpdump on the vCenter Server Appliance, and stop the pktcap-uw trace on the ESXi host with the kill command:

    kill $(lsof |grep pktcap-uw |awk '{print $1}'| sort -u)

  • To confirm that all pktcap-uw traces have stopped, run the following command. No output means that no traces are running:

    lsof |grep pktcap-uw |awk '{print $1}'| sort -u
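On a long capture from the vCenter Server Appliance, spotting a missing heartbeat by eye is tedious. The following POSIX sh and awk sketch reads tcpdump output printed with the standard -tt option (Unix epoch timestamps in the first column) and reports every gap between consecutive packets that exceeds a threshold. The function name heartbeat_gaps and the default 12-second threshold (the 10-second interval plus some tolerance for jitter) are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report gaps between consecutive captured heartbeats.
# Reads tcpdump output produced with -tt (epoch seconds in the first field)
# on stdin; lines whose first field is not a timestamp are ignored.
# Usage: heartbeat_gaps [THRESHOLD_SECONDS]   (default: 12)
heartbeat_gaps() {
  awk -v threshold="${1:-12}" '
    $1 ~ /^[0-9]+(\.[0-9]+)?$/ {
      ts = $1 + 0
      if (seen && ts - prev > threshold + 0)
        printf "%.1f s gap ending at %s\n", ts - prev, $1
      prev = ts; seen = 1
    }
  '
}
```

For example: tcpdump -nn -tt -l src <host IP or Host FQDN> and udp port 902 | heartbeat_gaps. The -l option makes tcpdump line-buffer its output so gaps are reported as they occur. A reported gap longer than 60 seconds corresponds to the window after which vCenter Server marks the host as Not Responding.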
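The two pktcap-uw commands above can be combined so that stopping the traces and verifying the result happen in one step. The following is a minimal sketch for the ESXi shell. It assumes, as those commands do, that the first column of lsof output is the ID passed to kill; the function names pktcap_ids and stop_pktcap_traces are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the unique IDs (first column) of pktcap-uw
# entries in lsof output read from stdin.
pktcap_ids() {
  grep pktcap-uw | awk '{print $1}' | sort -u
}

# Hypothetical helper: stop all pktcap-uw traces, then verify none remain.
stop_pktcap_traces() {
  ids="$(lsof | pktcap_ids)"
  if [ -z "$ids" ]; then
    echo "no pktcap-uw traces running"
    return 0
  fi
  # $ids is intentionally unquoted so each ID becomes a separate argument.
  kill $ids
  sleep 1
  if [ -n "$(lsof | pktcap_ids)" ]; then
    echo "pktcap-uw traces still running" >&2
    return 1
  fi
  echo "all pktcap-uw traces stopped"
}
```

Splitting the parsing into pktcap_ids keeps the kill step and the verification step using exactly the same filter, so a trace that survives the kill is always reported.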

Additional Information

For more information on troubleshooting a host in the Not Responding state, see Troubleshooting an ESXi host in a "not responding" state.
For more information on using the pktcap-uw utility, see Using the pktcap-uw tool in ESXi 5.5 and later.
For more information on networking troubleshooting tools on the vCenter Server Appliance, see Troubleshooting Tools for Networking on vCenter Server Appliance 6.5 and Above.