ESXi hosts intermittently show as "Not Responding" in vCenter Server due to vpxa heartbeat binding errors

Article ID: 399108

Products

  • VMware vSAN
  • VMware vSphere ESXi 8.0
  • VMware vSphere ESX 8.x
  • VMware vSphere ESX 7.x

Issue/Introduction

  • ESXi hosts periodically go into a "Not Responding" state in vCenter Server.

  • On the affected host, /var/run/log/vpxa.log contains the following error:
    YYYY-MM-DDTHH:MM:SS.Z Wa(164) Vpxa[2125970]: [Originator@6876 sub=Heartbeat opID=vpxaHeartbeat.cpp:####-####] Failed to bind heartbeat socket; '#.#.#.host_IP', e: 99(Cannot assign requested address)
    YYYY-MM-DDTHH:MM:SS.Z Wa(164) Vpxa[2125970]: [Originator@6876 sub=Heartbeat opID=vpxaHeartbeat.cpp:####-####] Failed to bind heartbeat socket; '#.#.#.host_IP', e: 99(Cannot assign requested address)

  • On vCenter Server, /var/log/vmware/vpxd/vpxd.log contains the following message:
    YYYY-MM-DDTHH:MM:SS.Z info vpxd[48005] [Originator@6876 sub=HostCnx opID=CheckforMissingHeartbeats-####] [VpxdHostCnx] No heartbeats received from host; cnx: ########-####-####-####-############, h: host-####, time since last heartbeat: 665188ms
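
Both log checks can be scripted. The following is a minimal sketch, assuming the log lines have exactly the form shown above and that a POSIX shell with sed, sort, uniq, and awk is available. The function names bind_failures and missed_heartbeats are illustrative and are not part of any VMware tooling.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names); each takes a log file path.

# Count heartbeat bind failures per IP address that vpxa tried to bind.
# Intended for the vpxa.log file on the ESXi host.
bind_failures() {
    sed -n "s/.*Failed to bind heartbeat socket; '\([^']*\)'.*/\1/p" "$1" |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Print "host-id seconds" for every missed-heartbeat message, converting the
# logged milliseconds to whole seconds.
# Intended for the vpxd.log file on vCenter Server.
missed_heartbeats() {
    sed -n 's/.*No heartbeats received from host;.* h: \([^,]*\), time since last heartbeat: \([0-9][0-9]*\)ms.*/\1 \2/p' "$1" |
        awk '{ printf "%s %d\n", $1, $2 / 1000 }'
}

# Example:
#   bind_failures /var/run/log/vpxa.log
```

An address that shows up in the bind_failures output but is not configured on the host is the one to look for in VCDB.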

Environment

  • VMware vSphere ESXi 7.x
  • VMware vSphere ESXi 8.x

Cause

The primary cause is a managed IP mismatch: the vCenter Server database (VCDB) holds an incorrect IP address for the ESXi host in the vpx_host table. When vCenter Server pushes configuration to the host, it instructs the vpxa service to bind its heartbeat socket to the IP address stored in the database. If that address is not configured on any of the host's VMkernel interfaces (for example, after a host IP change or migration), the bind fails with error 99 (Cannot assign requested address), vCenter Server stops receiving heartbeats, and the host is shown as "Not Responding".
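
Whether a given address is actually configured on the host can be checked from the ESXi shell. The following is a minimal sketch, assuming that `esxcli network ip interface ipv4 get` prints one row per VMkernel interface with the IPv4 address as a whitespace-separated field. The helper name has_ip is hypothetical.

```shell
#!/bin/sh
# has_ip IP: read an interface listing on stdin and succeed (exit 0) only if
# IP appears as a complete whitespace-separated field on some line.
# Exact field comparison avoids false matches such as 10.0.0.3 vs 10.0.0.30.
has_ip() {
    awk -v ip="$1" '
        { for (i = 1; i <= NF; i++) if ($i == ip) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Example (run on the ESXi host; the address is a placeholder):
#   esxcli network ip interface ipv4 get | has_ip 10.0.0.5 \
#       || echo "address is not configured on any VMkernel interface"
```

If the address from the vpxa bind error fails this check, the host cannot bind to it, which matches the error 99 behavior described above.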

Steps to check the VCDB information for a host:

  1. Open an SSH session to the vCenter Server.
  2. Connect to the VCDB (vCenter Server database) using the following command, then query the vpx_host table:

    /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres

    VCDB=# select id, dns_name, ip_address from vpx_host;

      id  |  dns_name  | ip_address
    ------+------------+------------
     #### | host_name1 | #.#.#.1
     #### | host_name2 | #.#.#.2
     #### | host_name3 | #.#.#.5     <-- incorrect entry in VCDB for host "host_name3"; it should be #.#.#.3
    (3 rows)
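
The same lookup can be run without an interactive session. The following is a minimal sketch, assuming psql is invoked with -A -t (unaligned, tuples-only output, fields separated by "|"). The helper stale_rows is hypothetical; it only flags rows where the stored address differs from the one you expect, and the host name and IP in the example are placeholders.

```shell
#!/bin/sh
# stale_rows HOST EXPECTED_IP: read "id|dns_name|ip_address" rows on stdin and
# print every row for HOST whose stored ip_address is not EXPECTED_IP.
stale_rows() {
    awk -F'|' -v host="$1" -v want="$2" '$2 == host && $3 != want'
}

# Example (run on vCenter Server):
#   /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres -A -t \
#       -c "select id, dns_name, ip_address from vpx_host" |
#       stale_rows host_name3 10.0.0.3
```

No output means the stored address already matches; a printed row gives the id needed to correct the entry.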

Resolution

Before making any changes, take a snapshot of the vCenter Server Appliance (VCSA). If multiple vCenter Servers are in Enhanced Linked Mode, power off all linked vCenter Servers at the same time and take a powered-off snapshot of every linked vCenter Server in the SSO domain.

  1. Stop the vCenter vpxd service:

    service-control --stop vmware-vpxd

  2. Connect to the VCDB PostgreSQL database to correct the IP address:

    /opt/vmware/vpostgres/current/bin/psql -d VCDB -U postgres

  3. Identify the host ID and the current stale entry:

    VCDB=# select id, dns_name, ip_address from vpx_host WHERE dns_name = 'host_name3';

  4. Update the record with the correct IP address. Replace #.#.#.3 with the host's actual management IP address and ## with the ID found in the previous step.

    UPDATE vpx_host SET ip_address = '#.#.#.3' WHERE id = ##;

  5. Verify the change with the following command:

    select id, dns_name, ip_address from vpx_host WHERE id = ##;

  6. Type \q to exit the database.
  7. Start the vpxd service with the following command:

    service-control --start vmware-vpxd
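
Because a mistyped ID or address in step 4 would write another wrong value into VCDB, it can help to validate the inputs before running the UPDATE. The following is a minimal sketch; build_update is a hypothetical helper that only prints the statement and never touches the database.

```shell
#!/bin/sh
# build_update ID IP: print the UPDATE statement for vpx_host, or fail
# (return 1, message on stderr) if ID is not all digits or IP is not a
# dotted-quad IPv4 address with each octet in the range 0-255.
build_update() {
    id=$1 ip=$2
    case $id in
        ''|*[!0-9]*) echo "invalid host id: $id" >&2; return 1 ;;
    esac
    echo "$ip" | awk -F'[.]' '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }
    ' || { echo "invalid IPv4 address: $ip" >&2; return 1; }
    printf "UPDATE vpx_host SET ip_address = '%s' WHERE id = %s;\n" "$ip" "$id"
}

# Example (placeholder values):
#   build_update 42 10.0.0.3
```

The printed statement can then be entered at the VCDB=# prompt in step 4 while the vpxd service is stopped.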