ESXi Hosts disconnecting/reconnecting to vCenter through a physical firewall at specific intervals

Article ID: 320776

Updated On:

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

Symptoms:

  • A firewall sits between vCenter and the ESXi hosts (the firewall can be a physical device or a virtual machine firewall).
  • Hosts repeatedly disconnect and reconnect in the vCenter inventory at consistent intervals.
  • Communication is intermittent, and GUI selections respond with a delay.
  • vCenter/ESXi tasks such as vMotion and configuration changes are disrupted.
  • vCenter is pingable.
  • The ESXi host is pingable.
  • Port checks for 902 and 443 succeed.


Log messages similar to the following appear:

vpxd.log


yyyy-mm-ddT warning vpxd[54138] [Originator@6876 sub=MoHost opID=HostSync-host-7373-4d9e1c45] host [vim.HostSystem:host-7373,HOST] connection state changed to NO_RESPONSE

yyyy-mm-ddT warning vpxd[54366] [Originator@6876 sub=MoHost opID=HB-host-7196@173574-726e9a66] host [vim.HostSystem:host-7196,HOST] connection state changed to NO_RESPONSE

yyyy-mm-ddT warning vpxd[53795] [Originator@6876 sub=MoHost opID=HB-host-1537@111892-622e6740] host [vim.HostSystem:host-1537,HOST] connection state changed to NO_RESPONSE

yyyy-mm-ddT warning vpxd[50210] [Originator@6876 sub=MoHost opID=HostSync-host-1426-5325211d] host [vim.HostSystem:host-1426,HOST] connection state changed to NO_RESPONSE

yyyy-mm-ddT warning vpxd[53053] [Originator@6876 sub=MoHost opID=HB-host-7100@202403-7981a21a] host [vim.HostSystem:host-7100,HOST] connection state changed to NO_RESPONSE

yyyy-mm-ddT info vpxd[53053] [Originator@6876 sub=MoHost opID=HB-host-7100@202403-1f0bf9d8] host [vim.HostSystem:host-7100,HOST] connection state changed to CONNECTED

yyyy-mm-ddT warning vpxd[21206] [Originator@6876 sub=InvtHostCnx opID=HB-host-7100@204027-3f60ef10] Exception occurred during host sync; Host communication failed; [vim.HostSystem:host-7100,HOST], e: N5Vmomi5Fault17HostCommunication9ExceptionE(Fault cause: vmodl.fault.HostCommunication

yyyy-mm-ddT warning vpxd[21206] [Originator@6876 sub=MoHost opID=HB-host-7100@204027-3f60ef10] host [vim.HostSystem:host-7100,HOST] connection state changed to NO_RESPONSE
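To see which hosts are flapping, the state-change lines above can be tallied per host. The following is a minimal sketch that assumes the log format shown above; the function name `count_state_changes` is illustrative and not part of any VMware tooling.

```python
import re
from collections import Counter

# Captures the host ID and the new state from MoHost state-change lines.
STATE_RE = re.compile(
    r"host \[vim\.HostSystem:(host-\d+),[^\]]*\] "
    r"connection state changed to (\w+)"
)

def count_state_changes(lines):
    """Return a Counter keyed by (host_id, new_state)."""
    counts = Counter()
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# Sample lines copied from the log excerpt above.
sample = [
    "yyyy-mm-ddT warning vpxd[53053] [Originator@6876 sub=MoHost opID=HB-host-7100@202403-7981a21a] host [vim.HostSystem:host-7100,HOST] connection state changed to NO_RESPONSE",
    "yyyy-mm-ddT info vpxd[53053] [Originator@6876 sub=MoHost opID=HB-host-7100@202403-1f0bf9d8] host [vim.HostSystem:host-7100,HOST] connection state changed to CONNECTED",
    "yyyy-mm-ddT warning vpxd[21206] [Originator@6876 sub=MoHost opID=HB-host-7100@204027-3f60ef10] host [vim.HostSystem:host-7100,HOST] connection state changed to NO_RESPONSE",
]
counts = count_state_changes(sample)
```

A host that shows many NO_RESPONSE and CONNECTED transitions in a short log window is a likely candidate for the behavior described in this article.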



Host sync takes about 15 minutes, or another consistent duration:


yyyy-mm-ddT warning vpxd[06410] [Originator@6876 sub=VpxProfiler opID=HB-host-10212@115485-6719331f] DoHostSync:host-10212 [GetChangesTime] took 942862 ms
yyyy-mm-ddT warning vpxd[06410] [Originator@6876 sub=VpxProfiler opID=HB-host-10212@115485-6719331f] DoHostSync:host-10212 [DoHostSyncTime] took 942862 ms

vpxd.log-yyyy-mm-ddT warning vpxd[54360] [Originator@6876 sub=VpxProfiler opID=HB-host-1609@216942-6622c233] DoHostSync:host-1609 [DoHostSyncTime] took 120001 ms
vpxd.log-yyyy-mm-ddT warning vpxd[06881] [Originator@6876 sub=VpxProfiler opID=HB-host-730@96905-2354c435] DoHostSync:host-730 [DoHostSyncTime] took 120002 ms
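The durations in these lines are in milliseconds, so converting them to minutes makes it easier to compare them with a firewall idle timeout. The sketch below assumes the VpxProfiler line format shown above; `sync_durations` is a hypothetical helper name.

```python
import re

# Captures the host ID and the duration from DoHostSyncTime profiler lines.
SYNC_RE = re.compile(r"DoHostSync:(host-\d+) \[DoHostSyncTime\] took (\d+) ms")

def sync_durations(lines):
    """Return a list of (host_id, minutes) for each DoHostSyncTime entry."""
    results = []
    for line in lines:
        match = SYNC_RE.search(line)
        if match:
            results.append((match.group(1), int(match.group(2)) / 60000))
    return results

# Sample lines copied from the log excerpt above.
sample = [
    "yyyy-mm-ddT warning vpxd[06410] [Originator@6876 sub=VpxProfiler opID=HB-host-10212@115485-6719331f] DoHostSync:host-10212 [DoHostSyncTime] took 942862 ms",
    "vpxd.log-yyyy-mm-ddT warning vpxd[54360] [Originator@6876 sub=VpxProfiler opID=HB-host-1609@216942-6622c233] DoHostSync:host-1609 [DoHostSyncTime] took 120001 ms",
]
durations = sync_durations(sample)
for host, minutes in durations:
    print(f"{host}: {minutes:.1f} min")
```

Durations that cluster around the same value across many hosts suggest a timeout somewhere in the network path rather than a problem on an individual host.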

 

 

Environment

VMware vCenter Server 7.0.x
VMware vCenter Server 8.0.x

Cause

Once vCenter service issues and ESXi issues have been ruled out, the likely cause is a firewall that drops the idle TCP connection between vCenter and the host after a specific timeout.

TCP keep-alive was introduced to work around this behavior by keeping the connection established.

Before 6.7, the keep-alive timeout was set to 30 minutes.
After 6.7, the keep-alive timeout was set to 15 minutes.

If a firewall times out the connection in less than 15 minutes, vCenter cannot communicate with the host until the next TCP keep-alive is sent and the connection is re-established.
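The relationship between the two timers can be illustrated with a little arithmetic. This is a simplified model, not a description of vCenter internals: it assumes the connection is completely idle, that the firewall drops it silently, and that nothing notices until the first keep-alive is due. The function name is hypothetical.

```python
def unreachable_window_s(firewall_idle_timeout_s, keepalive_idle_s=15 * 60):
    """Rough upper bound, in seconds, on how long an idle connection stays dead.

    Assumption: the firewall drops the connection after
    firewall_idle_timeout_s of inactivity, and the drop is only noticed
    when the first keep-alive is sent at keepalive_idle_s.
    """
    return max(0, keepalive_idle_s - firewall_idle_timeout_s)

# A firewall with a 5-minute idle timeout against the 15-minute keep-alive.
window = unreachable_window_s(5 * 60)

# A firewall with a 1-hour idle timeout: the keep-alive always arrives first.
safe = unreachable_window_s(60 * 60)

print(window, safe)
```

Under this model, any firewall idle timeout shorter than the keep-alive interval leaves a window in which the host appears unresponsive.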

Resolution

This issue must be remediated on the firewall. Contact the firewall vendor to increase the timeout to the standard value of 1 hour.

Workaround:
You can make the keep-alive more aggressive by lowering its values in the vCenter (VC) or ESXi settings. Only one side needs to be changed.

To apply aggressive keep-alives to VC-ESX connections on the VC side, add the following block to "/etc/vmware-vpx/vpxd.cfg", under the <vmacore> section:

 <tcpKeepAlive>
    <serverSocket>
       <isEnabled>true</isEnabled>
       <idleTimeSec>45</idleTimeSec>
       <probeTimeSec>10</probeTimeSec>
       <probeCount>3</probeCount>
    </serverSocket>
    <clientSocket>
       <isEnabled>true</isEnabled>
       <idleTimeSec>45</idleTimeSec>
       <probeTimeSec>10</probeTimeSec>
       <probeCount>3</probeCount>
    </clientSocket>
 </tcpKeepAlive>

Alternatively, on the ESX side, edit "/etc/vmware/rhttpproxy/config.xml" (which already has a <tcpKeepAlive> block) so that it contains the values above.
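Before restarting any service, it can help to confirm that the <tcpKeepAlive> block is well-formed and to estimate how quickly a dead connection would be detected. The sketch below parses a copy of the block shown earlier. It assumes that idleTimeSec, probeTimeSec, and probeCount behave like the usual TCP keep-alive idle time, probe interval, and probe count; that mapping is an assumption, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the <tcpKeepAlive> block from this article.
KEEPALIVE_XML = """
<tcpKeepAlive>
   <serverSocket>
      <isEnabled>true</isEnabled>
      <idleTimeSec>45</idleTimeSec>
      <probeTimeSec>10</probeTimeSec>
      <probeCount>3</probeCount>
   </serverSocket>
   <clientSocket>
      <isEnabled>true</isEnabled>
      <idleTimeSec>45</idleTimeSec>
      <probeTimeSec>10</probeTimeSec>
      <probeCount>3</probeCount>
   </clientSocket>
</tcpKeepAlive>
"""

def read_keepalive(xml_text):
    """Parse the block into {socket_name: {setting: value}}."""
    root = ET.fromstring(xml_text)  # raises ParseError if malformed
    settings = {}
    for side in root:
        settings[side.tag] = {
            "isEnabled": side.findtext("isEnabled") == "true",
            "idleTimeSec": int(side.findtext("idleTimeSec")),
            "probeTimeSec": int(side.findtext("probeTimeSec")),
            "probeCount": int(side.findtext("probeCount")),
        }
    return settings

def dead_peer_detection_s(side):
    """Idle time plus all unanswered probes (assumed keep-alive semantics)."""
    return side["idleTimeSec"] + side["probeTimeSec"] * side["probeCount"]

settings = read_keepalive(KEEPALIVE_XML)
for name, side in settings.items():
    print(name, side["idleTimeSec"], dead_peer_detection_s(side))
```

With these values the first probe is sent after 45 seconds of inactivity, which is well below typical firewall idle timeouts.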

Restart the vpxd service (if the change was made on vCenter) or the vpxa service (if the change was made on ESXi) after the change.

Refer to the article Restarting the Management agents in ESXi to restart the vpxa service.

# /etc/init.d/vpxa restart


Refer to the article Stopping, Starting or Restarting VMware vCenter Server Appliance 6.x & above services to restart the vCenter vpxd service.

# service-control --stop vpxd && service-control --start vpxd



Additional Information

Impact/Risks:
Communication issues between vCenter and ESXi host