ESXi Hosts disconnecting/reconnecting to vCenter through a physical firewall at specific intervals

Article ID: 320776

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

  • A firewall sits between vCenter and the ESXi hosts (the firewall can be a physical device or a virtual machine firewall).
  • Communication is intermittent, and GUI selections are delayed.
  • vCenter/ESXi tasks such as vMotion and configuration changes are disrupted.
  • vCenter is pingable.
  • The ESXi host is pingable.
  • Port checks for 902 and 443 succeed.
  • /var/log/vmware/vpxd/vpxd.log on the vCenter shows multiple hosts repeatedly going into a not-responding state for ~1 second or less:

YYYY-MM-DDTHH:MM:27.407Z warning vpxd[#####] [Originator@6876 sub=MoHost opID=HB-host-#####@######-########] host [vim.HostSystem:host-#####,<ESXI_FQDN>] connection state changed to NO_RESPONSE
YYYY-MM-DDTHH:MM:27.661Z info vpxd[#####] [Originator@6876 sub=MoHost opID=HB-host-#####@######-########] host [vim.HostSystem:host-#####,<ESXI_FQDN>] connection state changed to CONNECTED
YYYY-MM-DDTHH:MM:08.012Z warning vpxd[#####] [Originator@6876 sub=MoHost opID=HB-host-#####@######-########] host [vim.HostSystem:host-#####,<ESXI_FQDN>] connection state changed to NO_RESPONSE
YYYY-MM-DDTHH:MM:08.128Z info vpxd[#####] [Originator@6876 sub=MoHost opID=HB-host-#####@######-########] host [vim.HostSystem:host-#####,<ESXI_FQDN>] connection state changed to CONNECTED
YYYY-MM-DDTHH:MM:48.811Z warning vpxd[########] [Originator@6876 sub=MoHost opID=HB-host-#####@######-########] host [vim.HostSystem:host-#####,<ESXI_FQDN>] connection state changed to NO_RESPONSE
YYYY-MM-DDTHH:MM:49.071Z info vpxd[########] [Originator@6876 sub=MoHost opID=HB-host-#####@######-########] host [vim.HostSystem:host-#####,<ESXI_FQDN>] connection state changed to CONNECTED

  • /var/log/vmware/envoy-hgw/envoy-access-##.log on the vCenter has many "upstream_reset_before_response_started{connection_termination}" messages at the same timestamps at which that ESXi host is marked as not responding.
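
To gauge how often each host flaps, the NO_RESPONSE entries in vpxd.log can be counted per host. The following is a minimal sketch, not a supported tool: the function name count_no_response is hypothetical, and it assumes the log lines follow the format shown in the excerpt above.

```shell
# Count "NO_RESPONSE" transitions per host from vpxd.log-style lines on stdin.
# Prints one line per host: "<host managed object ID> <count>".
# count_no_response is an illustrative name, not a VMware utility.
count_no_response() {
    grep 'connection state changed to NO_RESPONSE' |
        # Keep only the managed object ID, e.g. vim.HostSystem:host-12
        sed -n 's/.*host \[\(vim\.HostSystem:[^,]*\),.*/\1/p' |
        sort | uniq -c |
        # uniq -c prints "<count> <id>"; swap the columns
        awk '{print $2, $1}'
}
```

For example, count_no_response < /var/log/vmware/vpxd/vpxd.log lists the affected hosts. The matching resets in the envoy access log can be counted with grep -F -c 'upstream_reset_before_response_started{connection_termination}' against the relevant envoy-access-##.log file.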

Environment

vSphere 7.0.x
vSphere 8.0.x

Cause

The firewall between ESXi and vCenter has its idle timeout (Time To Live, TTL) for connections on port 443 set too low and is dropping connections that it considers idle.

TCP keep-alive was introduced to keep these connections established.

In versions before 6.7, the keep-alive timeout was 30 minutes.
In versions after 6.7, the keep-alive timeout is 15 minutes.

If the firewall times out the connection in less than 15 minutes, vCenter is unable to communicate with the host until the TCP keep-alive is sent and the connection is re-established.

Even with TCP keep-alive in place, a firewall TTL that is set too low overrides the keep-alive configuration.
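
The relationship between the two timers comes down to a simple comparison. The sketch below is illustrative only: the function name and the example firewall value of 300 seconds are assumptions, while 900 seconds corresponds to the 15-minute keep-alive timeout described above.

```shell
# Report whether a firewall idle timeout can expire before the first
# keep-alive is sent. Both arguments are in seconds.
# firewall_vs_keepalive is a hypothetical helper name.
firewall_vs_keepalive() {
    fw_idle_timeout=$1
    keepalive_idle=$2
    if [ "$fw_idle_timeout" -le "$keepalive_idle" ]; then
        # The firewall forgets the session before any keep-alive refreshes it
        echo "at-risk"
    else
        echo "ok"
    fi
}
```

For example, firewall_vs_keepalive 300 900 prints at-risk, whereas a firewall timeout of one hour (3600 seconds) against the same keep-alive value prints ok.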

Resolution

This issue must be remediated on the firewall. Contact the firewall vendor to increase the idle timeout. The standard value is 1 hour, and the recommended minimum is 30 minutes.

 
Workaround:
You can change the vCenter or ESXi settings to lower the keep-alive idle time so that keep-alives are sent more aggressively. Only one side needs to be changed.

To apply aggressive keep-alives to vCenter-to-ESXi connections on the vCenter side, add the following block to "/etc/vmware-vpx/vpxd.cfg", under the <vmacore> section:

 <tcpKeepAlive>
    <serverSocket>
       <isEnabled>true</isEnabled>
       <idleTimeSec>45</idleTimeSec>
       <probeTimeSec>10</probeTimeSec>
       <probeCount>3</probeCount>
    </serverSocket>
    <clientSocket>
       <isEnabled>true</isEnabled>
       <idleTimeSec>45</idleTimeSec>
       <probeTimeSec>10</probeTimeSec>
       <probeCount>3</probeCount>
    </clientSocket>
 </tcpKeepAlive>

Alternatively, on the ESXi side, edit "/etc/vmware/rhttpproxy/config.xml" (which already has a <tcpKeepAlive> block) so that it contains the above values.
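
As a rough guide to what these values mean in practice, the sketch below computes how long an idle, unreachable peer would take to be detected. It assumes that idleTimeSec, probeTimeSec, and probeCount follow conventional TCP keep-alive semantics (idle time before the first probe, interval between probes, number of unanswered probes); that mapping is an assumption, and the function name is hypothetical.

```shell
# Estimate the time (in seconds) after the last traffic at which a dead
# peer is detected: idle time plus one probe interval per unanswered probe.
# keepalive_detection_secs is an illustrative helper, not a VMware command.
keepalive_detection_secs() {
    idle=$1
    probe_interval=$2
    probe_count=$3
    echo $(( idle + probe_interval * probe_count ))
}
```

With the values above, keepalive_detection_secs 45 10 3 prints 75. Under the same assumption, an idle connection carries a probe after 45 seconds, so a firewall idle timeout longer than that should no longer expire the session.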

Restart the vpxd service (vCenter) or the vpxa service (ESXi) after the change.

Refer to the article Restarting the Management agents in ESXi to restart the vpxa service.

# /etc/init.d/vpxa restart


Refer to the article Stopping, Starting or Restarting VMware vCenter Server Appliance 6.x & above services to restart the vCenter vpxd service.

# service-control --stop vpxd && service-control --start vpxd
