ESXi hosts disconnect intermittently from vCenter and reconnect after few seconds
search cancel

ESXi hosts disconnect intermittently from vCenter and reconnect after few seconds

book

Article ID: 317813

calendar_today

Updated On:

Products

VMware vCenter Server VMware vSphere ESXi

Issue/Introduction

To stop the ESXi host disconnect.

Symptoms:

  • ESXi hosts disconnect intermittently from vCenter and reconnect  after few seconds
  • We see from vpxd.log that hosts are being marked as having no response

2020-11-13T17:13:50.024Z warning vpxd[7F4E72BDE700] [Originator@6876 sub=MoHost opID=khgf51ol-j42-h5:70003542-b5-03-177f343e] [HostMo] host connection state changed to [NO_RESPONSE] for host-118631
  • We see from vpxd.log that Synchronizing tasks are not getting completed causing host to disconnect.
2018-08-28T17:20:09.901Z verbose vpxd[7F79218B1700] [Originator@6876 sub=vpxLro opID=HB-host-486@8725-487c45] [VpxLRO] -- BEGIN lro-4750 -- -- VpxdInvtHostSyncHostLRO.Synchronize --
2018-08-28T17:20:09.901Z verbose vpxd[7F79218B1700] [Originator@6876 sub=InvtHostCnx opID=HB-host-486@8725-487c45] Synchronizing host; [vim.HostSystem:host-486,10.20.x.x]
2018-08-28T17:20:09.909Z trivia vpxd[7F7922A54700] [Originator@6876 sub=HttpConnectionPool-000025] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10.20..x.x:443>
2018-08-28T17:20:09.909Z trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-486@8725-487c45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10.20.x.x:443>
2018-08-28T17:20:09.909Z verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-486@8725-487c45] [ClientAdapterBase::InvokeOnSoap] Invoke done (10.20.x.x, vpxapi.VpxaService.retrieveChanges)
2018-08-28T17:20:09.917Z trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-486@8725-487c45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10..x.x:443>
2018-08-28T17:20:09.917Z verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-486@8725-487c45] [ClientAdapterBase::InvokeOnSoap] Invoke done (10.20.x.x, vmodl.query.PropertyCollector.createFilter)
2018-08-28T17:20:09.917Z trivia vpxd[7F79218B1700] [Originator@6876 sub=VpxProfiler opID=HB-host-486@8725-487c45] Ctr: TrackInBackgroundTime = 0 ms
2018-08-28T17:20:09.994Z trivia vpxd[7F7922C58700] [Originator@6876 sub=SoapAdapter.HTTPService] HTTP Response: Setting Content-Length: -1
 
  • hostd.log around the same time for same host.
2018-08-28T17:19:36.629Z verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.reason, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
2018-08-28T17:19:36.629Z verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] [CommitChangesAndNotify] Updating cached values
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: info.startTime, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] [CommitChangesAndNotify] Updating cached values
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.state, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.cancelable, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] [CommitChangesAndNotify] Updating cached values
2018-08-28T17:19:36.629Z info vpxa[4F967B70] [Originator@6876 sub=vpxLro opID=HB-host-486@8725-487c45-14] [VpxLRO] -- BEGIN session[########-####-####-####-########7e57]########-####-####-####-########f8c1 -- vpxa -- vpxapi.VpxaService.retrieveChanges -- ########-####-####-####-########7e57
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=VpxProfiler opID=HB-host-486@8725-487c45-14] [1+] VpxaMoService::GetChanges
2018-08-28T17:19:36.631Z verbose vpxa[4F967B70] [Originator@6876 sub=VpxaHalCnxHostagent opID=HB-host-486@8725-487c45-14] [VpxaHalCnxHostagent::DoCheckForUpdates] CheckForUpdates() done; version sent: 315435, version received: (no update)
  • VPXD crashing, for all the host whenever there is a disconnect.
  • In the vpxd.log we see similar to below
2018-08-28T16:35:49.371Z error vpxd[7F7968478700] [Originator@6876 sub=Vmomi opID=HB-host-486@8715-6a3c7a68-TaskLoop-28887855] Got vmacore exception: N7Vmacore15SystemExceptionE(Connection timed out)
--> [context]zKq7AVECAAAAAIStcgAMdnB4ZAAAeF4rbGlidm1hY29yZS5zbwAAEBcbAMppGADIHysAcRUjAOv5JQAe4SMAR6wjAC6vIwA3tysBVHQAbGlicHRocmVhZC5zby4wAAIdjA5saWJjLnNvLjYA[/context]
2018-08-28T16:35:49.374Z verbose vpxd[7F7968478700] [Originator@6876 sub=TaskInfo opID=HB-host-486@8715-6a3c7a68-TaskLoop-28887855] [VpxdTaskInfoPublisher] host:[host-486] WaitForUpdates failed: vmodl.fault.HostCommunication
--> [backtrace begin] product: VMware VirtualCenter, version: 6.5.0, build: build-7515524, tag: vpxd, cpu: x86_64, os: linux, buildType: release
--> backtrace[00] libvmacore.so[0x002B5E78]: Vmacore::System::Stacktrace::CaptureWork(unsigned int)
--> backtrace[01] libvmacore.so[0x001B1710]: Vmacore::System::SystemFactoryImpl::CreateQuickBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
--> backtrace[02] libvmacore.so[0x001869CA]: Vmacore::Throwable::Throwable(std::string const&)
--> backtrace[03] vpxd[0x0054D40E]
--> backtrace[04] vpxd[0x00622866]
--> backtrace[05] vpxd[0x00628FBC]
--> backtrace[06] vpxd[0x00622F65]
--> backtrace[07] vpxd[0x00626877]
--> backtrace[08] libvmomi.so[0x0012623C]: Vmomi::StubImpl::_Invoke_Task(Vmomi::ManagedMethod*, std::vector<Vmacore::Ref<Vmomi::Any>, std::allocator<Vmacore::Ref<Vmomi::Any> > >&, Vmacore::Ref<Vmomi::Any>&)
--> backtrace[09] libvmomi.so[0x00199CBC]: Vmomi::Core::PropertyCollectorStub::WaitForUpdates(Vmacore::Optional<std::string> const&, Vmacore::Ref<Vmomi::Core::PropertyCollector::UpdateSet>&)
--> backtrace[10] vpxd[0x0093BADC]
--> backtrace[11] vpxd[0x0093CD0C]
--> backtrace[12] vpxd[0x00542747]
--> backtrace[13] vpxd[0x005427E0]
--> backtrace[14] libvmacore.so[0x0023AC47]
--> backtrace[15] libvmacore.so[0x0023AF2E]
--> backtrace[16] libvmacore.so[0x002BB737]
--> backtrace[17] libpthread.so.0[0x00007454]
--> backtrace[18] libc.so.6[0x000E8C1D]
--> [backtrace end]
  • rhttp packet capture analysis shows packet drops. 
  • The captured packets at the VC side  will show similar to below
323 2019-01-15 14:45:05.001649 192.168.xxx.xx 192.168.xxx.xx TCP 337 54552 → 443 [PSH, ACK] Seq=1 Ack=1 Win=850 Len=281
    341 2019-01-15 14:45:05.005090 192.168.xxx.xx 192.168.xxx.xx TCP 6152 54552 → 443 [PSH, ACK] Seq=282 Ack=1 Win=850 Len=6096
    352 2019-01-15 14:45:05.005374 192.168.xxx.xx 192.168.xxx.xx TCP 62 443 → 54552 [ACK] Seq=1 Ack=6378 Win=116 Len=0
    353 2019-01-15 14:45:05.005386 192.168.xxx.xx 192.168.xxx.xx TCP 62 [TCP Window Update] 443 → 54552 [ACK] Seq=1 Ack=6378 Win=128 Len=0
    411 2019-01-15 14:45:05.051974 192.168.xxx.xx 192.168.xxx.xx TCP 19189 443 → 54552 [PSH, ACK] Seq=1 Ack=6378 Win=128 Len=19133
    412 2019-01-15 14:45:05.051992 192.168.xxx.xx 192.168.xxx.xx TCP 56 54552 → 443 [ACK] Seq=6378 Ack=19134 Win=827 Len=0
  24683 2019-01-15 14:56:14.371966 192.168.xxx.xx 192.168.xxx.xx TCP 336 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=280
  24684 2019-01-15 14:56:14.372086 192.168.xxx.xx 192.168.xxx.xx TCP 798 54552 → 443 [PSH, ACK] Seq=6658 Ack=19134 Win=850 Len=742
  24700 2019-01-15 14:56:14.574974 192.168.xxx.xx 192.168.xxx.xx TCP 1078 [TCP Retransmission] 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=1022
 
  • And on the ESXi we will see packet capture analysis as below

566 2019-01-15 15:05:07.444059 192.168.xxx.xx 192.168.xxx.xx TCP 54 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0
567 2019-01-15 15:05:07.693861 192.168.xxx.xx 192.168.xxx.xx TCP 54 [TCP Retransmission] 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0
568 2019-01-15 15:05:07.983860 192.168.xxx.xx 192.168.xxx.xx TCP 54
614 2019-01-15 15:06:52.533881 192.168.xxx.xx 192.168.xxx.xx TCP 54 443 → 54552 [RST, ACK] Seq=2 Ack=1 Win=128 Len=0

 

Note:The preceding log excerpts are only example.Date,time and environmental variables may vary depending on your environment



Environment

VMware vSphere ESXi 6.7
VMware vCenter Server 6.7.x
VMware vCenter Server Appliance 6.7.x
VMware vCenter Server 6.5.x
VMware vCenter Server Appliance 6.5.x
VMware vSphere ESXi 6.5
VMware vCenter Server 7.0.x

Cause

It issue caused by packet drop between the vCenter and hosts.The re-transmissions on both sides are clear indication that the connection timed out because of lost network connectivity - both sides were sending packets that the other side didn't receive.This fix enables sending TCP keep-alive packets when the connection is idle.

This can also happen when the vCenter is under very high load and constrained on resources

Resolution

This issue is resolved in:

Workaround:
To work around this issue, please add the lines on the below files:
  • In the config.xml on ESXi host located at /etc/vmware/rhttpproxy/
NOTE: Take a backup of this file before modifying.


    <config>
   <vmacore>
//some lines of other options ......
      <tcpKeepAlive>
         <serverSocket>
            <isEnabled>true</isEnabled>
            <idleTimeSec>180</idleTimeSec>
            <probeTimeSec>180</probeTimeSec>
            <probeCount>8</probeCount>
         </serverSocket>
         <clientSocket>
            <isEnabled>true</isEnabled>
            <idleTimeSec>180</idleTimeSec>
            <probeTimeSec>180</probeTimeSec>
            <probeCount>8</probeCount>
         </clientSocket>
      </tcpKeepAlive>
//some lines of other options......
   </vmacore>
</config>


Additional Information