ESXi hosts disconnect intermittently from vCenter and reconnect after few seconds
search cancel

ESXi hosts disconnect intermittently from vCenter and reconnect after few seconds

book

Article ID: 317813

calendar_today

Updated On:

Products

VMware vCenter Server VMware vSphere ESXi

Issue/Introduction

To stop the ESXi host disconnect.

Symptoms:

  • ESXi hosts disconnect intermittently from vCenter and reconnect  after few seconds
  • We see from vpxd.log that hosts are being marked as having no response

YYYY-MM-DD warning vpxd[7F4E72BDE700] [Originator@6876 sub=MoHost opID=khgf51ol-j42-h5:70003542-b5-03-177f343e] [HostMo] host connection state changed to [NO_RESPONSE] for host-118631
  • We see from vpxd.log that Synchronizing tasks are not getting completed causing host to disconnect.
YYYY-MM-DD verbose vpxd[7F79218B1700] [Originator@6876 sub=vpxLro opID=HB-host-486@8725-487c45] [VpxLRO] -- BEGIN lro-4750 -- -- VpxdInvtHostSyncHostLRO.Synchronize --
YYYY-MM-DD verbose vpxd[7F79218B1700] [Originator@6876 sub=InvtHostCnx opID=HB-host-486@8725-487c45] Synchronizing host; [vim.HostSystem:host-486,10.20.x.x]
YYYY-MM-DD trivia vpxd[7F7922A54700] [Originator@6876 sub=HttpConnectionPool-000025] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10.20..x.x:443>
YYYY-MM-DD trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-486@8725-487c45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10.20.x.x:443>
YYYY-MM-DD verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-486@8725-487c45] [ClientAdapterBase::InvokeOnSoap] Invoke done (10.20.x.x, vpxapi.VpxaService.retrieveChanges)
YYYY-MM-DD trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-486@8725-487c45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10..x.x:443>
YYYY-MM-DD verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-486@8725-487c45] [ClientAdapterBase::InvokeOnSoap] Invoke done (10.20.x.x, vmodl.query.PropertyCollector.createFilter)
YYYY-MM-DD trivia vpxd[7F79218B1700] [Originator@6876 sub=VpxProfiler opID=HB-host-486@8725-487c45] Ctr: TrackInBackgroundTime = 0 ms
YYYY-MM-DD trivia vpxd[7F7922C58700] [Originator@6876 sub=SoapAdapter.HTTPService] HTTP Response: Setting Content-Length: -1
 
  • hostd.log around the same time for same host.
YYYY-MM-DD verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.reason, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
YYYY-MM-DD verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] [CommitChangesAndNotify] Updating cached values
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: info.startTime, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] [CommitChangesAndNotify] Updating cached values
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.state, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.cancelable, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] [CommitChangesAndNotify] Updating cached values
YYYY-MM-DD info vpxa[4F967B70] [Originator@6876 sub=vpxLro opID=HB-host-486@8725-487c45-14] [VpxLRO] -- BEGIN session[########-####-####-####-########7e57]########-####-####-####-########f8c1 -- vpxa -- vpxapi.VpxaService.retrieveChanges -- ########-####-####-####-########7e57
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=VpxProfiler opID=HB-host-486@8725-487c45-14] [1+] VpxaMoService::GetChanges
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=VpxaHalCnxHostagent opID=HB-host-486@8725-487c45-14] [VpxaHalCnxHostagent::DoCheckForUpdates] CheckForUpdates() done; version sent: 315435, version received: (no update)
  • VPXD crashing, for all the host whenever there is a disconnect.
  • In the vpxd.log we see similar to below
YYYY-MM-DD error vpxd[7F7968478700] [Originator@6876 sub=Vmomi opID=HB-host-486@8715-6a3c7a68-TaskLoop-28887855] Got vmacore exception: N7Vmacore15SystemExceptionE(Connection timed out)
--> [context]zKq7AVECAAAAAIStcgAMdnB4ZAAAeF4rbGlidm1hY29yZS5zbwAAEBcbAMppGADIHysAcRUjAOv5JQAe4SMAR6wjAC6vIwA3tysBVHQAbGlicHRocmVhZC5zby4wAAIdjA5saWJjLnNvLjYA[/context]
YYYY-MM-DD verbose vpxd[7F7968478700] [Originator@6876 sub=TaskInfo opID=HB-host-486@8715-6a3c7a68-TaskLoop-28887855] [VpxdTaskInfoPublisher] host:[host-486] WaitForUpdates failed: vmodl.fault.HostCommunication
--> [backtrace begin] product: VMware VirtualCenter, version: 6.5.0, build: build-7515524, tag: vpxd, cpu: x86_64, os: linux, buildType: release
--> backtrace[00] libvmacore.so[0x002B5E78]: Vmacore::System::Stacktrace::CaptureWork(unsigned int)
--> backtrace[01] libvmacore.so[0x001B1710]: Vmacore::System::SystemFactoryImpl::CreateQuickBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
--> backtrace[02] libvmacore.so[0x001869CA]: Vmacore::Throwable::Throwable(std::string const&)
--> backtrace[03] vpxd[0x0054D40E]
--> backtrace[04] vpxd[0x00622866]
--> backtrace[05] vpxd[0x00628FBC]
--> backtrace[06] vpxd[0x00622F65]
--> backtrace[07] vpxd[0x00626877]
--> backtrace[08] libvmomi.so[0x0012623C]: Vmomi::StubImpl::_Invoke_Task(Vmomi::ManagedMethod*, std::vector<Vmacore::Ref<Vmomi::Any>, std::allocator<Vmacore::Ref<Vmomi::Any> > >&, Vmacore::Ref<Vmomi::Any>&)
--> backtrace[09] libvmomi.so[0x00199CBC]: Vmomi::Core::PropertyCollectorStub::WaitForUpdates(Vmacore::Optional<std::string> const&, Vmacore::Ref<Vmomi::Core::PropertyCollector::UpdateSet>&)
--> backtrace[10] vpxd[0x0093BADC]
--> backtrace[11] vpxd[0x0093CD0C]
--> backtrace[12] vpxd[0x00542747]
--> backtrace[13] vpxd[0x005427E0]
--> backtrace[14] libvmacore.so[0x0023AC47]
--> backtrace[15] libvmacore.so[0x0023AF2E]
--> backtrace[16] libvmacore.so[0x002BB737]
--> backtrace[17] libpthread.so.0[0x00007454]
--> backtrace[18] libc.so.6[0x000E8C1D]
--> [backtrace end]
  • rhttp packet capture analysis shows packet drops. 
  • The captured packets at the VC side  will show similar to below
323 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 337 54552 → 443 [PSH, ACK] Seq=1 Ack=1 Win=850 Len=281
341 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 6152 54552 → 443 [PSH, ACK] Seq=282 Ack=1 Win=850 Len=6096
352 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 62 443 → 54552 [ACK] Seq=1 Ack=6378 Win=116 Len=0
353 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 62 [TCP Window Update] 443 → 54552 [ACK] Seq=1 Ack=6378 Win=128 Len=0
411 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 19189 443 → 54552 [PSH, ACK] Seq=1 Ack=6378 Win=128 Len=19133
412 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 56 54552 → 443 [ACK] Seq=6378 Ack=19134 Win=827 Len=0
24683 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 336 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=280
24684 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 798 54552 → 443 [PSH, ACK] Seq=6658 Ack=19134 Win=850 Len=742
24700 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 1078 [TCP Retransmission] 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=1022
 
  • And on the ESXi we will see packet capture analysis as below

566 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 54 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0
567 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 54 [TCP Retransmission] 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0
568 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 54
614 YYYY-MM-DD 192.168.xxx.xx 192.168.xxx.xx TCP 54 443 → 54552 [RST, ACK] Seq=2 Ack=1 Win=128 Len=0

 

Note:The preceding log excerpts are only example.Date,time and environmental variables may vary depending on your environment



Environment

VMware vSphere ESXi 6.5
VMware vCenter Server 6.5.x
VMware vCenter Server Appliance 6.5.x
VMware vSphere ESXi 6.7
VMware vCenter Server 6.7.x
VMware vCenter Server Appliance 6.7.x
VMware vCenter Server 7.0.x

Cause

It issue caused by packet drop between the vCenter and hosts. The re-transmissions on both sides are clear indication that the connection timed out because of lost network connectivity - both sides were sending packets that the other side didn't receive. This fix enables sending TCP keep-alive packets when the connection is idle.

This can also happen when the vCenter is under very high load and constrained on resources

Resolution

This issue is resolved in:
  • vCenter Server 6.5 U3f build 15259038.
  • ESXi 6.5  Patch Release ESXi650-201912002.
  • vCenter 6.7 U3m build 17713310.
  • ESXi 6.7 P05 ESXi670-20210300.


Workaround:
To work around this issue, please add the lines on the below files:
  • In the config.xml on ESXi host located at /etc/vmware/rhttpproxy/
NOTE: Take a backup of this file before modifying.


    <config>
   <vmacore>
//some lines of other options ......
      <tcpKeepAlive>
         <serverSocket>
            <isEnabled>true</isEnabled>
            <idleTimeSec>180</idleTimeSec>
            <probeTimeSec>180</probeTimeSec>
            <probeCount>8</probeCount>
         </serverSocket>
         <clientSocket>
            <isEnabled>true</isEnabled>
            <idleTimeSec>180</idleTimeSec>
            <probeTimeSec>180</probeTimeSec>
            <probeCount>8</probeCount>
         </clientSocket>
      </tcpKeepAlive>
//some lines of other options......
   </vmacore>
</config>


Additional Information