To stop the ESXi host disconnect.
Symptoms:
- ESXi hosts disconnect intermittently from vCenter and reconnect after few seconds
- We see from vpxd.log that hosts are being marked as having no response
2020-11-13T17:13:50.024Z warning vpxd[7F4E72BDE700] [Originator@6876 sub=MoHost opID=khgf51ol-j42-h5:70003542-b5-03-177f343e] [HostMo] host connection state changed to [NO_RESPONSE] for host-118631
- We see from vpxd.log that Synchronizing tasks are not getting completed causing host to disconnect.
2018-08-28T17:20:09.901Z verbose vpxd[7F79218B1700] [Originator@6876 sub=vpxLro opID=HB-host-486@8725-487c45] [VpxLRO] -- BEGIN lro-4750 -- -- VpxdInvtHostSyncHostLRO.Synchronize --
2018-08-28T17:20:09.901Z verbose vpxd[7F79218B1700] [Originator@6876 sub=InvtHostCnx opID=HB-host-486@8725-487c45] Synchronizing host; [vim.HostSystem:host-486,10.20.x.x]
2018-08-28T17:20:09.909Z trivia vpxd[7F7922A54700] [Originator@6876 sub=HttpConnectionPool-000025] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10.20..x.x:443>
2018-08-28T17:20:09.909Z trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-486@8725-487c45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10.20.x.x:443>
2018-08-28T17:20:09.909Z verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-486@8725-487c45] [ClientAdapterBase::InvokeOnSoap] Invoke done (10.20.x.x, vpxapi.VpxaService.retrieveChanges)
2018-08-28T17:20:09.917Z trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-486@8725-487c45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:10..x.x:443>
2018-08-28T17:20:09.917Z verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-486@8725-487c45] [ClientAdapterBase::InvokeOnSoap] Invoke done (10.20.x.x, vmodl.query.PropertyCollector.createFilter)
2018-08-28T17:20:09.917Z trivia vpxd[7F79218B1700] [Originator@6876 sub=VpxProfiler opID=HB-host-486@8725-487c45] Ctr: TrackInBackgroundTime = 0 ms
2018-08-28T17:20:09.994Z trivia vpxd[7F7922C58700] [Originator@6876 sub=SoapAdapter.HTTPService] HTTP Response: Setting Content-Length: -1
- hostd.log around the same time for same host.
2018-08-28T17:19:36.629Z verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.reason, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
2018-08-28T17:19:36.629Z verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] [CommitChangesAndNotify] Updating cached values
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: info.startTime, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] [CommitChangesAndNotify] Updating cached values
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.state, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] RecordOp ASSIGN: info.cancelable, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host-486@8725-487c45-14] [CommitChangesAndNotify] Updating cached values
2018-08-28T17:19:36.629Z info vpxa[4F967B70] [Originator@6876 sub=vpxLro opID=HB-host-486@8725-487c45-14] [VpxLRO] -- BEGIN session[########-####-####-####-########7e57]########-####-####-####-########f8c1 -- vpxa -- vpxapi.VpxaService.retrieveChanges -- ########-####-####-####-########7e57
2018-08-28T17:19:36.629Z verbose vpxa[4F967B70] [Originator@6876 sub=VpxProfiler opID=HB-host-486@8725-487c45-14] [1+] VpxaMoService::GetChanges
2018-08-28T17:19:36.631Z verbose vpxa[4F967B70] [Originator@6876 sub=VpxaHalCnxHostagent opID=HB-host-486@8725-487c45-14] [VpxaHalCnxHostagent::DoCheckForUpdates] CheckForUpdates() done; version sent: 315435, version received: (no update)
- VPXD crashing, for all the host whenever there is a disconnect.
- In the vpxd.log we see similar to below
2018-08-28T16:35:49.371Z error vpxd[7F7968478700] [Originator@6876 sub=Vmomi opID=HB-host-486@8715-6a3c7a68-TaskLoop-28887855] Got vmacore exception: N7Vmacore15SystemExceptionE(Connection timed out)
--> [context]zKq7AVECAAAAAIStcgAMdnB4ZAAAeF4rbGlidm1hY29yZS5zbwAAEBcbAMppGADIHysAcRUjAOv5JQAe4SMAR6wjAC6vIwA3tysBVHQAbGlicHRocmVhZC5zby4wAAIdjA5saWJjLnNvLjYA[/context]
2018-08-28T16:35:49.374Z verbose vpxd[7F7968478700] [Originator@6876 sub=TaskInfo opID=HB-host-486@8715-6a3c7a68-TaskLoop-28887855] [VpxdTaskInfoPublisher] host:[host-486] WaitForUpdates failed: vmodl.fault.HostCommunication
--> [backtrace begin] product: VMware VirtualCenter, version: 6.5.0, build: build-7515524, tag: vpxd, cpu: x86_64, os: linux, buildType: release
--> backtrace[00] libvmacore.so[0x002B5E78]: Vmacore::System::Stacktrace::CaptureWork(unsigned int)
--> backtrace[01] libvmacore.so[0x001B1710]: Vmacore::System::SystemFactoryImpl::CreateQuickBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
--> backtrace[02] libvmacore.so[0x001869CA]: Vmacore::Throwable::Throwable(std::string const&)
--> backtrace[03] vpxd[0x0054D40E]
--> backtrace[04] vpxd[0x00622866]
--> backtrace[05] vpxd[0x00628FBC]
--> backtrace[06] vpxd[0x00622F65]
--> backtrace[07] vpxd[0x00626877]
--> backtrace[08] libvmomi.so[0x0012623C]: Vmomi::StubImpl::_Invoke_Task(Vmomi::ManagedMethod*, std::vector<Vmacore::Ref<Vmomi::Any>, std::allocator<Vmacore::Ref<Vmomi::Any> > >&, Vmacore::Ref<Vmomi::Any>&)
--> backtrace[09] libvmomi.so[0x00199CBC]: Vmomi::Core::PropertyCollectorStub::WaitForUpdates(Vmacore::Optional<std::string> const&, Vmacore::Ref<Vmomi::Core::PropertyCollector::UpdateSet>&)
--> backtrace[10] vpxd[0x0093BADC]
--> backtrace[11] vpxd[0x0093CD0C]
--> backtrace[12] vpxd[0x00542747]
--> backtrace[13] vpxd[0x005427E0]
--> backtrace[14] libvmacore.so[0x0023AC47]
--> backtrace[15] libvmacore.so[0x0023AF2E]
--> backtrace[16] libvmacore.so[0x002BB737]
--> backtrace[17] libpthread.so.0[0x00007454]
--> backtrace[18] libc.so.6[0x000E8C1D]
--> [backtrace end]
- rhttp packet capture analysis shows packet drops.
- The captured packets at the VC side will show similar to below
323 2019-01-15 14:45:05.001649 192.168.xxx.xx 192.168.xxx.xx TCP 337 54552 → 443 [PSH, ACK] Seq=1 Ack=1 Win=850 Len=281
341 2019-01-15 14:45:05.005090 192.168.xxx.xx 192.168.xxx.xx TCP 6152 54552 → 443 [PSH, ACK] Seq=282 Ack=1 Win=850 Len=6096
352 2019-01-15 14:45:05.005374 192.168.xxx.xx 192.168.xxx.xx TCP 62 443 → 54552 [ACK] Seq=1 Ack=6378 Win=116 Len=0
353 2019-01-15 14:45:05.005386 192.168.xxx.xx 192.168.xxx.xx TCP 62 [TCP Window Update] 443 → 54552 [ACK] Seq=1 Ack=6378 Win=128 Len=0
411 2019-01-15 14:45:05.051974 192.168.xxx.xx 192.168.xxx.xx TCP 19189 443 → 54552 [PSH, ACK] Seq=1 Ack=6378 Win=128 Len=19133
412 2019-01-15 14:45:05.051992 192.168.xxx.xx 192.168.xxx.xx TCP 56 54552 → 443 [ACK] Seq=6378 Ack=19134 Win=827 Len=0
24683 2019-01-15 14:56:14.371966 192.168.xxx.xx 192.168.xxx.xx TCP 336 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=280
24684 2019-01-15 14:56:14.372086 192.168.xxx.xx 192.168.xxx.xx TCP 798 54552 → 443 [PSH, ACK] Seq=6658 Ack=19134 Win=850 Len=742
24700 2019-01-15 14:56:14.574974 192.168.xxx.xx 192.168.xxx.xx TCP 1078 [TCP Retransmission] 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=1022
- And on the ESXi we will see packet capture analysis as below
566 2019-01-15 15:05:07.444059 192.168.xxx.xx 192.168.xxx.xx TCP 54 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0
567 2019-01-15 15:05:07.693861 192.168.xxx.xx 192.168.xxx.xx TCP 54 [TCP Retransmission] 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0
568 2019-01-15 15:05:07.983860 192.168.xxx.xx 192.168.xxx.xx TCP 54
614 2019-01-15 15:06:52.533881 192.168.xxx.xx 192.168.xxx.xx TCP 54 443 → 54552 [RST, ACK] Seq=2 Ack=1 Win=128 Len=0
Note:The preceding log excerpts are only example.Date,time and environmental variables may vary depending on your environment