ESXi hosts disconnect intermittently from vCenter and reconnect after few seconds
search cancel

ESXi hosts disconnect intermittently from vCenter and reconnect after few seconds

book

Article ID: 317813

calendar_today

Updated On:

Products

VMware vCenter Server VMware vSphere ESXi

Issue/Introduction

  • ESXi hosts disconnect intermittently from vCenter and reconnect  after few seconds

  • From /var/log/vmware/vpxd/vpxd.log the hosts which are being marked as having no response
YYYY-MM-DD warning vpxd[7F4E72BDE700] [Originator@6876 sub=MoHost opID=k###51ol-j42-h5:70###2-b5-03-1###343e] [HostMo] host connection state changed to [NO_RESPONSE] for host-###
  • From /var/log/vmware/vpxd/vpxd.log  the Synchronizing tasks are not getting completed causing host to disconnect.
YYYY-MM-DD verbose vpxd[7F79218B1700] [Originator@6876 sub=vpxLro opID=HB-host-486@8##5-48##45] [VpxLRO] -- BEGIN lro-4750 -- -- VpxdInvtHostSyncHostLRO.Synchronize --
YYYY-MM-DD verbose vpxd[7F79218B1700] [Originator@6876 sub=InvtHostCnx opID=HB-host-86@8##5-48##45] Synchronizing host; [vim.HostSystem:host-###,##.##.##.##]
YYYY-MM-DD trivia vpxd[7F7922A54700] [Originator@6876 sub=HttpConnectionPool-000025] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:##.##..##.##:443>
YYYY-MM-DD trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-4##@8##5-48##45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:##.##.##.##:443>
YYYY-MM-DD verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-4##@8##5-4###45] [ClientAdapterBase::InvokeOnSoap] Invoke done (##.##.##.##, vpxapi.VpxaService.retrieveChanges)
YYYY-MM-DD trivia vpxd[7F79218B1700] [Originator@6876 sub=HttpConnectionPool-000000 opID=HB-host-4##@8725-4##c45] [PopPendingConnection] No pending connections to <cs p:00007f792812fe00, TCP:##.##.##.##:443>
YYYY-MM-DD verbose vpxd[7F79218B1700] [Originator@6876 sub=Vmomi opID=HB-host-486@8725-487c45] [ClientAdapterBase::InvokeOnSoap] Invoke done (##.##.##.##, vmodl.query.PropertyCollector.createFilter)
YYYY-MM-DD trivia vpxd[7F79218B1700] [Originator@6876 sub=VpxProfiler opID=HB-host-486@8725-487c45] Ctr: TrackInBackgroundTime = 0 ms
YYYY-MM-DD trivia vpxd[7F7922C58700] [Originator@6876 sub=SoapAdapter.HTTPService] HTTP Response: Setting Content-Length: -1
 
  • From /var/run/log/hostd.log around the same time for same host.
YYYY-MM-DD verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host--4##@8##5-4###45] RecordOp ASSIGN: info.reason, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
YYYY-MM-DD verbose vpxa[4F925B70] [Originator@6876 sub=PropertyProvider opID=HB-host--4##@8##5-4###45] [CommitChangesAndNotify] Updating cached values
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] RecordOp ASSIGN: info.startTime, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider] [CommitChangesAndNotify] Updating cached values
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host--4##@8##5-4###45] RecordOp ASSIGN: info.state, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host--4##@8##5-4###45] RecordOp ASSIGN: info.cancelable, session[########-####-####-####-########7e57]########-####-####-####-########f8c1. Applied change to temp map.
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=PropertyProvider opID=HB-host--4##@8##5-4###45] [CommitChangesAndNotify] Updating cached values
YYYY-MM-DD info    vpxa[4F967B70] [Originator@6876 sub=vpxLro opID=HB-host--4##@8##5-4###45] [VpxLRO] -- BEGIN session[########-####-####-####-########7e57]########-####-####-####-########f8c1 -- vpxa -- vpxapi.VpxaService.retrieveChanges -- ########-####-####-####-########7e57
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=VpxProfiler opID=HB-host--4##@8##5-4###45] [1+] VpxaMoService::GetChanges
YYYY-MM-DD verbose vpxa[4F967B70] [Originator@6876 sub=VpxaHalCnxHostagent opID=HB-host--4##@8##5-4###45] [VpxaHalCnxHostagent::DoCheckForUpdates] CheckForUpdates() done; version sent: 315435, version received: (no update)
  • VPXD crashing, for all the host whenever there is a disconnect.

  • In the /var/log/vmware/vpxd/vpxd.log we see similar to below
YYYY-MM-DD error vpxd[7F7968478700] [Originator@6876 sub=Vmomi opID=HB-host-4##@8##5-4###45-TaskLoop-28887855] Got vmacore exception: N7Vmacore15SystemExceptionE(Connection timed out)
--> [context]zKq7AVECAAAA####gAMdnB4ZAAAeF4rbG####hY29yZS5zbwAA#######DIHysAcRUjAOv5JQ######6vIwA3tysBV########y4wAAIdjA5sa#####vLjYA[/context]
YYYY-MM-DD verbose vpxd[7F7968478700] [Originator@6876 sub=TaskInfo opID=HB-host-486@8715-6a3c7a68-TaskLoop-28887855] [VpxdTaskInfoPublisher] host:[host-###] WaitForUpdates failed: vmodl.fault.HostCommunication
--> [backtrace begin] product: VMware VirtualCenter, version: 6.5.0, build: build-7515524, tag: vpxd, cpu: x86_64, os: linux, buildType: release
--> backtrace[00] libvmacore.so[0x002B5E78]: Vmacore::System::Stacktrace::CaptureWork(unsigned int)
--> backtrace[01] libvmacore.so[0x001B1710]: Vmacore::System::SystemFactoryImpl::CreateQuickBacktrace(Vmacore::Ref<Vmacore::System::Backtrace>&)
--> backtrace[02] libvmacore.so[0x001869CA]: Vmacore::Throwable::Throwable(std::string const&)
--> backtrace[03] vpxd[0x0054D40E]
--> backtrace[04] vpxd[0x00622866]
--> backtrace[05] vpxd[0x00628FBC]
--> backtrace[06] vpxd[0x00622F65]
--> backtrace[07] vpxd[0x00626877]
--> backtrace[08] libvmomi.so[0x0012623C]: Vmomi::StubImpl::_Invoke_Task(Vmomi::ManagedMethod*, std::vector<Vmacore::Ref<Vmomi::Any>, std::allocator<Vmacore::Ref<Vmomi::Any> > >&, Vmacore::Ref<Vmomi::Any>&)
--> backtrace[09] libvmomi.so[0x00199CBC]: Vmomi::Core::PropertyCollectorStub::WaitForUpdates(Vmacore::Optional<std::string> const&, Vmacore::Ref<Vmomi::Core::PropertyCollector::UpdateSet>&)
--> backtrace[10] vpxd[0x0093BADC]
--> backtrace[11] vpxd[0x0093CD0C]
--> backtrace[12] vpxd[0x00542747]
--> backtrace[13] vpxd[0x005427E0]
--> backtrace[14] libvmacore.so[0x0023AC47]
--> backtrace[15] libvmacore.so[0x0023AF2E]
--> backtrace[16] libvmacore.so[0x002BB737]
--> backtrace[17] libpthread.so.0[0x00007454]
--> backtrace[18] libc.so.6[0x000E8C1D]
--> [backtrace end]
  • rhttp packet capture analysis shows packet drops. 

  • The captured packets at the vCenter side shows similar to below
323 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 337 54552 → 443 [PSH, ACK] Seq=1 Ack=1 Win=850 Len=281
341 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 6152 54552 → 443 [PSH, ACK] Seq=282 Ack=1 Win=850 Len=6096
352 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 62 443 → 54552 [ACK] Seq=1 Ack=6378 Win=116 Len=0
353 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 62 [TCP Window Update] 443 → 54552 [ACK] Seq=1 Ack=6378 Win=128 Len=0
411 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 19189 443 → 54552 [PSH, ACK] Seq=1 Ack=6378 Win=128 Len=19133
412 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 56 54552 → 443 [ACK] Seq=6378 Ack=19134 Win=827 Len=0
24683 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 336 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=280
24684 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 798 54552 → 443 [PSH, ACK] Seq=6658 Ack=19134 Win=850 Len=742
24700 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 1078 [TCP Retransmission] 54552 → 443 [PSH, ACK] Seq=6378 Ack=19134 Win=850 Len=1022
 
  • On the ESXi the packet capture analysis shown as below

566 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 54 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0
567 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 54 [TCP Retransmission] 443 → 54552 [FIN, ACK] Seq=1 Ack=1 Win=128 Len=0
568 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 54
614 YYYY-MM-DD ##.##.##.## ##.##.##.## TCP 54 443 → 54552 [RST, ACK] Seq=2 Ack=1 Win=128 Len=0

 



Environment

VMware vSphere ESXi 6.5
VMware vCenter Server 6.5.x
VMware vCenter Server Appliance 6.5.x
VMware vSphere ESXi 6.7
VMware vCenter Server 6.7.x
VMware vCenter Server Appliance 6.7.x
VMware vCenter Server 7.0.x

Cause

It issue caused by packet drop between the vCenter and hosts. The re-transmissions on both sides are clear indication that the connection timed out because of lost network connectivity - both sides were sending packets that the other side didn't receive. This fix enables sending TCP keep-alive packets when the connection is idle.

This can also happen when the vCenter is under very high load and constrained on resources

Resolution

This issue is resolved in:
  • vCenter Server 6.5 U3f build 15259038.
  • ESXi 6.5  Patch Release ESXi650-201912002.
  • vCenter 6.7 U3m build 17713310.
  • ESXi 6.7 P05 ESXi670-20210300.


Workaround:
To work around this issue, please add the lines on the below files:
  • In the config.xml on ESXi host located at /etc/vmware/rhttpproxy/
NOTE: Take a backup of this file before modifying.


    <config>
   <vmacore>
//some lines of other options ......
      <tcpKeepAlive>
         <serverSocket>
            <isEnabled>true</isEnabled>
            <idleTimeSec>180</idleTimeSec>
            <probeTimeSec>180</probeTimeSec>
            <probeCount>8</probeCount>
         </serverSocket>
         <clientSocket>
            <isEnabled>true</isEnabled>
            <idleTimeSec>180</idleTimeSec>
            <probeTimeSec>180</probeTimeSec>
            <probeCount>8</probeCount>
         </clientSocket>
      </tcpKeepAlive>
//some lines of other options......
   </vmacore>
</config>


Additional Information

Refer the below reference links for more information:

Collecting network packets using the lightweight PacketCapture

Using the pktcap-uw tool in ESXi 5.5 and later

ESXi host disconnects intermittently from vCenter Server