/var/log/vmware/vpxd/
, show repeated messages "Certificate not available, starting hostsync
"[YYYY-MM-DDTHH:MM:SS] info vpxd[34753] [Originator@6876 sub=Vmomi opID=HostSync-host-147-79013b58] Creating SOAP stub adapter for /vpxa on <ESXihostfqdn>:443
[YYYY-MM-DDTHH:MM:SS] info vpxd[34753] [Originator@6876 sub=IO.Http opID=HostSync-host-147-79013b58] Terminating user agent request; (null), N7Vmacore16TimeoutExceptionE(Operation timed out)
[YYYY-MM-DDTHH:MM:SS] info vpxd[34753] [Originator@6876 sub=IO.Http opID=HostSync-host-147-79013b58] Set user agent error; state: 1, (null), N7Vmacore16TimeoutExceptionE(Operation timed out)
[YYYY-MM-DDTHH:MM:SS] warning vpxd[34781] [Originator@6876 sub=IO.Connection opID=HostSync-host-XXX-79013b58] Failed to connect; <io_obj p:0x00007fc56ca86dc8, h:-1, <TCP 'xxx.xxx.xxx.xxx : 45208'>, <TCP '<IP> : 443'>>, e: 104(Connection reset by peer), duration: 120020msec
[YYYY-MM-DDTHH:MM:SS] warning vpxd[34753] [Originator@6876 sub=Vmomi opID=HostSync-host-147-79013b58] Creating SOAP stub adapter failed: N7Vmacore24InvalidArgumentExceptionE(No version for VMODL calls to <<cs p:00007fc5a8348eb0, TCP:<ESXihostfqdn>:443>, /vpxa>)
[YYYY-MM-DDTHH:MM:SS] info vpxd[23864] [Originator@6876 sub=Vmomi opID=HostSync-host-147-2f8fc66f] Creating SOAP stub adapter for /vpxa on <ESXihostfqdn>:443
[YYYY-MM-DDTHH:MM:SS] info vpxd[09943] [Originator@6876 sub=Vmomi opID=sps-Main-416743-850-449976-4] Creating SOAP stub adapter for /vpxa on <hostname>
c.com:443
[YYYY-MM-DDTHH:MM:SS] error vpxd[09943] [Originator@6876 sub=Vmomi opID=sps-Main-416743-850-449976-4] [VpxaClientAdapter] Last login to <Hostname>
failed; timeout; no host sync yet
[YYYY-MM-DDTHH:MM:SS] error vpxd[09943] [Originator@6876 sub=Vmomi opID=sps-Main-416743-850-449976-4] Cannot contact the specified host (<hostname)>
[YYYY-MM-DDTHH:MM:SS] info vpxd[15009] [Originator@6876 sub=Vmomi opID=FdmMonitor-domain-c261957-2919bc76] Creating SOAP stub adapter for /fdm on <hostname>:443
[YYYY-MM-DDTHH:MM:SS] warning vpxd[15009] [Originator@6876 sub=Vmomi opID=FdmMonitor-domain-c261957-2919bc76] [FdmClientAdapter] Login request throttled on host <hostname>
[YYYY-MM-DDTHH:MM:SS] error vpxd[15009] [Originator@6876 sub=Vmomi opID=FdmMonitor-domain-c261957-2919bc76] Cannot contact the specified host (<hostname>)
[YYYY-MM-DDTHH:MM:SS] info vpxd[15009] [Originator@6876 sub=Vmomi opID=FdmMonitor-domain-c261957-2919bc76] Creating SOAP stub adapter for /fdm on <hostname>:443
[YYYY-MM-DDTHH:MM:SS] warning vpxd[15009] [Originator@6876 sub=Vmomi opID=FdmMonitor-domain-c261957-2919bc76] [FdmClientAdapter] Login request throttled on host <hostname>
[YYYY-MM-DDTHH:MM:SS] error vpxd[15009] [Originator@6876 sub=Vmomi opID=FdmMonitor-domain-c261957-2919bc76] Cannot contact the specified host (<hostname>)
[YYYY-MM-DDTHH:MM:SS] warning vpxd[07057] [Originator@6876 sub=InvtHostCnx opID=HostSync-host-<moid>-2f729613] Exception occurred during host sync; Got method fault; [vim.HostSystem:host-<moid>,<hostname>], e: N3Vim5Fault12InvalidLogin9ExceptionE(Fault cause: vim.fault.InvalidLogin
--> )
Usually, this is caused by a lock which is blocking heartbeats. Some other common causes for this issue can be:
If the password of ESX user "vpxuser" is out of sync with the password maintained by vpxd, which can happen in various scenarios such as:
VCSA has been restored from a snapshot, and "vpxuser" password has been updated between the time snapshot was taken and restore time.
The password of ESX user "vpxuser" has expired on ESX due to ESX updated setting 'Security.PasswordMaxDays' (default is "never expire" but the user may update that advanced setting)
The ESX host has been added to another VC without having been removed from the first VC (the first VC will fail to login to ESX in that case)
Due to mis-configured network connectivity, which causes connection failure from vpxd to ESX host after ~120 seconds timeout.
Any other underlying issue causing ESXi host not responding, typically related to storage/network loss (e.g SD card as described in the KB: 318794)
This issue is fixed in VMware vCenter Server 7.0 U3i and VMware vCenter Server 8.0a releases.
To workaround this issue:
To identify if this is the issue you are encountering, look at the vpxd logs for one of the not responding hosts that you have attempted to reconnect. Find the reconnect task:
[YYYY-MM-DDTHH:MM:SS] info vpxd[36085] [Originator@6876 sub=vpxLro opID=l6ejabx0-47722379-auto-seuto-h5:72353065-61] [VpxLRO] -- BEGIN task-5224518 -- host-1607205 -- vim.HostSystem.reconnect -- xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx(xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
Using the opID, find where this task finished and notice there was no error with the task:
[YYYY-MM-DDTHH:MM:SS] info vpxd[36085] [Originator@6876 sub=vpxLro opID=l6ejabx0-47722379-auto-seuto-h5:72353065-61] [VpxLRO] -- FINISH task-5224518
Next, use the host ID to find the host sync task in the logs:
[YYYY-MM-DDTHH:MM:SS] verbose vpxd[36097] [Originator@6876 sub=InvtHostCnx opID=l6ejabx0-47722570-auto-seuz0-h5:72353107-1a] Queuing host sync; [vim.HostSystem:host-1607205,esxi.domain.com]
[YYYY-MM-DDTHH:MM:SS] verbose vpxd[59454] [Originator@6876 sub=vpxLro opID=l6ejabx0-47722570-auto-seuz0-h5:72353107-1a-51c0b834] [VpxLRO] -- BEGIN lro-1301294 -- -- VpxdInvtHostSyncHostLRO.Synchronize --
Immediately after the host sync is started, we see a failure stating that the connection is not alive due to missing heartbeats:
[YYYY-MM-DDTHH:MM:SS] warning vpxd[59454] [Originator@6876 sub=InvtHostCnx opID=l6ejabx0-47722570-auto-seuz0-h5:72353107-1a-51c0b834] Connection not alive due to missing heartbeats; [vim.HostSystem:host-1607205,esxi.domain.com], cnx: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
[YYYY-MM-DDTHH:MM:SS] warning vpxd[59454] [Originator@6876 sub=MoHost opID=l6ejabx0-47722570-auto-seuz0-h5:72353107-1a-51c0b834] host [vim.HostSystem:host-1607205,esxi.domain.com] connection state changed to NO_RESPONSE
[YYYY-MM-DDTHH:MM:SS] warning vpxd[59454] [Originator@6876 sub=InvtHostCnx opID=l6ejabx0-47722570-auto-seuz0-h5:72353107-1a-51c0b834] Exception occurred during host sync; Host communication failed; [vim.HostSystem:host-1607205,esxi.domain.com], e: N5Vmomi5Fault17HostCommunication9ExceptionE(Fault cause: vmodl.fault.HostCommunication
Note that the time between when the reconnect is started and the failure is less than 30 seconds in these logs. It may be longer, but we should not be marking a host as unresponsive for missing heartbeats in this short of a time span.
If you had run a packet capture looking for network problems, you may notice that the host is reporting a TCP Window Full Message like below:
3685 19:08:08.897786 YYYY-MM-DD HH:MM:SS <Host IP> <vCenter IP> TCP 1602 [TCP Window Full] 443 → 50118 [ACK] Seq=3191140846 Ack=3569770941 Win=66560 Len=1536 TSval=3535247282 TSecr=2844746348 [TCP segment of a reassembled PDU]
3688 19:08:08.899182 YYYY-MM-DD HH:MM:SS <vCenter IP> <Host IP> TCP 66 [TCP ZeroWindow] 50118 → 443 [ACK] Seq=3569770941 Ack=3191202286 Win=0 Len=0 TSval=2844746349 TSecr=353524728
You may also be able to use the following command run against the vpxd logs to identify a set of hosts to investigate for this issue. Note that you must change the <VPXD LOGS> instances in the sample to the log or logs you want to review (replace <VPXD LOGS> with the name of the specific log or a regex for running the command on multiple logs):
$ grep "Terminating user agent request; SSL" <VPXD LOGS> | grep HostSync | cut -d "=" -f3 | cut -d "]" -f1 | sort | uniq | while read line; do grep $line <VPXD LOGS> | egrep "Creating SOAP stub adapter for \/vpxa|Terminating user agent request; SSL"; done
or
$ zgrep "Terminating user agent request; SSL" vpxd-*.log.gz | grep HostSync | cut -d "=" -f3 | cut -d "]" -f1 | sort | uniq | while read line; do zgrep $line vpxd-*.log.gz | egrep "Creating SOAP stub adapter for \/vpxa|Terminating user agent request; SSL"; done