How to increase vSphere Cluster Services (vCLS) deployment timeouts for the ESX Agent Manager service

Article ID: 323217

Products

VMware vCenter Server

Issue/Introduction

This article describes how to increase the ESX Agent Manager (EAM) timeouts when the symptoms below match and the network connection between vCenter Server and the ESXi hosts may be slow or degraded.

Symptoms:
In vCenter Server, vSphere Cluster Services (vCLS) deployment tasks repeatedly fail, and clusters show a degraded health status on the Summary page.

On the ESXi host where the OVF is being deployed, the vpxa agent log (/var/run/log/vpxa.log) may show entries similar to:

2021-09-25T15:37:31.225Z info vpxa[2099536] [Originator@6876 sub=HttpNfcServer] HandlePost: Created DiskUploadWorker for file: /nfc/520***************************88a005/disk-0.vmdk
2021-09-25T15:37:31.226Z info vpxa[2155687] [Originator@6876 sub=HttpNfcServer opID=SWI-1d64ea1b] [DiskUploadWorker] POST request /nfc/******************************88a005/disk-0.vmdk

....

2021-09-25T15:37:32.587Z error vpxa[2099530] [Originator@6876 sub=HttpNfcServer opID=SWI-681165f1] [DiskUploadWorker] Error reading input: N7Vmacore11IOExceptionE(IO error)
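To check whether a host is logging the same upload failure, the log can be filtered for the DiskUploadWorker error shown above. The following is a minimal sketch, not an official tool: the function name find_nfc_upload_errors is hypothetical, and the pattern is based only on the sample line in this article, so other builds may word the message differently.

```shell
# Hypothetical helper: list NFC disk-upload read errors from a vpxa log.
# The default path is the one named in this article; pass another path to override.
# Uses only POSIX shell features and grep -E.
find_nfc_upload_errors() {
    log_file="${1:-/var/run/log/vpxa.log}"
    # Match lines such as:
    #   ... sub=HttpNfcServer opID=...] [DiskUploadWorker] Error reading input: ...
    grep -E 'sub=HttpNfcServer.*\[DiskUploadWorker\] Error reading input' "$log_file"
}

# Example (on the ESXi host):
#   find_nfc_upload_errors
#   find_nfc_upload_errors /var/run/log/vpxa.log | wc -l
```

No output means the pattern was not found in that log file, which does not by itself rule out the issue.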


Note: This is only observed in vCenter Server 7.0 Update 1 and later, because vSphere Cluster Services (vCLS) was introduced in 7.0 Update 1.


Environment

VMware vCenter Server 7.0.x

Cause

In some environments, such as Remote Office Branch Office (ROBO) configurations, network latency can cause agent VM deployments to exceed the default timeouts configured for EAM on vCenter Server. If the network connection between the ESXi hosts and vCenter Server is not stable enough, agent VM deployments may time out, and the agent VMs will not deploy correctly on vSphere clusters.

In such cases, increase the default timeout parameters for the ESX Agent Manager (EAM) so that the service has enough time to deploy the OVF packages for the vCLS agent VMs.

Resolution

To increase the EAM timeouts, perform the following steps in an SSH session on the vCenter Server Appliance:

Stop the vCenter Server ESX Agent Manager service

service-control --stop vmware-eam

Edit /usr/lib/vmware-eam/web/webapps/eam/WEB-INF/classes/eam-server-beans.xml, and locate the section <bean id="uploadTimeoutMonitorFactory"

Change:
<constructor-arg value="60" />

to

<constructor-arg value="1200" />
 


Locate the line <bean id="httpClientFactory"

Change:
<constructor-arg value="30" />

to

<constructor-arg value="21600" />
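The two edits above can also be scripted instead of made by hand. The sketch below is illustrative only: the function name set_bean_timeout is hypothetical, and it assumes that each <constructor-arg ... /> line sits between the opening <bean id="..."> line and the next </bean> line, which this article does not confirm. It keeps a copy of the original file as a .bak file before the first change. Review the result before starting EAM.

```shell
# Hypothetical helper: change one <constructor-arg value="OLD" /> inside a named bean.
# Assumption: the argument is on its own line between <bean id="..."> and </bean>.
set_bean_timeout() {
    file="$1"; bean_id="$2"; old="$3"; new="$4"
    # Keep the first pristine copy; later calls do not overwrite it.
    [ -e "$file.bak" ] || cp -p "$file" "$file.bak" || return 1
    tmp="$(mktemp)" || return 1
    # Restrict the substitution to the lines of the selected bean only.
    sed "/<bean id=\"$bean_id\"/,/<\/bean>/ s|<constructor-arg value=\"$old\" */>|<constructor-arg value=\"$new\" />|" "$file" > "$tmp" &&
        cat "$tmp" > "$file"    # write back in place, keeping owner and mode
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example calls for the two edits described above (run while vmware-eam is stopped):
#   f=/usr/lib/vmware-eam/web/webapps/eam/WEB-INF/classes/eam-server-beans.xml
#   set_bean_timeout "$f" uploadTimeoutMonitorFactory 60 1200
#   set_bean_timeout "$f" httpClientFactory 30 21600
```

If a bean contains more than one argument with the same old value, every matching line inside that bean is changed, so a manual edit is the safer choice in that case.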
 

Save the changes and start the EAM service

service-control --start vmware-eam

The new timeouts give EAM more time to finish transferring and deploying the vCLS OVF when the network connection between vCenter Server and the ESXi hosts is slow or degraded.
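To confirm which values are currently set, the constructor arguments of each bean can be printed from the configuration file. This is a minimal sketch under the same layout assumption as the manual steps (the argument lines sit between <bean id="..."> and the next </bean>); the function name show_bean_args is hypothetical.

```shell
# Hypothetical helper: print the <constructor-arg .../> lines found between
# <bean id="..."> and the next </bean>. The XML layout is an assumption.
show_bean_args() {
    bean_id="$1"
    file="${2:-/usr/lib/vmware-eam/web/webapps/eam/WEB-INF/classes/eam-server-beans.xml}"
    sed -n "/<bean id=\"$bean_id\"/,/<\/bean>/ { /<constructor-arg /p; }" "$file"
}

# Example (on the vCenter Server Appliance):
#   show_bean_args uploadTimeoutMonitorFactory
#   show_bean_args httpClientFactory
#   service-control --status vmware-eam
```

After the changes, the two beans should list the values 1200 and 21600 respectively, and the status command should report the service as running once EAM has been started.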

Note: In some cases, old vCLS VMs may not have been cleaned up successfully. If so, stop EAM and delete those virtual machines. Once EAM is started again, vCLS should reprovision the service virtual machines. Do not delete the vCLS folder in vCenter Server.

 
 




Additional Information

vSphere Cluster Services (vCLS) in vSphere 7.0 Update 1 (80472)



Impact/Risks:
Changing this setting has minimal to no impact. The only effect is that EAM waits longer before timing out when deploying an OVF, such as the one for vCLS.