Powered-off vMotion (relocate) of a large virtual machine across datastores may time out in vCenter

Article ID: 318734

Products

VMware vCenter Server

Issue/Introduction

Depending on the scenario, the NFC (Network File Copy) operation can run long enough to exceed vCenter Server's default timeout settings. When that happens, relocate operations for large virtual machines time out.

This article describes which settings to change so that the relocate operation can complete successfully.

Symptoms:

The following conditions may trigger a timeout within vCenter and require adjusting some vCenter Server advanced settings to work around the problem. One or more of the following criteria can trigger this behavior:

·         Relocating a virtual machine from one ESXi host to another (Powered-Off)

·         In addition to changing ESXi hosts, changing datastore types (in some cases across array types)

·         The virtual machine disk size is larger than 1TB

·         The disk type changes during the migration (for example, from thin provisioned to eager zeroed thick)

 
The error shown in vCenter Tasks is “Operation timed out”.

In the vCenter Server log (vpxd.log), you may see:

2020-04-01T07:42:23.996Z info vpxd[01615] [Originator@6876 sub=vpxTaskInfo] Timed out waiting for task vim.Task:task-5779
2020-04-01T07:42:23.996Z warning vpxd[01615] [Originator@6876 sub=vpxLro opID=sroger5str-279239-auto-5zgo-h5:70021205-32-02] [VpxLRO] Timeout waiting on updates for task-5779
2020-04-01T07:42:23.997Z warning vpxd[01615] [Originator@6876 sub=vpxLro opID=sroger5str-279239-auto-5zgo-h5:70021205-32-02] [VpxLRO] Timeout waiting on updates for task-5779
2020-04-01T07:42:23.997Z error vpxd[01660] [Originator@6876 sub=VmProv opID=sroger5str-279239-auto-5zgo-h5:70021205-32-02] Get exception while executing action vpx.vmprov.CopyVmFiles: N3Vim5Fault8Timedout9ExceptionE(Fault cause: vim.fault.Timedout--> )
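
To check a large vpxd.log for these signatures, a short script can help. The following is a minimal sketch, not a supported tool: the function name `find_timeouts` is hypothetical, and the patterns are taken only from the excerpt above, so other vCenter versions may word these messages differently.

```python
import re

# Message patterns copied from the vpxd.log excerpt in this article.
TIMEOUT_PATTERNS = [
    re.compile(r"Timed out waiting for task vim\.Task:(?P<task>task-\d+)"),
    re.compile(r"Timeout waiting on updates for (?P<task>task-\d+)"),
]
FAULT_MARKER = "vim.fault.Timedout"


def find_timeouts(lines):
    """Return (sorted task IDs that timed out, whether the Timedout fault was seen)."""
    task_ids = set()
    fault_seen = False
    for line in lines:
        for pattern in TIMEOUT_PATTERNS:
            match = pattern.search(line)
            if match:
                task_ids.add(match.group("task"))
        if FAULT_MARKER in line:
            fault_seen = True
    return sorted(task_ids), fault_seen
```

The function accepts any iterable of lines, so an open file handle for vpxd.log can be passed directly, for example `find_timeouts(open(path, errors="replace"))` where `path` is the location of the log on your system.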

On the source ESXi host, vpxa.log would note the disk copy starting successfully:

2020-04-01T03:44:09.361Z verbose vpxa[2099607] [Originator@6876 sub=NfcManager opID=sroger5str-210840-auto-4iop-h5:70017269-5-02-5c] Transfer file /vmfs/volumes/5505ede1-3c37e4a8-c061-0025b5020430/monster_VM/monster_VM_1.vmdk -> /vmfs/volumes/5e83a485-8b447362-3181-0025b5a201f8/monster_VM/monster_VM_1.vmdk (parent: , size: 1073741824000, flags: 4259912, grainSize: 0 policy: '(null)')
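
The `size:` field in this NfcManager entry gives the number of bytes being transferred, which is a quick way to confirm that a disk falls into the large-disk category. The sketch below pulls that value out of a single log entry. The function names are hypothetical, and it assumes the entry sits on one line in the log.

```python
import re

# Matches the lowercase "size:" field; "grainSize:" is not matched because of its capital S.
SIZE_PATTERN = re.compile(r"\bsize: (\d+)")


def transfer_size_bytes(log_line):
    """Return the byte count from an NfcManager 'Transfer file' entry, or None."""
    if "Transfer file" not in log_line:
        return None
    match = SIZE_PATTERN.search(log_line)
    return int(match.group(1)) if match else None


def bytes_to_gib(size_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return size_bytes / 2**30
```

For the entry above, the extracted size is 1073741824000 bytes, which is 1000 GiB.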

Approximately 30 minutes later, the destination ESXi host logs (hostd.log) would mark the operation failed and vCenter would log the timeout:

hostd.0:2020-04-01T04:23:59.783Z warning hostd[2099211] [Originator@6876 sub=Libs opID=sroger5str-210840-auto-4iop-h5:70017269-5-02-f6-7c8e user=vpxuser:VMWARE\jsmith] Hostlog_ExecuteCleanup: Failed to stat disk /vmfs/volumes/5e83a485-8b447362-3181-0025b5a201f8/monster_VM/monster_VM_1.vmdk - not cleaning up. Hlog: /vmfs/volumes/5e83a485-8b447362-3181-0025b5a201f8/monster_VM/monster_VM-1e67cef9.hlog


Environment

VMware vCenter Server 6.5.x
VMware vCenter Server 6.7.x
VMware vCenter Server 7.0.x

Cause

This is caused by the default timeout values of the following vCenter Server advanced settings:

config.task.timeout                                    

config.vmomi.soapStubAdapter.blockingTimeoutSeconds 

The timeout is reached in specific cases where:

·         A large virtual machine disk is being copied (1 TB or larger)

·         The relocate operation uses NFC because the virtual machine is powered off and is being copied between two different VMFS datastores (in some cases on different arrays)

·         The source and destination ESXi host management networks are in different broadcast domains

·         The disk type of the virtual machine changes during the copy (for example, converting thin provisioned VMDKs to thick provisioned)

Resolution

If relocating the virtual machine (using NFC) is the only method available, add the following advanced settings in vCenter:

Name: config.task.timeout          Value: 7200

Name: config.vmomi.soapStubAdapter.blockingTimeoutSeconds               Value: 18000 

Note: These advanced settings do not exist in vCenter by default and must be added. The default value for both is 1800 seconds (30 minutes). VMware recommends that you change these settings only when instructed to do so by VMware technical support or when you are following specific instructions in VMware documentation.

A restart of vCenter Server is required for the settings to take effect.
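
Before changing the settings, it can be useful to estimate whether a copy is likely to finish within a given timeout. The following is a rough sketch under stated assumptions: the function names are hypothetical, the throughput figure is something you must measure or estimate for your own environment, and the calculation ignores overhead such as zeroing when converting to eager zeroed thick.

```python
DEFAULT_TIMEOUT_S = 1800  # default for both settings, per the note above
TASK_TIMEOUT_S = 7200     # config.task.timeout value suggested in this article


def estimated_copy_seconds(size_bytes, throughput_mb_per_s):
    """Estimate copy time from disk size and sustained throughput (MB/s, 10**6 bytes)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / (throughput_mb_per_s * 1_000_000)


def fits_within(size_bytes, throughput_mb_per_s, timeout_s):
    """Return True if the estimated copy time does not exceed the timeout."""
    return estimated_copy_seconds(size_bytes, throughput_mb_per_s) <= timeout_s


# Example with the 1073741824000-byte disk from the log excerpt and an
# assumed (not measured) sustained throughput of 200 MB/s.
size = 1073741824000
print(round(estimated_copy_seconds(size, 200)))    # about 5369 seconds
print(fits_within(size, 200, DEFAULT_TIMEOUT_S))   # False
print(fits_within(size, 200, TASK_TIMEOUT_S))      # True
```

At the assumed 200 MB/s, the copy would take roughly 89 minutes, which exceeds the 30-minute default but fits within 7200 seconds. A slower link or a larger disk may call for different values, which should be agreed with VMware technical support.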


Additional Information

For VMware documentation on vCenter Server advanced settings, and how to insert custom settings, please refer to the documentation below:
 

https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.vcenterhost.doc/GUID-62184858-32E6-4DAC-A700-2C80667C3558.html

https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.vcenterhost.doc/GUID-62184858-32E6-4DAC-A700-2C80667C3558.html


Impact/Risks:
A restart of the vCenter Server process (vpxd) will be required in order for the resolution instructions to take effect.