Cloning a template to a VM, or a VM to a VM, fails with the error "An error occurred while communicating with remote host" on Tegile storage with ESXi 6.7

Article ID: 324314

Products

VMware vCenter Server
VMware vSphere ESXi

Issue/Introduction

This article explains why clone operations fail when the Tegile VAAI plugin (version 1.0-15.53) is installed on ESXi 6.7.

Symptoms:
1) Storage vMotion of a VM from Tegile storage to Tegile storage fails with the error "An error occurred while communicating with remote host".
2) Storage vMotion of a VM from Tegile storage to a local datastore, or from a local datastore to Tegile storage, completes successfully.
3) Cloning a VM from Tegile storage to Tegile storage fails with the error "An error occurred while communicating with remote host".
4) Cloning a VM from Tegile storage to a local datastore, or from a local datastore to Tegile storage, completes successfully.
5) The issue starts only after ESXi is upgraded to 6.7, because Tegile plugin version 1.0-15.53 is not compatible with ESXi 6.7.


Note: During the failure scenarios above, the vpxa agent on the host crashes and writes a vpxa dump file. This causes the host to disconnect temporarily from vCenter Server; the host then reconnects automatically.

Environment

VMware vSphere ESXi 6.7
VMware vCenter Server 6.7.x

Cause

The Tegile VAAI plugin on the ESXi host causes vpxa to crash. The crash momentarily disconnects the host from vCenter Server, which causes the clone operation to fail.

[root@ESXi4:~] esxcli software vib list | grep tgl
tgl-vaai-nas-diskplugin        1.0-15.53                             Tegile                 VMwareAccepted    2018-09-13
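
As a rough illustration, the check above can be scripted against saved output of `esxcli software vib list`. This is a minimal sketch, not a VMware or Tegile tool: the function names are hypothetical, and it assumes that any installed version lower than 1.0-15.70 (the version named in the Resolution section) should be upgraded, which this article states explicitly only for 1.0-15.53.

```python
import re

FIXED_VERSION = "1.0-15.70"             # version named in the Resolution section
PLUGIN_NAME = "tgl-vaai-nas-diskplugin"  # VIB name shown in the output above


def version_key(version):
    """Turn '1.0-15.53' into (1, 0, 15, 53) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def installed_plugin_version(vib_list_output, name=PLUGIN_NAME):
    """Return the version column for the named VIB, or None if it is not listed."""
    for line in vib_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            return fields[1]
    return None


def needs_upgrade(vib_list_output):
    """True if the plugin is listed and older than FIXED_VERSION (assumption)."""
    version = installed_plugin_version(vib_list_output)
    if version is None:
        return False
    return version_key(version) < version_key(FIXED_VERSION)


sample = (
    "tgl-vaai-nas-diskplugin        1.0-15.53        "
    "Tegile        VMwareAccepted    2018-09-13"
)
print(installed_plugin_version(sample), needs_upgrade(sample))
```

The versions are compared as integer tuples rather than as strings, so that a component such as 15.100 would not sort below 15.70.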

vpxd.log:
2018-09-24T19:17:24.871Z info vpxd[04470] [Originator@6876 sub=VmProv opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01] Starting Datastore changing Local-VC Clone of poweredOff vm Server2016 on vim.HostSystem:host-90(x.x.x.x) with ds ds:///vmfs/volumes/cafeafb4-369f4b12/ to vim.HostSystem:host-90(x.x.x.x) with ds ds:///vmfs/volumes/fb200226-e484902a/

The line above shows the clone task being initiated, with the source and destination datastore details.

2018-09-24T19:18:14.516Z warning vpxd[04470] [Originator@6876 sub=OCM opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01] Encountered host communication error when attempting to finish operation! Waiting for up to one hour for the host to reconnect.

2018-09-24T19:19:14.613Z info vpxd[04470] [Originator@6876 sub=Default opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01] [VpxLRO] -- ERROR lro-1938778 --  -- VmprovWorkflow: vmodl.fault.HostCommunication:
--> Result:
--> (vmodl.fault.HostCommunication) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>
-->    msg = ""
--> }
--> Args:
-->

============================

vmkernel.log:
2018-09-24T19:17:25.169Z cpu29:2281985)User: 3143: vpxa-worker: wantCoreDump:vpxa-worker signal:6 exitCode:0 coredump:enabled
2018-09-24T19:17:25.325Z cpu24:2281985)UserDump: 3098: vpxa-worker: Dumping cartel 2281965 (from world 2281985) to file /var/core/vpxa-zdump.003 ...
2018-09-24T19:18:14.447Z cpu17:2281985)UserDump: 3246: vpxa-worker: Userworld(vpxa-worker) coredump complete.

The lines above show that vpxa crashed and created a dump file.
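
When reviewing a log bundle, the same signature can be searched for programmatically. The sketch below is an illustration only: the regular expression is modelled on the "Dumping cartel" line in the vmkernel.log excerpt above and may not match other ESXi builds, and `find_vpxa_dumps` is a hypothetical helper name.

```python
import re

# Modelled on the "Dumping cartel ... to file ..." line in the excerpt above.
DUMP_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+cpu\d+:\d+\)UserDump: \d+: (?P<process>[\w-]+): "
    r"Dumping cartel \d+ \(from world \d+\) to file (?P<path>\S+)"
)


def find_vpxa_dumps(vmkernel_lines):
    """Return (timestamp, dump_path) for every vpxa core dump line found."""
    dumps = []
    for line in vmkernel_lines:
        match = DUMP_LINE.match(line)
        if match and match.group("process").startswith("vpxa"):
            dumps.append((match.group("timestamp"), match.group("path")))
    return dumps


excerpt = [
    "2018-09-24T19:17:25.169Z cpu29:2281985)User: 3143: vpxa-worker: "
    "wantCoreDump:vpxa-worker signal:6 exitCode:0 coredump:enabled",
    "2018-09-24T19:17:25.325Z cpu24:2281985)UserDump: 3098: vpxa-worker: "
    "Dumping cartel 2281965 (from world 2281985) to file /var/core/vpxa-zdump.003 ...",
]
for timestamp, path in find_vpxa_dumps(excerpt):
    print(timestamp, path)
```

A dump timestamp that falls within the failed clone window in vpxd.log is consistent with the cause described in this article.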

============================


hostd.log:
2018-09-24T19:19:14.547Z info hostd[2188008] [Originator@6876 sub=Libs opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01-48-f557 user=vpxuser:D91_DOMAIN\BodiEric] OBJLIB-LIB:  Failed to get VCFS root path for '/vmfs/volumes/fb200226-e484902a/LFiche1_Temp': No such file or directory (131076).
2018-09-24T19:19:14.549Z warning hostd[2188008] [Originator@6876 sub=Vmsvc opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01-48-f557 user=vpxuser:D91_DOMAIN\BodiEric] Could not retrieve canonical path for /vmfs/volumes/fb200226-e484902a/LFiche1_Temp/LFiche1_Temp.vmx
2018-09-24T19:19:14.550Z info hostd[2188008] [Originator@6876 sub=vm opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01-48-f557 user=vpxuser:D91_DOMAIN\BodiEric] VigorOffline_Init: Failed to initialize VIGOR offline: Cannot open file "/vmfs/volumes/fb200226-e484902a/LFiche1_Temp/LFiche1_Temp.vmx": No such file or directory.
2018-09-24T19:19:14.550Z info hostd[2188008] [Originator@6876 sub=Libs opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01-48-f557 user=vpxuser:D91_DOMAIN\BodiEric]

2018-09-24T19:19:14.565Z info hostd[2188008] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/fb200226-e484902a/LFiche1_Temp/LFiche1_Temp.vmx opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01-48-f557 user=vpxuser:D91_DOMAIN\BodiEric] State Transition (VM_STATE_INITIALIZING -> VM_STATE_INVALID_LOAD)

2018-09-24T19:19:14.566Z info hostd[2188008] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/fb200226-e484902a/LFiche1_Temp/LFiche1_Temp.vmx opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01-48-f557 user=vpxuser:D91_DOMAIN\BodiEric] State Transition (VM_STATE_INVALID_LOAD -> VM_STATE_UNREGISTERING)
2018-09-24T19:19:14.566Z warning hostd[2188008] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/fb200226-e484902a/LFiche1_Temp/LFiche1_Temp.vmx opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01-48-f557 user=vpxuser:D91_DOMAIN\BodiEric] Null _cachedLayoutEx in force unbind
2018-09-24T19:19:14.566Z info hostd[2188008] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/fb200226-e484902a/LFiche1_Temp/LFiche1_Temp.vmx opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01-48-f557 user=vpxuser:D91_DOMAIN\BodiEric] State Transition (VM_STATE_UNREGISTERING -> VM_STATE_GONE)
2018-09-24T19:19:14.566Z info hostd[2188008] [Originator@6876 sub=Vmsvc.vm:/vmfs/volumes/fb200226-e484902a/LFiche1_Temp/LFiche1_Temp.vmx opID=48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01-48-f557 user=vpxuser:D91_DOMAIN\BodiEric] Virtual machine object cleanup

The lines above show that the clone operation created the respective files, but the path to the .vmx file was not available. Because the path could not be found, the VM was marked as invalid and unregistered, so the cloned disks are not visible on the destination datastore.
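
The hostd.log entries can be tied back to the clone task in vpxd.log through the shared opID. The sketch below filters hostd lines by that opID and extracts the VM state transitions; it is an illustration under assumptions, with hypothetical helper names and a pattern based only on the "State Transition" lines shown above. Substring matching is used because hostd appends its own suffix (here `-48-f557`) to the opID logged by vpxd.

```python
import re

TRANSITION = re.compile(r"State Transition \((\w+) -> (\w+)\)")


def state_transitions(hostd_lines, op_id):
    """Return (from_state, to_state) pairs logged for the given vpxd opID."""
    marker = "opID=" + op_id
    transitions = []
    for line in hostd_lines:
        if marker not in line:
            continue  # line belongs to a different operation
        match = TRANSITION.search(line)
        if match:
            transitions.append((match.group(1), match.group(2)))
    return transitions


def ended_invalid(transitions):
    """True if the VM passed through VM_STATE_INVALID_LOAD."""
    return any(dst == "VM_STATE_INVALID_LOAD" for _, dst in transitions)


op_id = "48c9016f-8a4a-4088-92f0-b65de4d5e48a-237620-auto-237621-h5c:70015914-7f-01"
hostd_excerpt = [
    "2018-09-24T19:19:14.565Z info hostd[2188008] [Originator@6876 sub=Vmsvc.vm "
    "opID=" + op_id + "-48-f557] State Transition "
    "(VM_STATE_INITIALIZING -> VM_STATE_INVALID_LOAD)",
    "2018-09-24T19:19:14.566Z info hostd[2188008] [Originator@6876 sub=Vmsvc.vm "
    "opID=" + op_id + "-48-f557] State Transition "
    "(VM_STATE_INVALID_LOAD -> VM_STATE_UNREGISTERING)",
]
found = state_transitions(hostd_excerpt, op_id)
print(found, ended_invalid(found))
```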


Resolution

Upgrade the Tegile VAAI plugin to version 1.0-15.70, the latest plugin version listed in the hardware compatibility guide.

Ref: https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=san&productid=37941&releaseid=369&deviceCategory=san&details=1&partner=423&pFeaturesCat=9&isSVA=0&page=1&display_interval=10&sortColumn=Partner&sortOrder=Asc


Workaround:
Disable the Tegile VAAI plugin.

Ref: https://kb.vmware.com/s/article/1033665

Additional Information

Impact/Risks:
A reboot of the ESXi host is required after the plugin is updated.

Attachments

sVmotion_Tegile to Tegile.PNG
clone.png