vCLS fails to deploy on a cluster: Deploy OVF template tasks repeatedly fail with "The task was canceled by a user"

Article ID: 393189

Products

VMware vCenter Server

Issue/Introduction

When retreat mode is exited on the cluster, vCLS deploy tasks start and fail immediately.

Task Name
 Deploy OVF template
Status
 The task was canceled by a user.

The /var/log/vmware/eam.log file on vCenter contains entries similar to the following:

YYYY-MM-DDTHH:mm:SS | ERROR | cluster-agent-1 | AuditedJob.java | 106 | JOB FAILED: [#######] DeployVmJob(ClusterAgent(ID: 'Agent:********-****-****-****-************:null'))
com.vmware.eam.job.DeployVmJob$DeployVmJobFailure: Can't provision VM for ClusterAgent(ID: 'Agent:********-****-****-****-************:null') due to lack of suitable datastore.
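To pull these entries out of a large eam.log, a grep along the following lines may help. This is a minimal sketch: the function name find_vcls_deploy_failures is hypothetical, and it only matches the two message fragments shown above. The default path is the one quoted in this article; pass another path if eam.log lives elsewhere on your build.

```shell
# Hypothetical helper (illustrative name): print the eam.log entries for
# vCLS DeployVmJob failures, i.e. the "JOB FAILED ... DeployVmJob" line and
# the "lack of suitable datastore" failure line.
# Defaults to the eam.log path quoted in this article; pass a path to override.
find_vcls_deploy_failures() {
  local log="${1:-/var/log/vmware/eam.log}"
  grep -e 'JOB FAILED: .*DeployVmJob' -e 'lack of suitable datastore' "$log"
}

# Usage on vCenter:
#   find_vcls_deploy_failures
```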

eam.log also shows which ESXi host was used for the attempted file upload:

YYYY-MM-DDTHH:mm:SS |  INFO | cluster-agent-4 | PushFiles.java | 87 | [NFC.File.Upload:###########] Pushing VM files through VmPushLease(id: session[########-####-####-####-##########]########-####-####-####-##########, Devices[(key: /vCLS-########-####-####-####-##########/ParaVirtualSCSIController0:0, url: https://ESXi_HOST_FQDN/nfc/########-####-####-####-##########/disk-#.vmdk, thumbprint: CERTIFICATE_THUMBPRINT), (key: /vCLS-########-####-####-####-##########/nvram, url: https://ESXi_HOST_FQDN/nfc/########-####-####-####-##########/disk-#.nvram, thumbprint: CERTIFICATE_THUMBPRINT)], HostMap[Datastore(key: https://ESXi_HOST_FQDN/nfc/########-####-####-####-##########/ , hosts[(url: https://ESXi_HOST_FQDN/nfc/########-####-####-####-##########/ , thumbprint: CERTIFICATE_THUMBPRINT)])])
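When there are many such entries, the ESXi host names can be extracted from the nfc URLs in the PushFiles.java lines. A sketch, assuming the URL layout shown above (https://<host>/nfc/...); list_nfc_upload_hosts is a hypothetical helper name.

```shell
# Hypothetical helper (illustrative name): list the distinct ESXi hosts that
# eam tried to push vCLS files to, taken from the https://<host>/nfc/ URLs
# in PushFiles.java log lines. Assumes the URL format shown in this article.
list_nfc_upload_hosts() {
  local log="${1:-/var/log/vmware/eam.log}"
  grep 'PushFiles.java' "$log" \
    | grep -o 'https://[^/]*/nfc/' \
    | sed -e 's|^https://||' -e 's|/nfc/$||' \
    | sort -u
}

# Usage on vCenter:
#   list_nfc_upload_hosts
```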

The following appears in vpxa.log on the ESXi host mentioned in the snippet above:

YYYY-MM-DDTHH:mm:SS error vpxa[57866125] [Originator@6876 sub=Default opID=***-******] Error on read, error: -1
YYYY-MM-DDTHH:mm:SS error vpxa[57866125] [Originator@6876 sub=HttpNfcServer opID=***-********] [DiskUploadWorker] Error reading input: N7Vmacore11IOExceptionE(Error on read, error: -1)
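On the ESXi host, a similar grep can locate these errors. This sketch assumes vpxa.log is at /var/log/vpxa.log (pass a different path if it is not), and find_nfc_read_errors is a hypothetical name. It uses only basic grep options so it should also work in the ESXi shell.

```shell
# Hypothetical helper (illustrative name): print the NFC upload read errors
# quoted in this article from vpxa.log on the ESXi host.
# ASSUMPTION: vpxa.log is at /var/log/vpxa.log; pass a path to override.
find_nfc_read_errors() {
  local log="${1:-/var/log/vpxa.log}"
  grep -E 'DiskUploadWorker.*Error reading input|Error on read, error: -1' "$log"
}

# Usage on the ESXi host identified in eam.log:
#   find_nfc_read_errors
```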

 

The size of the .vmdk file may also differ from the size declared in the .ovf file:

ls -la /storage/lifecycle/vmware-hdcs/*.vmdk
-rw-r--r-- 1 root root 143143936 /storage/lifecycle/vmware-hdcs/photon-ova-#####.vmdk

grep size /storage/lifecycle/vmware-hdcs/*.ovf
    <ovf:File ovf:href="photon-ova-#####.vmdk" ovf:id="file0" ovf:size="143143936"/>

In this example, photon-ova-#####.vmdk is 143143936 bytes, which matches the ovf:size value declared in the .ovf file. If the two values do not match, that indicates the .vmdk file is incomplete or corrupt.
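The same comparison can be scripted for every file an .ovf descriptor lists. The sketch below assumes each .ovf lists its files as <ovf:File ovf:href="..." ovf:size="..."/> entries with paths relative to the .ovf, as in the example above. It reads each ovf:size value and compares it with the byte count of the matching file. check_ovf_file_sizes is a hypothetical name; the function only reads files and returns non-zero if anything is missing or the wrong size.

```shell
# Hypothetical helper (illustrative name): compare each ovf:size declared in
# the .ovf descriptors with the actual size of the referenced file.
# Defaults to the vCLS directory used in this article; read-only.
check_ovf_file_sizes() {
  local dir="${1:-/storage/lifecycle/vmware-hdcs}"
  local rc=0 ovf entry href expected actual
  for ovf in "$dir"/*.ovf; do
    [ -e "$ovf" ] || continue   # glob matched no .ovf files
    while read -r entry; do
      href=$(printf '%s\n' "$entry" | sed -n 's/.*ovf:href="\([^"]*\)".*/\1/p')
      expected=$(printf '%s\n' "$entry" | sed -n 's/.*ovf:size="\([0-9]*\)".*/\1/p')
      # ovf:size is optional, so skip entries that do not declare it
      if [ -z "$href" ] || [ -z "$expected" ]; then continue; fi
      if [ ! -f "$dir/$href" ]; then
        echo "MISSING  $href"; rc=1; continue
      fi
      actual=$(( $(wc -c < "$dir/$href") ))   # byte count on disk
      if [ "$actual" -eq "$expected" ]; then
        echo "OK       $href ($actual bytes)"
      else
        echo "MISMATCH $href (expected $expected, found $actual bytes)"; rc=1
      fi
    done <<EOF
$(grep -o '<ovf:File [^>]*>' "$ovf")
EOF
  done
  return "$rc"
}

# Usage on vCenter:
#   check_ovf_file_sizes || echo "vCLS OVA files look incomplete or corrupt"
```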

Environment

vCenter Server 7.x

Cause

This is caused by a corrupt .vmdk file in the vCLS OVA.

Resolution

To resolve this, copy the vCLS OVA files from a healthy vCenter of the same version and build:

  1. Connect to vCenter with SSH and WinSCP.
  2. Move all the files from /storage/lifecycle/vmware-hdcs to a backup folder created in /storage/core:
    mkdir /storage/core/vcls_backup
    cd /storage/lifecycle/vmware-hdcs
    mv * /storage/core/vcls_backup
  3. Using WinSCP, copy the vCLS OVA files from /storage/lifecycle/vmware-hdcs on a healthy vCenter to the same directory on the problematic vCenter.
  4. Reboot vCenter.
  5. Exit retreat mode on the cluster.
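To confirm that the copy in step 3 produced byte-identical files, checksums can be compared between the two vCenters, ideally before the reboot in step 4. This is a sketch under assumptions: it assumes GNU sha256sum is available on the appliance, the function names are hypothetical, and the manifest path passed to the verify step must be absolute because the function changes into the vCLS directory first.

```shell
# Hypothetical helpers (illustrative names).
# On the healthy vCenter: write SHA-256 checksums of the vCLS files to stdout.
make_vcls_manifest() {
  ( cd "${1:-/storage/lifecycle/vmware-hdcs}" && sha256sum -- * )
}

# On the repaired vCenter: check the copied files against that manifest.
# $1 = absolute path to the manifest; prints only failures, non-zero on mismatch.
verify_vcls_manifest() {
  ( cd "${2:-/storage/lifecycle/vmware-hdcs}" && sha256sum -c --quiet -- "$1" )
}

# Usage:
#   healthy vCenter:   make_vcls_manifest > /tmp/vcls.sha256
#   copy /tmp/vcls.sha256 to the repaired vCenter (for example with WinSCP), then:
#   repaired vCenter:  verify_vcls_manifest /tmp/vcls.sha256 && echo "files match"
```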

Alternatively, restore the vCenter from a backup which was taken before the corruption occurred.

If the same problem is seen with other OVAs, please contact the vendor.

Similar log entries can appear when a network issue prevents the files from being uploaded to the ESXi host over port 902. For help troubleshooting this, see Troubleshooting Network File Copy (NFC) issues during clone and xvMotion.
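As a quick first check for the port 902 case, TCP reachability from vCenter to the ESXi host can be tested. A sketch using bash's /dev/tcp redirection together with timeout(1); check_tcp_port is a hypothetical name. A successful connection only shows that the port is reachable, not that NFC transfers will succeed.

```shell
# Hypothetical helper (illustrative name): try a TCP connection to
# <host>:<port> (default 902) within <secs> seconds (default 5).
# bash's /dev/tcp/<host>/<port> redirection opens the connection;
# timeout(1) stops the attempt if packets are silently dropped.
check_tcp_port() {
  local host="$1" port="${2:-902}" secs="${3:-5}"
  # host and port are passed as $0/$1 to avoid quoting problems
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}

# Usage on vCenter, with the host found in eam.log:
#   check_tcp_port ESXi_HOST_FQDN 902
```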