[HCX] Bulk migration getting stuck at 0% base sync when retrying after failed migration


Article ID: 305316


Products

VMware Cloud on AWS, VMware Cloud on Dell EMC, VMware HCX

Issue/Introduction

A Bulk Migration retried after a failed attempt gets stuck at 0% base sync. The /common/logs/admin/app.log file shows NFC_FILE_LOCKED errors.

The NFC error message as it appears in app.log on the cloud side:

20##-##-## ##:##:##.642 UTC [ReplicationTransferService_SvcThread-58706, Ent: DEFAULT, , TxId: TxId: a9ba3104-####-####-####-####a1f3####] WARN c.v.v.h.a.hbr.HbrServerInstance- Error returned by replicaGroup{"@xsi:type":"LocalizedMethodFault","fault" :{"@xsi:type":"HbrReplicaFaultStorageLocked","datastoreUUID":"vsan:########31c494bce-####bfdbcd####","pathname":"VM_1661452238089\/VM.vmdk"},"localizedMessage":"Error for (datastoreUUID: \"vsan:#####f1931c4####-####bfdbcd####\"), (diskId: \"RDID-2817####-e139-4a9a-####-a051cf0#####\"), (hostId: \"host-####\"), (pathname: \"VM_1661452#####\/VM.vmdk\"), (flags: on-disk-open, nfc-error): Class: NFC Code: 13; NFC error: NFC_FILE_LOCKED; Code set to: Storage is locked.; Set error flag: nfc-error; Can't open remote disk \/vmfs\/volumes\/vsan:#####f1931c4####-####bfdbcd####\/VM_16614522####\/VM.vmdk; Set error flag: on-disk-open; Failed to open replica (\/vmfs\/volumes\/vsan:#####f1931c4####-####bfdbcd####\/VM_1661452#####\/VM.vmdk); Failed to open activeDisk (GroupID=VRID-#####a99-e056-####-####-######7037ed) (DiskID=RDID-######2b-####-####-####-######0556d1); Can't create replica state (GroupID=VRID-########-####-####-####-######7037ed) (DiskID=RDID-######2b-####-####-####-######0556d1); Cannot activate group. Loading disks from database (GroupID=VRID-######99-####-####-####-######7037ed) ; Connecting to group VRID-######99-####-####-####-######7037ed"}
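To find which replica disks are affected, the log can be filtered for NFC_FILE_LOCKED entries. The following is a minimal sketch in Python, not an HCX tool: the function name find_locked_replicas and both regular expressions are illustrative assumptions based only on the sample entry above, and may need adjusting if the log format differs in your environment.

```python
import re

# Assumption: patterns derived from the sample app.log entry in this article.
# The log stores quotes and slashes in escaped form (\" and \/).
REPLICA_RE = re.compile(r"Failed to open replica \(([^)]+)\)")
HOST_RE = re.compile(r'hostId: \\"([^"\\]+)\\"')


def find_locked_replicas(lines):
    """Return a (host_id, replica_path) tuple for each NFC_FILE_LOCKED line.

    Either element is None when the corresponding field is not found.
    """
    hits = []
    for line in lines:
        if "NFC_FILE_LOCKED" not in line:
            continue
        host = HOST_RE.search(line)
        path = REPLICA_RE.search(line)
        hits.append((
            host.group(1) if host else None,
            # Turn the escaped "\/" separators back into plain "/".
            path.group(1).replace("\\/", "/") if path else None,
        ))
    return hits
```

For example, opening /common/logs/admin/app.log and passing the file object to find_locked_replicas yields the host ID and replica path reported in each lock error, which indicates which VM folders on the target datastore to check during cleanup.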

Environment

VMware HCX

Cause

A previous failed bulk migration left behind stale entries. The retried migration picks up the same VM folder, and the existing files in it are still locked.

Resolution

The NFC_FILE_LOCKED error indicates stale entries left by previous failed migration attempts. The new migration picks up the same VM folder, and the job reports a lock error on the existing files. The recommendation is to cancel the migrations and perform a cleanup.

Ensure that no folders from the migrations remain on the target site. If a target folder cannot be deleted because of a lock error, identify the host reporting the lock and either restart the hostd service on that host or reboot the host, which may require a maintenance window.

Note: In some cases, stale migration entries may also need to be cleaned up from the HCX database. Open a support request for assistance with this; see "Creating and managing Broadcom support cases".
