ESXi 8.0U3 & Later -- Storage vMotion stays at 33% and then times out with errors -- "Connection closed by remote host, possibly due to timeout." / Migration failed after VM memory precopy
search cancel

ESXi 8.0U3 & Later -- Storage vMotion stays at 33% and then times out with errors -- "Connection closed by remote host, possibly due to timeout." / Migration failed after VM memory precopy

book

Article ID: 395119

calendar_today

Updated On:

Products

VMware vSphere ESXi VMware vSphere ESXi 8.0 VMware vSphere ESX 8.x

Issue/Introduction

Storage vMotion fails with one or both of the following errors: 

  • Failed to read stream keepalive: Connection closed by remote host, possibly due to timeout. Failed to establish transport connection.
  • Migration failed after VM memory precopy. Please check vmkernel log for true error. Timed out waiting for migration data.

 

Failed vMotion is having as Target Datastore one of the following:

  • NVME Over TCP Pure Storage FlashArray
  • NVMe Over FC Pure FlashArray

 

Symptoms - One or more do apply: 

  • Storage vMotion not failed on other Datastore with the same backend storage.
  • Storage vMotion succeeded on the same Datastore on previous installed Build ESXi 8.0U2
  • In the vmware.log of the affected VM, you observe similar messages as the following:

YYYY-MM-DDTHH:MM:SS.XXXZ In(05) worker-3046319 - Disk/File copy started for /vmfs/volumes/<uuid-of-datastore>/<VM name>/<VM name>.vmdk.

YYYY-MM-DDTHH:MM:SS.XXXZ Wa(03) vmx - SVMotion: scsi0:0: Disk transfer rate slow: 0 kB/s over the last 10.00 seconds, copied total 0 MB at 0 kB/s.

YYYY-MM-DDTHH:MM:SS.XXXZ In(05) worker-3046545 - Migrate: Remote Log: Destination waited for 130.23 seconds.

YYYY-MM-DDTHH:MM:SS.XXXZ In(05) vmx - [msg.migrate.fail.source.afterPrecopy] Migration failed after VM memory precopy. Please check vmkernel log for true error.

 
  • In vmkernel.log, you can observe similar message as the following:

    YYYY-MM-DDTHH:MM:SS.XXXZ Wa(180) vmkwarning: cpu39:2104613)WARNING: NVMEIO:2644 command 0x45ba31874cc0 failed: ctlr 262, queue 6, psaCmd 0x45ba38ae5840,status 0x6, opc 0x19, cid 45, nsid 36

opc 0x19:     XCOPY command
status 0x6:   Storage controller returned to XCOPY command, which means GC internal error and is retriable.
 


 

Environment

ESXi 8.0 U3 & later

Cause

The issue is related to Storage Level Replication on PURE Storage. 
 
According to official PURE Storage documentation, XCOPY does not work together with ActiveDR (= Storage Level Replication)
The Storage controller (= vmhba) will return an invalid opcode if the Backend Storage (= PURE Storage) does not support XCOPY.
This will inform ESXi to retry I/O without XCOPY.

 

Resolution

Check with PURE Storage or disable XCOPY as a workaround.

Additional Information

 
 
 
"XCOPY is one of the VAAI primitives that is used for offloading tasks to the storage array.
For example, you can use XCOPY to offload such operations as migration or cloning of virtual machines to the array
instead of consuming vSphere resources to perform these tasks."
Reference Broadcom Docs
 
Disabling XCOPY will result in losing these benefits.