HCX vMotion fails msg.svmotion.disk.copyphase.failed when IX interface is on NSX overlay

Article ID: 440025

Products

VMware HCX

Issue/Introduction

HCX Bulk or RAV migration tasks fail during the vMotion phase. The migration halts, and HCX reports that the source-side relocate operation failed.

The HCX Manager UI displays the following errors:

vMotion failed. System Error. Source side error is : Source side relocate failed for the virtual machine. A fatal internal error occurred. msg.svmotion.disk.copyphase.failed: Failed to copy one or more disks. vob.vmotion.stream.check.block.mem.timed.out: VMotionStream timed out while waiting for disk's queue count to drop below the maximum limit.

The ESXi vmkernel.log reports zero or low vMotion bandwidth:

XVMotion: 3064: timed out while waiting for disk 0's queue count to drop below the maximum limit of 32768 blocks. VMotion bandwidth in last 1s: 0 bytes/s
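
To check collected logs for this failure signature, a short Python sketch can scan vmkernel.log lines for the XVMotion queue timeout and test an HCX error message for both error keys quoted above. The regular expression is built from the single sample line in this article, so its exact wording is an assumption and may need adjusting for other ESXi builds. The names `find_queue_timeouts`, `QueueTimeout`, and `matches_hcx_signature` are hypothetical helpers, not part of any VMware tool.

```python
import re
from typing import Iterable, NamedTuple

# Built from the sample vmkernel.log line in this article (assumed format).
# re.search() is used, so any timestamp/world prefix before "XVMotion" is ignored.
XVMOTION_TIMEOUT = re.compile(
    r"XVMotion: \d+: timed out while waiting for disk (?P<disk>\d+)'s queue count "
    r"to drop below the maximum limit of (?P<limit>\d+) blocks\. "
    r"VMotion bandwidth in last 1s: (?P<bw>\d+) bytes/s"
)

# Error keys that appear in the HCX Manager UI message for this issue.
HCX_KEYS = (
    "msg.svmotion.disk.copyphase.failed",
    "vob.vmotion.stream.check.block.mem.timed.out",
)


class QueueTimeout(NamedTuple):
    disk: int           # disk index from "disk N's queue count"
    limit_blocks: int   # maximum queue limit in blocks
    bandwidth_bps: int  # vMotion bandwidth reported for the last 1s


def find_queue_timeouts(lines: Iterable[str]) -> list[QueueTimeout]:
    """Return every XVMotion disk-queue timeout found in the given log lines."""
    hits = []
    for line in lines:
        m = XVMOTION_TIMEOUT.search(line)
        if m:
            hits.append(QueueTimeout(int(m["disk"]), int(m["limit"]), int(m["bw"])))
    return hits


def matches_hcx_signature(message: str) -> bool:
    """True if an HCX error message contains both error keys quoted in this article."""
    return all(key in message for key in HCX_KEYS)
```

For example, passing an open copy of vmkernel.log to `find_queue_timeouts` lists every timeout with its disk index, block limit, and reported bandwidth; repeated entries at 0 bytes/s match the symptom described here.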

Environment

VMware HCX
VMware NSX

Cause

The failure is caused by asymmetric routing combined with a stateful firewall on NSX Tier-0 (T0) Gateways operating in Active/Active High Availability (HA) mode. In an Active/Active topology, the outbound and return packets of the same flow may traverse different T0 Edge nodes. A stateful firewall tracks connection state on the node that sees a flow and expects both directions of that flow to pass through the same node, so it drops return packets that arrive at a node holding no state for the connection. The dropped packets stall the HCX vMotion stream until it times out.
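
The failure mechanism can be illustrated with a toy Python model in which each T0 Edge node keeps its own, unshared connection table. This is a simplification for illustration, not the NSX datapath: the node names, the addresses and ports, and the rule that the first outbound packet creates state are all assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical 5-tuple: (src_ip, src_port, dst_ip, dst_port, protocol).
FlowKey = tuple[str, int, str, int, str]


def reverse(flow: FlowKey) -> FlowKey:
    """Return the 5-tuple of the reply direction."""
    src, sport, dst, dport, proto = flow
    return (dst, dport, src, sport, proto)


@dataclass
class EdgeNode:
    """Simplified T0 Edge node; state is local to this node only."""
    name: str
    stateful: bool = True
    conn_table: set = field(default_factory=set)
    fw_drops: int = 0

    def forward_outbound(self, flow: FlowKey) -> None:
        # The outbound packet creates state only on the node it traverses.
        if self.stateful:
            self.conn_table.add(flow)

    def forward_return(self, reply: FlowKey) -> bool:
        # A stateful firewall admits a reply only if THIS node holds matching state.
        if not self.stateful or reverse(reply) in self.conn_table:
            return True
        self.fw_drops += 1
        return False


def round_trip(flow: FlowKey, egress: EdgeNode, ingress: EdgeNode) -> bool:
    """Send a flow out through `egress` and bring its reply back through `ingress`."""
    egress.forward_outbound(flow)
    return ingress.forward_return(reverse(flow))


if __name__ == "__main__":
    flow = ("192.0.2.10", 49152, "198.51.100.20", 5000, "tcp")  # hypothetical values

    a, b = EdgeNode("edge-a"), EdgeNode("edge-b")
    print(round_trip(flow, a, a))  # symmetric path: reply accepted
    print(round_trip(flow, a, b))  # asymmetric path: reply dropped on edge-b
    print(b.fw_drops)

    # Same asymmetric path with the stateful firewall disabled on both nodes.
    c, d = EdgeNode("edge-c", stateful=False), EdgeNode("edge-d", stateful=False)
    print(round_trip(flow, c, d))
```

Running the model prints True, False, 1, True: the reply is accepted on the symmetric path, dropped and counted on the asymmetric path, and accepted again once the nodes no longer enforce connection state, which mirrors the workaround in the Resolution section.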

Resolution

  1. To confirm that the firewall is dropping packets before applying the fix, run the following command on the NSX Edge CLI:

    get logical-router interface <IX_Interface_UUID> stats

    If the Firewall counter under RX-Drops increments while a migration is active, the stateful firewall is dropping the asymmetric return packets. For details on these counters, see Interpreting NSX Edge Interface stats.

  2. After confirming the Active/Active topology and the firewall drops, disable the stateful firewall on the NSX Tier-0 Gateway as a workaround to accommodate the Active/Active routing topology. For the supported topologies for T0 Active/Active HA with stateful firewall rules, see Intermittent packet drops with Active/Active T0 and Stateful Firewall Rules.
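
Whether the Firewall RX-Drops counter from step 1 is incrementing can be decided by sampling it several times during an active migration and comparing successive values. The Python sketch below assumes the values have already been read by hand from repeated runs of the `get logical-router interface <IX_Interface_UUID> stats` command; it does not parse the CLI output, and the helper names and sample numbers are hypothetical.

```python
from typing import Sequence


def drop_deltas(samples: Sequence[int]) -> list[int]:
    """Per-interval change in the Firewall RX-Drops counter.

    A negative change is assumed to be a counter reset and is reported as 0.
    """
    return [max(later - earlier, 0) for earlier, later in zip(samples, samples[1:])]


def firewall_drops_incrementing(samples: Sequence[int]) -> bool:
    """True if the counter rose in at least one interval between samples."""
    if len(samples) < 2:
        raise ValueError("take at least two samples while the migration is running")
    return any(delta > 0 for delta in drop_deltas(samples))
```

For example, `firewall_drops_incrementing([1200, 1450, 1710])` returns True. Clamping negative deltas to 0 keeps a reset from masking later increases; if the counter could also wrap, that case would need explicit handling.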