When attempting to perform a bulk migration through VMware HCX, the operation fails with an "i/o timeout" error even though standard vMotion migrations (not through HCX) work correctly. The following symptoms may be observed.
To validate this issue
This issue occurs when the VMkernel adapter used for vMotion is configured with the Default TCP/IP stack instead of the vMotion TCP/IP stack. HCX requires VMkernel adapters designated for vMotion to use the dedicated vMotion TCP/IP stack for proper communication between source and target environments.
The default TCP/IP stack does not provide the necessary isolation and routing configuration for HCX operations, leading to timeouts and connection failures even though regular vMotion operations may work correctly.
To resolve this issue, configure a VMkernel adapter to use the vMotion TCP/IP stack. For detailed instructions, see Place vMotion Traffic on the vMotion TCP/IP Stack of a Host.
The basic steps are:
If the error persists after following these steps, contact Broadcom Support for further assistance.
Provide the following information when opening a support request with Broadcom for this issue:
The vMotion TCP/IP stack is designed specifically for migration traffic and provides several important advantages:
The vMotion TCP/IP stack isolates migration traffic from other network traffic, which helps ensure consistent performance during migrations. This isolation allows the traffic to use a dedicated default gateway and routing table, which is especially important in complex network environments where routing conflicts might otherwise occur.
Additionally, the vMotion TCP/IP stack assigns a separate set of buffers and sockets to migration traffic. This dedicated resource allocation prevents migration operations from competing with other services for network resources, which can improve overall migration reliability and performance.
In addition to using the vMotion TCP/IP stack, place vMotion traffic on a separate VLAN. A dedicated VLAN isolates vMotion traffic at the network level, which improves both security and performance: migration traffic does not compete with other traffic types, and the risk of network congestion during migration operations is reduced.
When troubleshooting HCX Bulk Migration issues, consider these key areas:
The Data-Plane Diagnostics tool is essential for identifying connectivity problems between HCX components. Run this tool to verify TCP port 902 connectivity between source and target environments, as this port is critical for successful migrations.
Always verify proper VMkernel adapter configuration on both source and target hosts. All hosts involved in migrations should have VMkernel adapters configured with the vMotion TCP/IP stack rather than the Default stack.
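As a rough aid for this check, the sketch below parses text in the style of `esxcli network ip interface list` output and reports which TCP/IP stack each VMkernel adapter uses. It assumes that each adapter block contains `Name:` and `Netstack Instance:` fields and that the stacks are reported as `defaultTcpipStack` and `vmotion`. The sample text is illustrative, not captured from a real host, so confirm the field names against your own ESXi output before relying on it.

```python
# Illustrative sample only (assumption): trimmed text in the style of
# `esxcli network ip interface list`. Real output contains more fields.
SAMPLE_OUTPUT = """\
vmk0
   Name: vmk0
   MAC Address: 00:50:56:00:00:01
   Enabled: true
   Portgroup: Management Network
   Netstack Instance: defaultTcpipStack
vmk1
   Name: vmk1
   MAC Address: 00:50:56:00:00:02
   Enabled: true
   Portgroup: vMotion
   Netstack Instance: vmotion
"""


def netstack_by_interface(esxcli_output: str) -> dict:
    """Map each VMkernel adapter name to its reported netstack instance."""
    stacks = {}
    current = None
    for line in esxcli_output.splitlines():
        # Split on the first colon only, so values such as MAC addresses
        # that contain colons stay intact.
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # block header line such as "vmk0"
        key, value = key.strip(), value.strip()
        if key == "Name":
            current = value
        elif key == "Netstack Instance" and current is not None:
            stacks[current] = value
    return stacks
```

For example, `netstack_by_interface(SAMPLE_OUTPUT).get("vmk1")` gives the stack reported for the hypothetical adapter `vmk1`. An adapter intended for vMotion that reports `defaultTcpipStack` matches the misconfiguration described in this article.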
Ensure that firewall rules throughout your network allow TCP port 902 for communication between HCX components and ESXi hosts. Network connectivity issues are a common cause of migration failures.
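For a quick reachability test of TCP port 902 from an arbitrary machine, a minimal sketch using only the Python standard library might look like the following. It is not a substitute for the HCX Data-Plane Diagnostics tool: it only tests a TCP connection from the machine where the script runs, which may follow a different network path than the HCX appliances. The function name and default timeout are assumptions made for illustration.

```python
import socket


def tcp_port_open(host: str, port: int = 902, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling `tcp_port_open("esxi01.example.com")`, where the host name is a placeholder, returns `False` when the connection is refused, times out, or the name cannot be resolved. A `False` result from a machine on the same network segment as the HCX appliance is a hint to review firewall rules, not proof of the root cause.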