HCX Bulk Migration Fails Due to VMkernel Using Default TCP/IP Stack

Article ID: 396008


Products

VMware HCX

Issue/Introduction

When attempting to perform a bulk migration through VMware HCX, the operation fails with an "i/o timeout" error, even though standard vMotion migrations (performed outside of HCX) complete successfully. The following symptoms may be observed:

  • HCX Service Mesh Diagnostics shows TCP 902 connection failures
  • Error in migration logs: "vmkernel stack not configured" when executing vmkping commands
  • Regular vMotion operations (not using HCX) work successfully

To validate this issue, check the following (example ESXi shell commands are shown after this list):

  • Check the VMkernel adapter TCP/IP stack configuration in the ESXi host settings
  • Verify TCP 902 connectivity in the HCX Service Mesh Data-Plane Diagnostics
  • Confirm that non-HCX vMotion migrations work without errors
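
As a rough sketch, the same checks can be run from the ESXi shell. The interface name (vmk1) and the target IP address (192.0.2.50) below are placeholders and should be replaced with values from your environment:

  # List VMkernel adapters; the "Netstack Instance" field for the vMotion
  # adapter should read "vmotion", not "defaultTcpipStack"
  esxcli network ip interface list

  # Ping the remote vMotion address over the vMotion stack; if the adapter
  # is still on the default stack, this fails with "vmkernel stack not configured"
  vmkping -I vmk1 -S vmotion 192.0.2.50

  # Test TCP 902 reachability to the remote host
  nc -z 192.0.2.50 902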

Environment

  • VMware HCX
  • VMware vSphere ESXi
  • VMware vCenter Server

Cause

This issue occurs when the VMkernel adapter used for vMotion is configured with the Default TCP/IP stack instead of the vMotion TCP/IP stack. HCX requires VMkernel adapters designated for vMotion to use the dedicated vMotion TCP/IP stack for proper communication between source and target environments.

The default TCP/IP stack does not provide the necessary isolation and routing configuration for HCX operations, leading to timeouts and connection failures even though regular vMotion operations may work correctly.

Resolution

To resolve this issue, configure a VMkernel adapter to use the vMotion TCP/IP stack. For detailed instructions, see Place vMotion Traffic on the vMotion TCP/IP Stack of a Host.

The basic steps are:

Remove or Reconfigure Existing VMkernel Adapter

  1. In the vSphere Client, navigate to the host
  2. Click the Configure tab
  3. Select Networking, and click VMkernel adapters
  4. Identify the VMkernel adapter that has vMotion enabled but is using the Default TCP/IP stack
  5. Either remove this VMkernel adapter or disable the vMotion service on it (an example command-line approach is shown after this list)
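
If you prefer the command line, the same cleanup can be sketched with esxcli. The interface name (vmk1) is a placeholder for the adapter identified in step 4, and the exact tag name may vary between ESXi versions:

  # Option A: remove only the vMotion tag from the existing adapter
  esxcli network ip interface tag remove -i vmk1 -t VMotion

  # Option B: remove the VMkernel adapter entirely
  esxcli network ip interface remove --interface-name=vmk1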

Create New VMkernel Adapter with vMotion TCP/IP Stack

  1. Click Add networking to create a new VMkernel adapter
  2. Select VMkernel Network Adapter as the connection type and click Next
  3. Select the appropriate standard or distributed switch and click Next
  4. On the Port properties page, select vMotion from the TCP/IP stack drop-down menu
  5. Configure the network label, VLAN ID, and IP settings as required
  6. Complete the wizard to create the new VMkernel adapter (a command-line equivalent is shown after this list)
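
For reference, a minimal command-line equivalent is sketched below. The port group name (vMotion-PG), interface name (vmk2), and IP settings are placeholders; on some hosts the vmotion netstack may first need to be created:

  # Confirm the vmotion TCP/IP stack exists; add it if it is not listed
  esxcli network ip netstack list
  esxcli network ip netstack add --netstack=vmotion

  # Create a new VMkernel adapter on the vmotion stack
  esxcli network ip interface add --interface-name=vmk2 --portgroup-name="vMotion-PG" --netstack=vmotion

  # Assign a static IPv4 address to the new adapter
  esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=192.0.2.10 --netmask=255.255.255.0 --type=static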

Verify Configuration and Test Migration

  1. Run Data-Plane Diagnostics again to verify that TCP 902 connectivity is successful (the checks can also be run from the ESXi shell, as shown after this list)
  2. Attempt the HCX migration again
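
The result can also be spot-checked from the ESXi shell; the interface name (vmk2) and remote IP address (192.0.2.50) are placeholders:

  # The vmkping that previously failed should now succeed
  vmkping -I vmk2 -S vmotion 192.0.2.50

  # Confirm TCP 902 reachability and look for connections on that port
  nc -z 192.0.2.50 902
  esxcli network ip connection list | grep ':902'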

If the error persists after following these steps, contact Broadcom Support for further assistance.

Please provide the following information when opening a support request with Broadcom for this issue:

  • Migration details (VM Name, Migration id, Screenshot of error as seen via UI)
  • Source and Target HCX log bundles with the IX appliance in question selected (see Gather Technical Support Logs)
  • vCenter Server and ESXi host logs

Additional Information

Benefits of the vMotion TCP/IP Stack

The vMotion TCP/IP stack is designed specifically for migration traffic and provides several important advantages:

The vMotion TCP/IP stack isolates migration traffic from other network traffic, which helps ensure consistent performance during migrations. This isolation allows the traffic to use a dedicated default gateway and routing table, which is especially important in complex network environments where routing conflicts might otherwise occur.

Additionally, the vMotion TCP/IP stack assigns a separate set of buffers and sockets to migration traffic. This dedicated resource allocation prevents migration operations from competing with other services for network resources, which can improve overall migration reliability and performance.
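
Because the vMotion stack keeps its own routing table and default gateway, these can be inspected and set independently of the default stack. A minimal sketch, assuming a routed vMotion network with a gateway of 192.0.2.1:

  # Show the routing table that belongs only to the vmotion stack
  esxcli network ip route ipv4 list -N vmotion

  # Add a default gateway for the vmotion stack (only needed for routed vMotion networks)
  esxcli network ip route ipv4 add --gateway=192.0.2.1 --network=default -N vmotion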

Network Configuration Recommendations

It is highly recommended to configure vMotion traffic on a separate VLAN in addition to using the vMotion TCP/IP stack. Using a dedicated VLAN for vMotion traffic provides additional isolation at the network level, which enhances both security and performance. This separation ensures that vMotion traffic doesn't compete with other traffic types and helps prevent potential network congestion during migration operations.
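
On a standard switch, for example, the VLAN ID can be applied to the vMotion port group as follows; the port group name (vMotion-PG) and VLAN ID (120) are placeholders:

  # Tag the vMotion port group with a dedicated VLAN
  esxcli network vswitch standard portgroup set --portgroup-name="vMotion-PG" --vlan-id=120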

Troubleshooting Guidelines

When troubleshooting HCX Bulk Migration issues, consider these key areas:

The Data-Plane Diagnostics tool is essential for identifying connectivity problems between HCX components. Run this tool to verify TCP port 902 connectivity between source and target environments, as this port is critical for successful migrations.

Always verify proper VMkernel adapter configuration on both source and target hosts. All hosts involved in migrations should have VMkernel adapters configured with the vMotion TCP/IP stack rather than the Default stack.

Ensure that firewall rules throughout your network allow port 902 for proper communication between HCX components and ESXi hosts. Network connectivity issues are a common cause of migration failures.
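
On the ESXi side, a quick sketch for checking which host firewall rules involve port 902:

  # List firewall rules and filter for port 902
  esxcli network firewall ruleset rule list | grep -w 902

  # Show which rulesets are currently enabled
  esxcli network firewall ruleset list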