HCX is not migrating or passing traffic over network extensions at expected speeds


Article ID: 379617

Products

VMware Cloud on AWS, VMware HCX, VMware vSphere ESXi, VMware NSX-T Data Center

Issue/Introduction

HCX Performance Troubleshooting Guide

Common Scenarios for Performance Issues

Performance issues typically appear under a few scenarios:

  • HCX is not migrating or passing traffic over network extensions at expected speeds
  • When large amounts of traffic are passed through the network extension, transport analytics show throughput spiking to expected levels but consistently dipping below acceptable thresholds

Testing Procedure in HCX

To get a better picture of what is occurring, follow these steps to perform testing in HCX:

  • SSH into the HCX Manager.
  • Log in to ccli using the command:

   ccli

  • Within ccli, list the deployed appliances with the command:

   list

  • From the list shown, find the ID of the appliance to test on.
  • To access the chosen appliance, type:

   go <id>

  • Once within the appliance that has speed concerns, run the performance test:

   perftest all


This test may take 5-20+ minutes depending on the size of the environment.

If speed issues are present, the results will likely look similar to the following:

   ================= SUMMARY OF RESULTS ===================
   ** Total Test Duration = 20.1 minutes **
       (Each Test Duration  = 30 sec)
       (Each IPSEC Test Duration  = 15 sec)

   Throughput Report
   |-------------------------------------------------------|
   | Test Name    | IF # | Fwd            | Rev            |
   |-------------------------------------------------------|
   | IPSEC Tunnel | 0    | 1.25 Gbits/sec | 1.57 Gbits/sec |
   |-------------------------------------------------------|
   | SITE         | 0    | 1.20 Gbits/sec | 307 Mbits/sec  |
   |-------------------------------------------------------|

Notice how the SITE test shows 1.20 Gbits/sec in the Fwd direction but only 307 Mbits/sec in the Rev direction. The test was run locally, so it is known that the appliance is capable of sending 1.2 Gbits/sec; however, when the traffic is received from the other node, the speed is roughly a quarter to a third of what was sent. While some variance is expected, this result is abnormal and is a good indication of traffic loss, often caused by fragmentation.

The `perftest all` command also returns a PMTU test result. Alternatively, it's possible to run:

   pmtu

to see the PMTU results.

Understanding PMTU Results

Path MTU (PMTU) is equal to the minimum of the MTUs of each hop in the underlay. PMTU testing is a simple way to start troubleshooting this issue. 
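
For example, in a hypothetical path such as the following (all values are illustrative), the PMTU is limited by the smallest hop, regardless of how large the MTU is everywhere else:

   Local vmk (9000) -> ToR switch (9000) -> Firewall (1500) -> WAN router (9000) -> Cloud uplink (9000)

   PMTU = min(9000, 9000, 1500, 9000, 9000) = 1500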

When testing in an environment with a mismatched MTU configuration, the results might look like this:

   ++++++++++ StartTest ++++++++++
   ---------- Uplink Path MTU [cloud-ip >>> local-infra-ip] ---------- 1500
   ---------- Uplink Path MTU [cloud-ip <<< local-infra-ip] ---------- 8000
   ---------- Uplink Path MTU [cloud-ip >>> local-infra-ip] ---------- 1500
   ---------- Uplink Path MTU [cloud-ip <<< local-infra-ip] ---------- 8000

This output is important because it shows that when traffic leaves the cloud side, it is at 1500 MTU, and on the return it is 8000 MTU. This is a clear indicator of an MTU problem.

Note: While the PMTU approach can help in discovering the path MTU, there are a few cases where relying entirely on PMTU can still cause problems.

  1. PMTU discovery can break with router or firewall misconfigurations, which result in silent failures that are difficult to debug. Ideally, the PMTUD behavior should be overridable by setting the MTU manually. In addition, hardened routers or firewall policies may prevent these devices from sending ICMP packets back to the source, or may drop ICMP packets received from downstream routers, causing PMTU discovery to fail.
  2. PMTU does not solve MTU mismatches within the L2 domain. For example, if the HCX appliance Uplink MTU and the VDS MTU are different, PMTU will not catch the difference and traffic can still blackhole.
  3. PMTU cannot catch end-to-end MTU mismatches. For example, if one appliance has an Uplink MTU of 1500 and the peer appliance has an Uplink MTU of 9000, and the entire path between them supports 9000, then an MSS-sized packet coming from the peer side (around 8960 bytes) would still be dropped, because PMTU discovery cannot determine that the appliance on the other side has an MTU of 1500.
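
As a quick sanity check of the packet size in the third case, assuming standard 20-byte IPv4 and TCP headers with no options, a full-size segment from the 9000-MTU side works out to roughly the 8960 bytes mentioned above:

   MSS = MTU - IP header - TCP header
       = 9000 - 20 - 20
       = 8960 bytes

A packet of that size cannot fit into a 1500-byte MTU on the other side, so it is dropped when it cannot be fragmented.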

Cause

MTU Mismatch in HCX Network Configuration

The root cause of HCX performance issues often stems from an MTU (Maximum Transmission Unit) mismatch somewhere in the network configuration chain. This mismatch can significantly impact data transfer speeds and overall network performance.

Key Points of MTU Matching

VMware Management and vMotion vmks

The MTU must match between the VMware management and vMotion vmks and the HCX appliances on both sides of the network.

HCX Interconnect

MTU consistency is crucial between the HCX appliances and their interconnect links.

Uplink Connections

The MTU settings must align on the uplink connections from the HCX appliances to the WAN/Internet on both sides of the network.

WAN/Internet Segment

Ensure consistent MTU across the entire path through the WAN or Internet, including any firewalls or network devices.

Impact of MTU Mismatch

An MTU mismatch at any point in this chain can lead to:

  • Packet fragmentation
  • Increased latency
  • Reduced throughput
  • Inconsistent performance during data migrations or network extension operations

By ensuring MTU consistency across all these points - from VMware management and vMotion vmks, through HCX appliances, across uplinks, and through the WAN - optimal network performance can be maintained for HCX operations in hybrid and multi-cloud VMware environments.

Regular MTU audits and consistent configuration across all network segments are essential for preventing these performance issues and maintaining efficient HCX operations.

Please see the following diagram in Additional Information for a visual representation:

HCX Network Configuration: MTU Matching Points Diagram

Resolution

Identifying and Resolving MTU Mismatches

To resolve MTU-related performance issues, a thorough analysis of the infrastructure is necessary to identify points where MTU is not configured as expected. Follow these steps to investigate and correct MTU mismatches:

Confirm MTU configuration of compute profile on each side  

  • Access the HCX Connect Profile settings
  • Review the MTU setting for each network profile (e.g., MGMT-Profile-vDS-COMP)

Confirm MTU configuration of network profile on each side

  • Check HCX network profiles for both source and destination environments
  • Ensure MTU settings are consistent and appropriate for your network

Confirm MTU configuration of the vmk and vSwitch on each side

  • In vSphere, navigate to Networking > VMkernel adapters
  • Check the MTU setting for management (vmk0) and vMotion interfaces
  • Verify vSwitch MTU settings match the VMkernel adapter settings
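
The same values can also be checked from an ESXi host shell. A minimal sketch, assuming SSH access to the host (adapter and switch names vary by environment):

   esxcli network ip interface list          # MTU of each VMkernel adapter (vmk0, vmk1, ...)
   esxcli network vswitch standard list      # MTU of standard vSwitches
   esxcli network vswitch dvs vmware list    # MTU of distributed switches

Any adapter or switch reporting a smaller MTU than the HCX network profiles expect is a candidate for the mismatch.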

Confirm MTU configuration of NSX if it is used on the cloud side

  • Access NSX management interface
  • Review and adjust MTU settings for NSX components

Perform these checks for both management and vMotion traffic. The ideal configuration typically uses jumbo frames with an MTU of 8500-9000 for both traffic types.

After confirming HCX is configured for the correct MTU on both cloud and local sides, verify that the VMkernel adapters and vSwitches are set as expected within vCenter:

  1. Navigate to the host configuration in vCenter
  2. Check VMkernel adapter settings (vmk0, vmk1, etc.)
  3. Verify vSwitch MTU settings
  4. Ensure consistency across all network components

To validate MTU settings on the local side

  1. Access a leaf or spine switch
  2. Test along the path from the top of rack router to the Network Extension (NE)
  3. Use ping commands with varying MTU sizes to test the expected frame sizes (see the example after this list)
  4. If issues occur, reduce the MTU size until the largest MTU that passes along the path is found
  5. Confirm switches are configured for jumbo frames as expected
  6. Verify end-to-end functionality on the local side
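
A minimal sketch of such a ping test from an ESXi host, assuming jumbo frames (MTU 9000) are expected on the vMotion path; the interface name and remote address below are placeholders:

   # 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -d sets the do-not-fragment bit
   vmkping -I vmk1 -d -s 8972 <remote-vmk-ip>

   # For a standard 1500-byte MTU path, test with a 1472-byte payload instead
   vmkping -I vmk1 -d -s 1472 <remote-vmk-ip>

If the jumbo-sized ping fails while the 1472-byte ping succeeds, a hop along the path is not passing jumbo frames and should be corrected, or the MTU lowered consistently end to end.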

After addressing local side issues, run a perftest again. If problems persist, investigate the cloud network.

For VMware Cloud on AWS environments

  1. Navigate to the Networking section
  2. Access the global configuration page
  3. Set the intranet MTU to match the local environment
  4. Refer to VMC on AWS documentation for detailed steps on configuring MTU

Continue this process until all MTU mismatches along the network path have been identified and corrected. After resolving these issues, the perftest and PMTU results should return to expected values.

If performance issues persist after implementing these changes, consider opening a support case with Broadcom for further assistance.

Please see the following diagrams in Additional Information for a visual representation of the VMkernel, vSphere, and HCX configurations:

  • HCX Connect Profile and Network Configuration with MTU Settings
  • VMkernel Adapter MTU Configuration in vSphere and HCX Network Diagram
  • vSphere Distributed Switch Properties

Additional Information

HCX Connect Profile and Network Configuration with MTU Settings

VMkernel Adapter MTU Configuration in vSphere and HCX Network Diagram

vSphere Distributed Switch Properties