Performance issues typically appear under a few common scenarios.
To get a better picture of what is occurring, connect to the appliance through the HCX Central CLI and run a performance test:
ccli
list
go <id>
perftest all
This test may take 5-20+ minutes depending on the size of the environment.
If speed issues are present, the results will likely look similar to the following:
================= SUMMARY OF RESULTS ===================
** Total Test Duration = 20.1 minutes **
(Each Test Duration = 30 sec)
(Each IPSEC Test Duration = 15 sec)
Throughput Report
|-------------------------------------------------------|
| Test Name | IF # | Fwd | Rev |
|-------------------------------------------------------|
| IPSEC Tunnel | 0 | 1.25 Gbits/sec | 1.57 Gbits/sec |
|-------------------------------------------------------|
| SITE | 0 | 1.20 Gbits/sec | 307 Mbits/sec |
|-------------------------------------------------------|
Notice that the SITE test reaches 1.20 Gbits/sec in the Fwd direction but only 307 Mbits/sec in the Rev direction. The test was run locally, so it is known that the sender can push 1.2 Gbits/sec; yet when that traffic is received from the other node, the measured rate is roughly a quarter to a third of what was sent. Some variance between directions is expected, but a drop of this size is abnormal and is a good indication of traffic loss, often caused by fragmentation.
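As a rough illustration only, the sketch below shows one way to flag this kind of directional asymmetry in throughput results. The values are taken from the summary above, and the 0.5 ratio threshold is an arbitrary assumption rather than an HCX-defined limit.

```python
# Rough sketch: flag perftest results whose reverse throughput is far below
# the forward direction. The 0.5 threshold is an assumption, not an HCX value.
def flag_asymmetry(results, threshold=0.5):
    suspect = []
    for name, (fwd_mbps, rev_mbps) in results.items():
        ratio = min(fwd_mbps, rev_mbps) / max(fwd_mbps, rev_mbps)
        if ratio < threshold:
            suspect.append((name, round(ratio, 2)))
    return suspect

# Values from the summary above (1.20 Gbits/sec = 1200 Mbits/sec, etc.).
perftest_results = {
    "IPSEC Tunnel": (1250, 1570),
    "SITE": (1200, 307),
}
print(flag_asymmetry(perftest_results))  # [('SITE', 0.26)] -> investigate
```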
The `perftest all` command also returns a PMTU test result. Alternatively, it's possible to run:
pmtu
to see the PMTU results.
Path MTU (PMTU) is the minimum of the per-hop MTUs along the underlay path. PMTU testing is a simple way to start troubleshooting this issue.
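Outside of the ccli, the path MTU toward a given endpoint can also be probed manually from a Linux host by sending non-fragmentable pings of decreasing size. The sketch below is a minimal illustration using the standard iputils ping options (-M do sets the Don't Fragment bit); the destination address and candidate MTU list are placeholders, and 28 bytes of IPv4/ICMP header overhead is assumed.

```python
import subprocess

# Minimal sketch: probe path MTU from a Linux host with non-fragmentable
# pings. Assumes IPv4 (20-byte IP header + 8-byte ICMP header = 28 bytes).
def probe_pmtu(dest, candidates=(9000, 8500, 1500, 1400)):
    for mtu in sorted(candidates, reverse=True):
        payload = mtu - 28
        cmd = ["ping", "-M", "do", "-c", "1", "-s", str(payload), dest]
        if subprocess.run(cmd, capture_output=True).returncode == 0:
            return mtu  # largest candidate that made it through unfragmented
    return None

print(probe_pmtu("192.0.2.10"))  # placeholder address
```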
When testing in an environment with a mismatched MTU configuration, the results might look like this:
++++++++++ StartTest ++++++++++
---------- Uplink Path MTU [cloud-ip >>> local-infra-ip] ---------- 1500
---------- Uplink Path MTU [cloud-ip <<< local-infra-ip] ---------- 8000
---------- Uplink Path MTU [cloud-ip >>> local-infra-ip] ---------- 1500
---------- Uplink Path MTU [cloud-ip <<< local-infra-ip] ---------- 8000
This output is important because it shows that traffic leaving the cloud side is using a 1500-byte MTU, while traffic in the return direction is using an 8000-byte MTU. A directional difference like this is a clear indicator of an MTU problem.
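As a small illustration, the sketch below parses output in the format shown above and compares the values seen in each direction; the line format is assumed from this sample and may differ between HCX versions.

```python
import re

# Sketch: compare the PMTU values reported in each direction. The line
# format is assumed from the sample output above.
sample = """\
---------- Uplink Path MTU [cloud-ip >>> local-infra-ip] ---------- 1500
---------- Uplink Path MTU [cloud-ip <<< local-infra-ip] ---------- 8000
---------- Uplink Path MTU [cloud-ip >>> local-infra-ip] ---------- 1500
---------- Uplink Path MTU [cloud-ip <<< local-infra-ip] ---------- 8000
"""

seen = {">>>": set(), "<<<": set()}
for direction, value in re.findall(r"(>>>|<<<)[^\]]*\]\s*-+\s*(\d+)", sample):
    seen[direction].add(int(value))

if seen[">>>"] != seen["<<<"]:
    print("Directional PMTU mismatch:", seen)  # points to an MTU problem
```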
Note: While the PMTU approach can help in discovering the effective MTU, there are a few cases where complete reliance on PMTU results can still cause problems.
HCX performance issues are often rooted in an MTU (Maximum Transmission Unit) mismatch somewhere in the network configuration chain. Such a mismatch can significantly reduce data transfer speeds and overall network performance.
The MTU must match between the VMware management and vMotion VMkernel interfaces (vmks) and the HCX appliances on both sides of the network.
MTU consistency is crucial between the HCX appliances and their interconnect links.
The MTU settings must align on the uplink connections from the HCX appliances to the WAN/Internet on both sides of the network.
Ensure consistent MTU across the entire path through the WAN or Internet, including any firewalls or network devices.
By ensuring MTU consistency across all these points - from VMware management and vMotion vmks, through HCX appliances, across uplinks, and through the WAN - optimal network performance can be maintained for HCX operations in hybrid and multi-cloud VMware environments.
Regular MTU audits and consistent configuration across all network segments are essential for preventing these performance issues and maintaining efficient HCX operations.
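To make such audits repeatable, one option is to query the configured MTU of every VMkernel interface programmatically. The sketch below is an illustrative example using the pyVmomi SDK (assumed to be installed); the vCenter hostname and credentials are placeholders.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Illustrative vmk MTU audit via pyVmomi. Hostname and credentials below
# are placeholders; certificate verification is disabled for lab use only.
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)

view = si.content.viewManager.CreateContainerView(
    si.content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    for vnic in host.config.network.vnic:  # each VMkernel interface
        print(f"{host.name}  {vnic.device}  MTU={vnic.spec.mtu}")

Disconnect(si)
```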
Please see the diagram in Additional Information for a visual representation.
To resolve MTU-related performance issues, a thorough analysis of the infrastructure is necessary to identify points where MTU is not configured as expected. Follow these steps to investigate and correct MTU mismatches:
Perform these checks for both management and vMotion traffic. The ideal configuration typically uses jumbo frames with an MTU of 8500-9000 for both traffic types.
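When verifying jumbo frames end to end (for example with vmkping -d -s on an ESXi host, which sets the Don't Fragment bit), remember that the test payload must account for header overhead. The short sketch below simply computes the IPv4 ICMP payload size for a target MTU, assuming a 20-byte IP header and an 8-byte ICMP header with no IP options.

```python
# Compute the ICMP echo payload that exactly fills a target MTU over IPv4
# (assumes 20-byte IP header + 8-byte ICMP header, no IP options).
def icmp_payload_for_mtu(mtu: int) -> int:
    return mtu - 20 - 8

for mtu in (1500, 8500, 9000):
    print(f"MTU {mtu}: send a {icmp_payload_for_mtu(mtu)}-byte payload")
# e.g. MTU 9000 -> 8972 bytes: vmkping -d -s 8972 <destination>
```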
After addressing issues on the local side, run perftest again. If problems persist, investigate the cloud-side network.
Continue this process until all MTU mismatches along the network path have been identified and corrected. Once they are resolved, the perftest and PMTU results should return expected values.
If performance issues persist after implementing these changes, consider opening a support case with Broadcom for further assistance.
Please see the diagrams in Additional Information for a visual representation of the VMK, vSphere, and HCX configurations, along with the following related documentation:
Verifying Underlay Network Performance for Service Mesh Uplinks
Network Underlay Characterization and HCX Performance Outcomes