Understanding MTU and MSS Calculations Across Complex, Layered Infrastructures

Article ID: 383872


Products

VMware vSphere ESXi, VMware HCX, VMware NSX

Issue/Introduction

When configuring networks for optimal performance—especially in environments with multiple overlays, tunnels, and encryption such as those involving HCX, NSX, ESXi hosts, top-of-rack (ToR) routers, leaf-spine switches, and complex underlay networks—the initial MTU setting at the source (initiator) is only the beginning. Each subsequent hop or layer adds overhead in the form of tunnel headers, encryption wrappers (for example IPsec in HCX), VLAN tags, or other encapsulations. To avoid fragmentation, every hop must be set with an MTU that accommodates not only the original initiator’s payload but also the cumulative overhead from all intermediate layers.

Cause

  • Under-Accounted Overheads: A typical starting point is a 1500-byte MTU at the virtual machine (VM) or host initiator. However, when traffic is encapsulated by NSX overlays, encrypted by IPsec (as often used in HCX), or passed through VLAN tags and other tunnels, the effective packet size grows beyond 1500 bytes.

  • Multiple Initiators and Layers: In complex infrastructures, traffic might first be framed by a VM’s network stack (initiator #1), then encapsulated by an HCX appliance (initiator #2), and further processed by NSX gateways (initiator #3). Each new overlay or encryption point effectively acts as another initiator adding overhead.

  • Inconsistent MTU Configuration: Without scaling MTU sizes upward at each subsequent hop, fragmentation and reduced performance are likely.

Resolution

Base Calculation at the Initiator (Often a VM)

Start with the simplest form. The initiator’s payload can be viewed as:

Payload = Initiator_MTU - (IP_Header + TCP_Header)
 
For a typical IPv4/TCP scenario:
  • IP header: ~20 bytes
  • TCP header: ~20 bytes

If the initiator sets an MTU of 1500 bytes:

Payload = 1500 - (20 + 20) = 1460 bytes of data (MSS)
 
This 1460-byte MSS fits exactly into a 1500-byte MTU once IP and TCP headers are included, before adding any tunnel or encryption overhead.
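
The same arithmetic expressed as a short script (a minimal sketch; the 20-byte header sizes assume IPv4 and TCP without options):

IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options

def mss_for_mtu(mtu: int) -> int:
    """Return the TCP payload (MSS) that fits within a given MTU."""
    return mtu - (IP_HEADER + TCP_HEADER)

print(mss_for_mtu(1500))  # 1460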
 

Adding Tunnel and Encryption Overheads at the Next Hop

Consider traffic flowing into an HCX appliance creating an IPsec-encrypted tunnel. IPsec might add approximately 73+ bytes, and a Geneve or VXLAN header might add ~50 bytes:

  • Encapsulation Overhead (VXLAN/Geneve): ~50 bytes
  • IPsec Overhead: ~73 bytes (varies with mode and encryption/auth settings)

Total new overhead: ~123 bytes

If the initial MSS and headers exactly filled 1500 bytes, adding another ~123 bytes pushes the total to ~1623 bytes. To avoid fragmentation, an MTU of around 1700 bytes at the next hop provides 200 bytes of headroom over the original 1500, comfortably covering these overheads.
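
Continuing the sketch, the required downstream MTU can be computed from the approximate overhead values above (these are estimates, not fixed constants; actual IPsec overhead varies with mode and cipher):

ENCAP_OVERHEAD = 50  # bytes, approximate VXLAN/Geneve header
IPSEC_OVERHEAD = 73  # bytes, approximate; varies with mode and auth settings

initiator_mtu = 1500
required_mtu = initiator_mtu + ENCAP_OVERHEAD + IPSEC_OVERHEAD
print(required_mtu)          # 1623
print(required_mtu <= 1700)  # True: a 1700-byte MTU leaves headroom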

Chaining Multiple Initiators/Overlays (HCX into NSX)

Consider a scenario where, after HCX encapsulation, the packet enters an NSX overlay that adds another ~50 bytes:

  • Base from initiator: ~1500 bytes
  • HCX overhead: ~123 bytes
  • NSX overhead: ~50 bytes
    Total: ~1673 bytes

An MTU of around 1700 bytes at subsequent layers accommodates these increments. Any further overhead (for example, additional VLAN tags, MPLS encapsulation, or other specialized overlays) must be added to the calculation. Where double encryption is occurring and is not required, disabling it reduces overhead. When layers are added or changed frequently, reviewing each one confirms whether the existing MTU values remain sufficient.
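
This generalizes to any chain of overlays; a sketch, using the approximate values from this article (substitute measured figures for a specific environment):

# Overheads for each layer the packet traverses, in bytes.
overheads = {
    "HCX (IPsec + encapsulation)": 123,
    "NSX (Geneve)": 50,
    # "VLAN tag": 4,  # uncomment or extend for further layers
}

base_mtu = 1500
required = base_mtu + sum(overheads.values())
print(required)  # 1673, which fits within a 1700-byte MTU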

Scaling Up Through Core Infrastructure (VMKs, vSwitches, Physical Switches, Leaf-Spine Network)

If multiple layers and tunnels are involved, summing all increments determines the required MTU. For example, starting from 1500 and adding 123 bytes for HCX plus 50 bytes for NSX gives ~1673 bytes, which fits within a 1700-byte MTU. Common configurations include keeping VMs and HCX at 1500 bytes, vmkernel (vmk) and vSwitch at 1700 bytes, and NSX at 1600 bytes; another approach sets HCX at ~8800 bytes, NSX at ~8900 bytes, and vmkernel plus physical switches at 9000 bytes, leveraging jumbo frames for substantial headroom.

A frequently observed practice is running the core infrastructure (such as leaf-spine or ToR switches) at 9000 bytes, regardless of how the HCX and host MTUs are configured. This provides significant headroom for any combination of the overlay configurations described above.
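
As an illustration, a planned configuration can be checked layer by layer. The layer names and "largest packet" figures below are illustrative assumptions; the actual worst case depends on which overlays a given traffic path traverses:

# (layer, configured MTU, largest packet the layer must carry)
layers = [
    ("VM",               1500, 1500),  # original frame
    ("NSX overlay",      1600, 1550),  # VM frame + ~50 B Geneve
    ("vmk / vSwitch",    1700, 1623),  # VM frame + ~123 B HCX IPsec/encap
    ("ToR / leaf-spine", 9000, 1673),  # worst case with both overlays
]

for name, mtu, largest in layers:
    status = "OK" if mtu >= largest else "FRAGMENTS"
    print(f"{name:16} MTU {mtu:5} vs largest packet {largest:5} -> {status}")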

Multiple Initiators Consideration

If two or more overlay technologies run in sequence (e.g., HCX followed by NSX), each adds overhead. Identifying the largest possible encapsulated packet size from all initiators helps determine if the chosen MTU accommodates all overheads. If overhead still causes fragmentation, disabling double encryption (when not required) reduces overhead, minimizing the need to increase MTU further.

Iterative Approach and Verification

Incremental testing is often beneficial. Start by:

  • Calculating base VM payload + IP/TCP headers = initial MSS.
  • Adding HCX overhead.
  • Adding NSX overhead.
  • Including any other overhead (IPsec, VLAN tags, other tunnels).

Summing these overheads provides clarity on the required MTU at each layer. Using ping with “Don’t Fragment” to test increasing packet sizes verifies the actual path MTU. PMTU discovery commands in HCX can identify the maximum supported size. Adjusting configurations based on these findings ensures that each network element can handle the combined overhead.
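
As a rough sketch, such a probe can be automated from a Linux host using the standard ping options -M do (Don't Fragment) and -s (payload size); on ESXi hosts, vmkping -d -s serves the same purpose. The target hostname below is a placeholder:

import subprocess

def df_ping_ok(host: str, payload: int) -> bool:
    """Return True if a Don't-Fragment ping with this payload succeeds."""
    result = subprocess.run(
        ["ping", "-M", "do", "-c", "1", "-W", "1", "-s", str(payload), host],
        capture_output=True,
    )
    return result.returncode == 0

def probe_path_mtu(host: str, low: int = 1200, high: int = 9000) -> int:
    """Binary-search the largest payload that passes, then add the
    28 bytes of ICMP + IP headers to report the path MTU."""
    while low < high:
        mid = (low + high + 1) // 2
        if df_ping_ok(host, mid):
            low = mid
        else:
            high = mid - 1
    return low + 28

# print(probe_path_mtu("remote-endpoint.example.com"))  # hypothetical target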

In Summary

  • Begin with the base MSS from a 1500-byte MTU minus IP/TCP headers.
  • Add overhead increments from HCX, NSX, and additional tunnels or VLAN tags.
  • Common scenarios include 1500 bytes for HCX and VMs, with vmk and vSwitch at 1700 bytes, and NSX at 1600 bytes, or using jumbo frames (e.g., HCX ~8800, NSX ~8900, and vmk plus switch at 9000 bytes).
  • Many infrastructures run their core at 9000 bytes for abundant headroom.
  • If fragmentation persists and double encryption is occurring but not required, disabling it reduces overhead.
  • Validate incrementally using ping and PMTU tools in HCX.

By carefully calculating each overhead layer and configuring each subsequent hop’s MTU, stable, efficient, and high-performance network communication can be maintained across complex, multi-layered network infrastructures.

Additional Information

Traffic Types Benefiting from Larger MTU Sizes

In complex infrastructures where multiple layers of encapsulation and encryption are present, certain types of network traffic can particularly benefit from larger MTU sizes. Understanding these traffic patterns helps network architects make informed decisions about MTU configuration across their infrastructure.

Bulk Data Transfer Applications

Applications that regularly transfer large amounts of data in sustained streams see significant performance improvements with larger MTUs. This includes backup systems, disaster recovery solutions, and storage replication services. When these applications operate across HCX-enabled environments, the ability to handle larger packet sizes reduces the overhead associated with packet fragmentation and reassembly, leading to more efficient use of available bandwidth. For example, a backup operation transferring several terabytes of data can complete notably faster when using jumbo frames (9000-byte MTU) compared to standard 1500-byte MTUs, as the reduced packet overhead and processing requirements allow for more efficient data transfer.
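
Rough arithmetic behind this claim, as a sketch that assumes plain IPv4/TCP (40 bytes of headers) and ignores tunnel overhead; it illustrates scale rather than benchmarking any product:

HEADERS = 40  # bytes, IPv4 + TCP without options

def packets_needed(transfer_bytes: int, mtu: int) -> int:
    """Ceiling of transfer size divided by the per-packet payload."""
    mss = mtu - HEADERS
    return -(-transfer_bytes // mss)

one_tb = 10**12
for mtu in (1500, 9000):
    pkts = packets_needed(one_tb, mtu)
    header_bytes = pkts * HEADERS
    print(f"MTU {mtu}: ~{pkts:,} packets, ~{header_bytes / 1e9:.1f} GB of headers")

# MTU 1500: ~684,931,507 packets, ~27.4 GB of headers
# MTU 9000: ~111,607,143 packets, ~4.5 GB of headers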

Virtual Machine Migration Traffic

Live migration of virtual machines, especially in environments using both HCX and NSX, generates substantial sustained data transfer. The migration process involves transferring the VM's memory state and disk contents, which can benefit significantly from larger MTUs. With proper MTU sizing that accounts for both HCX encryption overhead (~73 bytes) and NSX encapsulation (~50 bytes), migration operations can achieve higher throughput and lower completion times. In practice, organizations often see 20-30% faster migration times when properly configured larger MTUs are implemented across the infrastructure.

Big Data and Analytics Workloads

Modern analytics platforms often process large datasets across distributed systems. These workloads typically involve transferring substantial blocks of data between compute nodes. In environments where this traffic traverses multiple overlay networks, configuring larger MTUs (such as the 9000-byte configuration mentioned for core infrastructure) can reduce the protocol overhead ratio, improving overall application performance. For instance, Hadoop clusters processing petabyte-scale datasets can see significant improvements in job completion times when network paths are configured with appropriate larger MTUs.

Storage Area Network (SAN) Traffic

iSCSI and NFS traffic, particularly in virtualized environments, benefits from larger MTUs due to its block-oriented nature. When this traffic must traverse HCX-encrypted tunnels or NSX overlay networks, the additional overhead from encryption and encapsulation makes proper MTU sizing even more critical. The combination of jumbo frames (typically 9000 bytes) at the core with appropriately sized MTUs at overlay layers (such as 8800 bytes for HCX) ensures efficient storage operations. Organizations frequently report 15-25% improvements in storage throughput when moving from standard to jumbo frames in their SAN infrastructure.

High-Performance Computing (HPC) Applications

HPC workloads often generate large, sustained data flows between compute nodes. These applications are particularly sensitive to network latency and overhead. When HPC traffic must traverse complex overlay networks, larger MTUs help maintain performance by reducing the impact of encapsulation overhead and minimizing CPU utilization associated with packet processing. Scientific computing applications, for example, can see substantial performance improvements when larger MTUs are properly implemented across the compute cluster network.

Database Replication Traffic

Database replication, especially in distributed environments, involves continuous streams of data that must maintain consistency across sites. When this traffic passes through multiple overlay networks (such as HCX-encrypted tunnels followed by NSX overlays), larger MTUs help maintain efficiency by reducing the fragmentation of large database transactions. This is particularly important for maintaining low replication lag in mission-critical database environments.

Traffic Types Not Benefiting from Larger MTUs

Understanding which traffic types don't benefit from larger MTUs is equally important for network design. Here are key examples:

Voice over IP (VoIP) and SIP Traffic

VoIP and SIP traffic typically generate small packets (100-200 bytes) optimized for real-time delivery. Larger MTUs provide no benefit for these applications because:

  • Voice packets are intentionally kept small to minimize latency
  • The payload size is fixed by codec requirements
  • Larger MTUs could actually increase jitter by causing queuing delays
  • SIP signaling packets are typically under 500 bytes

Interactive Web Applications

Modern web applications generating frequent, small interactions don't benefit from larger MTUs because:

  • Most API calls and user interactions generate small payload sizes
  • Real-time features require minimal latency
  • WebSocket frames are often intentionally small for responsive updates
  • REST API calls typically involve JSON payloads under 1KB

Gaming Traffic

Online gaming traffic is optimized for minimal latency and typically involves:

  • Small packet sizes for player position updates (often under 100 bytes)
  • Frequent, bidirectional communications
  • Time-sensitive data that cannot benefit from packet coalescing
  • UDP-based protocols optimized for small packet transmission

IoT Device Communications

Internet of Things devices typically generate small data packets because:

  • Sensor data is often just a few bytes
  • Devices may have limited processing power
  • Battery-operated devices benefit from minimal transmission sizes
  • Many IoT protocols are optimized for small packet sizes

Instant Messaging and Chat Applications

These applications generate primarily small packets because:

  • Text messages are typically under 1KB
  • Presence updates are minimal in size
  • Status changes require minimal payload
  • Real-time delivery is prioritized over bandwidth efficiency

DNS Queries

Domain Name System traffic consists of small queries and responses:

  • Typical DNS queries are under 512 bytes
  • Larger MTUs provide no benefit for resolution speed
  • UDP-based queries are optimized for small packet sizes
  • Response times are more dependent on resolver proximity than packet size

Impact Considerations

When implementing larger MTUs in complex network environments, several critical factors must be considered, both from an infrastructure perspective and in terms of performance impact.

Understanding these considerations helps network architects make informed decisions about MTU sizing across their infrastructure.

The foundation of successful MTU implementation starts with network path consistency. All devices along the network path must support the configured MTU size, which becomes particularly challenging in environments with multiple initiators and overlays. This requires careful calculation of overhead at each layer, accounting for various encapsulation and encryption requirements as detailed in earlier sections.

Application behavior presents another crucial consideration. Not all applications automatically optimize their packet sizes to take advantage of larger MTUs. Network administrators often need to configure TCP MSS clamping on network devices to cap the maximum amount of data sent in a single TCP segment. This adjustment prevents endpoints from transmitting packets that exceed the path's capabilities, which would otherwise result in unnecessary fragmentation or packet loss.

The relationship between MTU size and latency introduces additional complexity that must be carefully managed. This manifests in several ways across the network:

Fragmentation overhead becomes a significant concern when packets exceed the MTU size. The process of fragmenting and reassembling packets introduces additional latency at each network hop, potentially adding several milliseconds of delay per fragmentation event. For perspective, a 1 GB file transfer requires roughly 685,000 packets at a 1500-byte MTU but only about 112,000 at a 9000-byte MTU, so eliminating fragmentation and reassembly at that scale yields noticeable performance improvements.

Processing delays affect overall network performance, as each packet requires processing at network devices regardless of its size. Larger MTUs create an advantage by reducing the total number of packets needed for the same amount of data. This efficiency can result in network devices spending 20-30% less CPU time processing packets when handling the same data volume with larger MTUs.

In environments with mixed traffic types, queuing delays require special attention. Larger packets can increase queuing delays for smaller, time-sensitive packets, particularly in congested networks where real-time applications share infrastructure with bulk data transfers. This often necessitates implementing Quality of Service (QoS) policies to prevent larger packets from causing excessive delays for time-sensitive traffic.

Retransmission impact becomes particularly relevant in networks experiencing packet loss. When larger MTUs are in use, each lost packet contains more data that must be retransmitted, potentially leading to more significant latency spikes. This consideration becomes especially important in networks with higher packet loss rates, where the impact of retransmissions can compound quickly.

The challenge of balancing these various factors increases in mixed traffic environments, where network infrastructure must accommodate both demanding high-throughput applications and smaller, interactive traffic patterns. Success in these environments often depends on implementing appropriate QoS policies and carefully considering the needs of all application types when determining MTU sizes.

By understanding both the beneficial and non-beneficial traffic patterns for larger MTUs, network architects can better justify and plan for the implementation of increased MTU sizes across their infrastructure, particularly in complex environments involving multiple overlay technologies like HCX and NSX. This knowledge allows for more nuanced network designs that can optimize performance for different types of applications while maintaining efficient operation for all traffic types.