Public sites inaccessible or intermittently reachable after NSX Edge failover due to MTU mismatch

Article ID: 412975

Products

VMware NSX

Issue/Introduction

  • Public-facing websites and applications (e.g., YouTube, Google) become inaccessible, with pages loading indefinitely, in specific Edge failover scenarios.
  • In a clustered site, during failover tests in which cluster-A traffic was intentionally routed through a cluster-B Edge node, public sites were inaccessible or only intermittently reachable.

Environment

VMware NSX-T Data Center
VMware NSX

Cause

The root cause was an MTU mismatch on the Layer 3 VLAN interface for the TEP IP subnets used by the Edge and ESXi transport nodes.
This interface was configured with the default MTU of 1500 bytes.
Geneve overlay packets, which carry traffic between ESXi hosts and Edge nodes, require an MTU of at least 1600 bytes according to the VMware documentation.

This behavior is expected: when the MTU is mismatched along the network path, packets larger than the smallest MTU on the path are fragmented or dropped, which leads to connectivity issues.
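
The arithmetic behind the 1500-byte failure can be sketched as follows. This is a minimal illustration, not an NSX tool: it assumes an IPv4 outer header, an untagged inner Ethernet frame, and a workload MTU of 1500 bytes. The function names are hypothetical, and Geneve options (which vary in length) are left at zero by default.

```python
# Header sizes in bytes for a Geneve-encapsulated frame.
# Assumptions: IPv4 outer header, no 802.1Q tag on the inner frame.
INNER_ETHERNET = 14  # inner Ethernet header carried inside the tunnel
GENEVE_BASE = 8      # fixed Geneve header; options add to this
OUTER_UDP = 8        # outer UDP header
OUTER_IPV4 = 20      # outer IPv4 header without IP options


def outer_ip_packet_size(inner_mtu: int, geneve_options: int = 0) -> int:
    """Return the size of the outer IP packet that carries one
    full-size inner IP packet of inner_mtu bytes."""
    return (inner_mtu + INNER_ETHERNET + GENEVE_BASE
            + geneve_options + OUTER_UDP + OUTER_IPV4)


def fits(inner_mtu: int, underlay_mtu: int, geneve_options: int = 0) -> bool:
    """True if the encapsulated packet crosses the underlay unfragmented."""
    return outer_ip_packet_size(inner_mtu, geneve_options) <= underlay_mtu


# Compare the default interface MTU with the documented minimum.
for underlay_mtu in (1500, 1600):
    size = outer_ip_packet_size(1500)
    print(f"underlay MTU {underlay_mtu}: outer packet {size} bytes, "
          f"fits={fits(1500, underlay_mtu)}")
```

Under these assumptions a full-size 1500-byte workload packet becomes an outer packet of at least 1550 bytes, which exceeds a 1500-byte interface MTU and fits within 1600 bytes with room left for Geneve options.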

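To validate the MTU between an ESXi host and an Edge TEP, run vmkping from the ESXi host:
vmkping -S vxlan -s 1572 -I vmk10 -d <Edge TEP IP>

-S = netstack (vxlan)
-s = ICMP payload size in bytes (1572 bytes plus 28 bytes of IP and ICMP headers equals 1600)
-I = outgoing VMkernel interface
-d = set the Don't Fragment bit

Note: Repeat the test with payload sizes for both the lower MTU and the jumbo MTU.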
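
The payload sizes for these tests can be derived from the target MTU, as in the following sketch. The helper names are hypothetical, the 9000-byte jumbo value is an assumption about the environment, and vmk10 is simply the interface used in the example above. The script only prints the commands; it does not run them.

```python
# ICMP echo over IPv4: the -s value excludes the IP and ICMP headers.
IPV4_HEADER = 20
ICMP_HEADER = 8


def vmkping_payload(mtu: int) -> int:
    """Return the -s value that produces an IP packet of exactly mtu bytes."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def vmkping_command(mtu: int, edge_tep_ip: str, vmk: str = "vmk10") -> str:
    """Build the vmkping command line for a Don't Fragment test at mtu."""
    return (f"vmkping -S vxlan -s {vmkping_payload(mtu)} "
            f"-I {vmk} -d {edge_tep_ip}")


# 1500 and 1600 come from the article; 9000 is an assumed jumbo MTU.
for mtu in (1500, 1600, 9000):
    print(vmkping_command(mtu, "<Edge TEP IP>"))
```

If the command for a given MTU fails while the one for a smaller MTU succeeds, some hop on the path between the host and the Edge TEP has an MTU below the failing value.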

Resolution

Adjust the MTU on the Layer 3 VLAN interface to meet the minimum MTU requirement (at least 1600 bytes).

Additional Information

For detailed guidance on MTU settings in NSX, please refer to the official VMware documentation:
VMware NSX MTU Guidance: https://techdocs.broadcom.com/us/en/vmware-cis/nsx/vmware-nsx/4-2/installation-guide/transport-zones-and-transport-nodes/mtu-guidance.html