When troubleshooting MTU issues and fragmentation in the NSX fabric, a specific set of data and configurations must be checked. This article details what to verify in the NSX user interface before opening a support request with Broadcom.
VMware NSX
Many applications are not built to handle fragmentation and run into problems when packets are fragmented in the network. To avoid fragmentation, jumbo frames need to be allowed. On many vendors' network devices the default MTU is 1500, so to allow jumbo frames the MTU must be set to a higher value.
To confirm that the NSX fabric allows jumbo frames and that the MTU is configured with the expected value, check the following locations in the NSX UI:
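To make the effect of the MTU on fragmentation concrete, the following is a minimal Python sketch of the standard IPv4 fragmentation arithmetic. It is not NSX-specific: the function name ipv4_fragment_count is hypothetical, and it assumes a 20-byte IPv4 header without options and no additional encapsulation overhead.

```python
import math

IPV4_HEADER_BYTES = 20  # assumption: IPv4 header without options


def ipv4_fragment_count(payload_bytes: int, mtu: int) -> int:
    """Return the minimum number of IPv4 packets needed to carry
    payload_bytes of IP payload over a link with the given MTU."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    if mtu < 68:  # 68 is the minimum MTU an IPv4 link must support
        raise ValueError("mtu must be at least 68")

    max_payload = mtu - IPV4_HEADER_BYTES
    if payload_bytes <= max_payload:
        return 1  # fits in a single packet, no fragmentation

    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the fragment offset field counts 8-byte units.
    per_fragment = (max_payload // 8) * 8

    # The last fragment may use the full max_payload; the rest of the
    # payload is spread over fragments of per_fragment bytes each.
    return 1 + math.ceil((payload_bytes - max_payload) / per_fragment)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, ipv4_fragment_count(8000, mtu))
```

With these assumptions, an 8000-byte IP payload is split into 6 packets at an MTU of 1500 and is carried unfragmented at an MTU of 9000.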
Networking > Settings > Global Networking Config > Global Gateway Configuration > Gateway Interface MTU
Check whether Gateway Interface MTU is set to a high value. Ideally, it should be at least 100 lower than the MTU used in the rest of the NSX fabric.
System > Fabric > Settings > Global Fabric Settings > Tunnel Endpoint
Check whether Tunnel Endpoint MTU is set to a high value. If Federation is in use, Remote Tunnel Endpoint MTU must also be changed from its default of 1500 to a higher value.
System > Fabric > Profiles > Uplink profiles
Check the MTU tab. Every uplink profile that is used by a transport node should have the same MTU (9000 in this example).
Click on an Uplink profile to edit MTU:
(The same needs to be checked on the Edge uplink profile as well.)
System > Fabric > Settings > Global Fabric Settings > MTU Configuration check > Check now
Keep the MTU value the same in all of the locations mentioned above, and keep the Gateway Interface MTU at least 100 lower than that value. For example, to allow jumbo frames, you can set the Tunnel Endpoint MTU, the uplink profile MTU, and the Edge uplink profile MTU to 9000, and set the Gateway Interface MTU to 8900. This is only an example. Select the MTU value that is suitable for your environment.
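The rule above can be expressed as a small offline check. The Python sketch below does not call any NSX API: the function name check_mtu_plan and its parameters are hypothetical, and the values are assumed to have been read manually from the UI locations listed in this article.

```python
def check_mtu_plan(tep_mtu, uplink_profile_mtus, gateway_interface_mtu,
                   min_headroom=100):
    """Return a list of human-readable problems found in an MTU plan.

    tep_mtu               -- Tunnel Endpoint MTU (Global Fabric Settings)
    uplink_profile_mtus   -- dict of uplink profile name -> MTU, including
                             Edge uplink profiles
    gateway_interface_mtu -- Gateway Interface MTU (Global Gateway Configuration)
    min_headroom          -- how much lower the gateway MTU must be (100 in
                             this article)
    """
    problems = []

    # Every uplink profile in use should match the Tunnel Endpoint MTU.
    for profile_name, mtu in sorted(uplink_profile_mtus.items()):
        if mtu != tep_mtu:
            problems.append(
                f"uplink profile {profile_name!r} has MTU {mtu}, "
                f"expected {tep_mtu}"
            )

    # Compare the gateway MTU against the smallest MTU found in the fabric.
    fabric_mtu = min([tep_mtu, *uplink_profile_mtus.values()])
    if gateway_interface_mtu > fabric_mtu - min_headroom:
        problems.append(
            f"Gateway Interface MTU {gateway_interface_mtu} should be at most "
            f"{fabric_mtu - min_headroom}"
        )

    return problems


if __name__ == "__main__":
    # Profile names are illustrative placeholders.
    plan = {"host-uplink-profile": 9000, "edge-uplink-profile": 9000}
    print(check_mtu_plan(9000, plan, 8900))
```

For the example values in this article (9000 everywhere and 8900 on the gateway interface) the function returns an empty list. A non-empty list points at the setting to review.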
Maintenance window required for remediation? Yes. Changing the MTU can impact live traffic.
If you are contacting Broadcom support about this issue, please provide the following:
Logs from the transport nodes (Edge nodes and ESXi nodes)