Packet loss observed while fragmented traffic passes through NSX-V edge

Article ID: 303264

Products

VMware NSX

Issue/Introduction

  • Packet loss is observed while fragmented traffic passes through the NSX-V edge
  • No packet loss is seen with unfragmented traffic
  • Ping from a VM behind the edge to the outside with a payload size greater than 1472 bytes shows packet loss (1472 bytes of payload plus 28 bytes of IP and ICMP headers fills the 1500-byte MTU, so anything larger is fragmented)
  • Ping from the same VM without specifying a size, or with a size of 1472, shows no packet loss
  • MTU on the VM is set to the default of 1500
  • Edge and/or DLR interfaces are set to 9000
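
The 1472-byte boundary in the symptoms above follows from header arithmetic. The sketch below is illustrative only: it assumes a 20-byte IPv4 header without options and an 8-byte ICMP echo header, and the function name icmp_fragments is hypothetical, not part of any NSX tooling.

```python
import math

IP_HEADER = 20    # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header

def icmp_fragments(payload: int, mtu: int = 1500) -> int:
    """Return how many IPv4 packets an ICMP echo with `payload` data bytes needs."""
    datagram = payload + ICMP_HEADER           # bytes carried after the IP header
    if datagram + IP_HEADER <= mtu:
        return 1                               # fits in one packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(datagram / per_fragment)

if __name__ == "__main__":
    for size in (1472, 1473, 4000):
        print(size, icmp_fragments(size))
```

A ping of 1472 bytes fits in a single packet on a 1500-byte MTU, while 1473 bytes already needs two fragments, which is why only the larger pings exercise reassembly on the edge.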

Environment

NSX for vSphere 6.4.x

Cause

This behavior is expected and is a consequence of the fragmented traffic being transmitted by the VMs:

• The edge is reaching the ipfrag_high_thresh threshold, which is the maximum memory used to reassemble IP fragments. When ipfrag_high_thresh bytes of memory are allocated for this purpose, the fragment handler drops packets until usage falls back to ipfrag_low_thresh.
• The Edge Services Gateway needs to reassemble IP fragments so that features like FW/NAT can function correctly.
• During the reassembly process, fragments are held in memory until all fragments with the same ID (that is, all fragments of the original packet) have been received.
• Memory usage for IP fragments grows with the total number of pending fragments across all flows. The current default high/low thresholds are 4M/3M bytes.
• Increasing them by 3X, for example, means the edge allows 3X as many pending fragments.
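
The high/low threshold behavior described above can be illustrated with a toy model. This is a simplified sketch, not the edge's actual kernel code: the class FragmentCache, the helper drops_for, the oldest-first discard order, and the assumption that each fragment costs exactly its own size in memory are all hypothetical choices made for illustration.

```python
from collections import OrderedDict

HIGH_THRESH = 4 * 1024 * 1024  # default high threshold from the article (4M bytes)
LOW_THRESH = 3 * 1024 * 1024   # default low threshold from the article (3M bytes)

class FragmentCache:
    """Toy model of a reassembly cache with high/low memory thresholds."""

    def __init__(self, high=HIGH_THRESH, low=LOW_THRESH):
        self.high, self.low = high, low
        self.pending = OrderedDict()   # packet ID -> bytes held for that packet
        self.used = 0                  # total bytes held for incomplete packets
        self.dropped = 0               # incomplete packets discarded

    def add_fragment(self, packet_id, size):
        self.pending[packet_id] = self.pending.get(packet_id, 0) + size
        self.used += size
        if self.used >= self.high:
            # Assumption: discard the oldest incomplete packets until
            # usage is back at or below the low threshold.
            while self.used > self.low and self.pending:
                _, freed = self.pending.popitem(last=False)
                self.used -= freed
                self.dropped += 1

    def complete(self, packet_id):
        """All fragments arrived: the packet is reassembled and its memory freed."""
        self.used -= self.pending.pop(packet_id, 0)

def drops_for(incomplete_packets, scale=1, fragment_size=1500):
    """Count discards when that many packets each have one fragment pending."""
    cache = FragmentCache(HIGH_THRESH * scale, LOW_THRESH * scale)
    for packet_id in range(incomplete_packets):
        cache.add_fragment(packet_id, fragment_size)
    return cache.dropped

if __name__ == "__main__":
    print(drops_for(3000), drops_for(3000, scale=3))
```

In this model, the same burst of pending fragments that triggers discards at the default thresholds causes none once both thresholds are tripled, which matches the 3X reasoning in the last bullet.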

To verify if the edge is dropping packets due to exceeding ipfrag_high_thresh, use the following commands:

From edge exec mode, check whether the difference (ReasmFails - ReasmTimeout) is incrementing:
NSX-edge-32-0> show packet drops | grep Reasm
 ReasmFails           : 0
 ReasmTimeout         : 0


Or

From edge root, check whether the difference (IpReasmFails - IpReasmTimeout) is incrementing. Nonzero values for IpReasmReqds and IpReasmOKs are normal:

Delta value since last run of the command:

[root@NSX-edge-32-0 ~]# /sbin/nstat -z | grep IpReasm
IpReasmTimeout                  0                  0.0
IpReasmReqds                    0                  0.0
IpReasmOKs                      0                  0.0
IpReasmFails                    0                  0.0


Accumulated value:
[root@NSX-edge-32-0 ~]# /sbin/nstat -a -z | grep IpReasm
IpReasmTimeout                  0                  0.0
IpReasmReqds                    0                  0.0
IpReasmOKs                      0                  0.0
IpReasmFails                    0                  0.0
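
The comparison described above can be scripted once two captures of the accumulated counters have been saved as text. The following is a minimal sketch that assumes the three-column nstat layout shown in this article; the function names are hypothetical, and any counter values fed to it are the reader's own captures, not data from this article.

```python
def reasm_counters(nstat_output: str) -> dict:
    """Parse 'Name  value  rate' lines and keep only the IpReasm* counters."""
    counters = {}
    for line in nstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("IpReasm"):
            counters[fields[0]] = int(fields[1])
    return counters

def threshold_drops(counters: dict) -> int:
    """Reassembly failures that are not explained by timeouts."""
    return counters.get("IpReasmFails", 0) - counters.get("IpReasmTimeout", 0)

def is_incrementing(before: str, after: str) -> bool:
    """True if (IpReasmFails - IpReasmTimeout) grew between two captures."""
    return threshold_drops(reasm_counters(after)) > threshold_drops(reasm_counters(before))
```

Passing two outputs of /sbin/nstat -a -z | grep IpReasm taken some time apart to is_incrementing gives a quick yes/no answer; a True result suggests the edge is failing reassembly for reasons other than timeouts.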

Resolution

The recommended course of action is to identify the source of the IP-fragmented traffic and address it. Possible sources include, but are not limited to:

  • MTU setting in the guest OS left at the default 1500, or set to any other value less than the MTU size in the data path.
  • VPN or RADIUS implementation.

Please contact Broadcom Support for assistance with the workaround of increasing the Edge IP-fragmentation thresholds via an API call.