Packet loss observed while fragmented traffic passes through NSX-V edge


Article ID: 303264


Products

VMware NSX

Issue/Introduction

Packet loss is observed when fragmented traffic passes through an NSX-V edge.
No packet loss is seen with unfragmented traffic.
A ping from a VM behind the edge to an outside destination with a payload size greater than 1472 bytes shows packet loss.
A ping from the same VM without specifying a size, or with a size of 1472 bytes, shows no packet loss.
The MTU on the VM is set to the default of 1500.
The MTU on the Edge and/or DLR interfaces is set to 9000.
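
The 1472-byte boundary in the ping test follows from the VM's 1500-byte MTU: an IPv4 header (20 bytes, assuming no IP options) and an ICMP echo header (8 bytes) leave 1472 bytes for ping data. The following minimal Python sketch shows that arithmetic and estimates how many fragments a larger ping produces. The function names are illustrative only and are not part of any NSX tooling.

```python
import math

IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_unfragmented_ping_payload(mtu: int) -> int:
    """Largest ping data size that fits in one packet at the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def fragment_count(ping_payload: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to send a ping of this data size."""
    ip_payload = ping_payload + ICMP_HEADER
    # Fragment data (except in the last fragment) must be a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


print(max_unfragmented_ping_payload(1500))  # 1472
print(fragment_count(1472, 1500))           # 1
print(fragment_count(1473, 1500))           # 2
```

Any ping larger than 1472 bytes from a VM with a 1500-byte MTU therefore leaves the VM as two or more fragments, which the edge must reassemble.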

Environment

NSX for vSphere 6.4.x

Cause

This behavior is expected and is a consequence of fragmented traffic being transmitted by the VMs:

• The edge is hitting the ipfrag_high_thresh threshold, which is the maximum memory used to reassemble IP fragments. When ipfrag_high_thresh bytes of memory are allocated for this purpose, the fragment handler drops packets until memory usage falls to ipfrag_low_thresh.
• The Edge Services Gateway needs to reassemble IP fragments so that features such as firewall (FW) and NAT can function correctly.
• During reassembly, fragments are held in memory until all fragments that share the same ID, and therefore belong to the same original packet, have been received.
• Memory usage for IP fragments grows as the total number of pending fragments across all flows grows. The current default high/low thresholds are 4M/3M bytes.
• Increasing the thresholds by 3X, for example, allows the edge to hold 3X as many pending fragments.
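
The high/low threshold behavior described above can be illustrated with a simplified model. This Python sketch is not the edge's actual implementation: the class name, the oldest-first eviction order, and the reading of "4M/3M" as 4 MiB and 3 MiB are assumptions made for illustration only.

```python
from collections import OrderedDict

# Assumption: "4M/3M bytes" is read here as 4 MiB and 3 MiB.
DEFAULT_HIGH_THRESH = 4 * 1024 * 1024
DEFAULT_LOW_THRESH = 3 * 1024 * 1024


class ReassemblyBuffer:
    """Toy model of fragment memory with high/low thresholds (hypothetical)."""

    def __init__(self, high=DEFAULT_HIGH_THRESH, low=DEFAULT_LOW_THRESH):
        self.high = high
        self.low = low
        self.pending = OrderedDict()  # packet ID -> bytes held for that packet
        self.used = 0                 # total bytes held for all pending packets
        self.dropped_packets = 0      # incomplete packets discarded so far

    def add_fragment(self, packet_id, size):
        """Hold a fragment until the rest of its packet arrives."""
        self.pending[packet_id] = self.pending.get(packet_id, 0) + size
        self.used += size
        if self.used >= self.high:
            self._drop_until_low()

    def complete(self, packet_id):
        """All fragments arrived: free the memory held for this packet."""
        self.used -= self.pending.pop(packet_id, 0)

    def _drop_until_low(self):
        # Assumed policy: discard the oldest incomplete packets first.
        while self.used > self.low and self.pending:
            _, size = self.pending.popitem(last=False)
            self.used -= size
            self.dropped_packets += 1


# Small thresholds keep the example readable.
buf = ReassemblyBuffer(high=4000, low=3000)
for packet_id in range(4):
    buf.add_fragment(packet_id, 1000)
print(buf.used, buf.dropped_packets)  # 3000 1
```

Running the same four fragments through `ReassemblyBuffer(high=12000, low=9000)` drops nothing, which mirrors the point that 3X thresholds allow 3X as many pending fragments.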

To verify whether the edge is dropping packets because ipfrag_high_thresh is being exceeded, use the following commands:

From edge exec mode, check whether the difference (ReasmFails - ReasmTimeout) is incrementing:
NSX-edge-32-0> show packet drops | grep Reasm
ReasmFails : 0
ReasmTimeout : 0

Or

From the edge root shell, check whether the difference (IpReasmFails - IpReasmTimeout) is incrementing. Nonzero values for IpReasmReqds and IpReasmOKs are normal:

Delta value since last run of the command:
[root@NSX-edge-32-0 ~]# /sbin/nstat -z | grep IpReasm
IpReasmTimeout 0 0.0
IpReasmReqds 0 0.0
IpReasmOKs 0 0.0
IpReasmFails 0 0.0

Accumulated value:
[root@NSX-edge-32-0 ~]# /sbin/nstat -a -z | grep IpReasm
IpReasmTimeout 0 0.0
IpReasmReqds 0 0.0
IpReasmOKs 0 0.0
IpReasmFails 0 0.0
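
The (IpReasmFails - IpReasmTimeout) check can also be scripted against saved nstat output. The following Python sketch is one possible approach, not a supported tool. It assumes the three-column layout shown above (counter name, value, rate), and the function names are hypothetical.

```python
def parse_nstat(text: str) -> dict:
    """Extract IpReasm* counters from nstat output (name, value, rate)."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("IpReasm"):
            counters[parts[0]] = int(parts[1])
    return counters


def threshold_drops(counters: dict) -> int:
    """Reassembly failures that are not explained by timeouts."""
    return counters.get("IpReasmFails", 0) - counters.get("IpReasmTimeout", 0)


def is_incrementing(before: str, after: str) -> bool:
    """Compare two accumulated (nstat -a) samples taken some time apart."""
    return threshold_drops(parse_nstat(after)) > threshold_drops(parse_nstat(before))


# The all-zero sample output shown in this article.
SAMPLE = """\
IpReasmTimeout 0 0.0
IpReasmReqds 0 0.0
IpReasmOKs 0 0.0
IpReasmFails 0 0.0
"""
print(threshold_drops(parse_nstat(SAMPLE)))  # 0
```

With delta output (nstat without -a), a positive result from `threshold_drops` on a single sample already indicates that the difference grew since the previous run. With accumulated output, pass two samples taken some time apart to `is_incrementing`.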

Resolution

The recommended course of action is to identify the source of the IP-fragmented traffic and address it. Possible sources include, but are not limited to:

MTU in the guest OS set to the default of 1500, or to any other value lower than the MTU in the data path.
VPN or RADIUS implementations.

Please contact Broadcom Support for assistance with a workaround that increases the Edge IP-fragmentation thresholds via an API call.
