1500-byte IP fragments may be dropped when the VMware VeloCloud SD-WAN Partner Gateway handoff uses QinQ (0x8100)

Article ID: 388273

Products

VMware VeloCloud SD-WAN

Issue/Introduction

Consider the following topology.

The customer's RADIUS authentication traffic goes from VCE1 to a Partner Gateway (VCG1), leaves through that gateway's handoff interface, arrives at the handoff interface of the peer Partner Gateway (VCG2), continues to VCE2, and finally reaches the RADIUS server on the LAN side of VCE2. The path MTU is 1500 and the LAN interface MTU is also 1500. Both Partner Gateways use the QinQ (0x8100) tag type. In this scenario, the customer may find that 802.1X RADIUS authentication fails.

Environment

All supported versions of VMware VeloCloud SD-WAN. This issue is expected to apply only to the QinQ (dual VLAN) tag type.

Cause

Because RADIUS runs over UDP, the VCE requires large UDP packets (larger than 1360 bytes) to be fragmented into IP fragments. A capture on the LAN side of VCE2 may show that these IP fragments are lost, which causes the RADIUS authentication failure.

Captures on the handoff interfaces of VCG1 and VCG2 may show that VCG1 does send the IP fragments (MTU 1500) to its PE. On the peer VCG2's handoff interface, however, only the small IP fragments are received; the large IP fragments (1500 bytes) are missing. The large fragment is usually the first fragment, and the small fragments are the following ones. Because the first fragment is missing, the subsequent fragments are invalid and are dropped by the VCG. As a result, the customer finds no IP fragments at all on the LAN side of VCE2.
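To make the "large first fragment, small following fragment" pattern concrete, here is a minimal sketch of generic IPv4 fragmentation arithmetic. It is not VeloCloud-specific: the function name `fragment_sizes`, the 20-byte IPv4 header without options, and the 1800-byte example payload are all assumptions chosen for illustration.

```python
def fragment_sizes(udp_payload_len, mtu=1500, ip_header_len=20, udp_header_len=8):
    """Return the total IP length of each fragment of one UDP datagram.

    Illustrative only: assumes plain IPv4 with a 20-byte header (no options).
    """
    # The UDP header travels inside the IP payload of the first fragment.
    remaining = udp_payload_len + udp_header_len
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - ip_header_len) // 8 * 8
    sizes = []
    while remaining > 0:
        data = min(remaining, max_data)
        sizes.append(ip_header_len + data)
        remaining -= data
    return sizes


# Hypothetical 1800-byte RADIUS payload: one full-size fragment, one small one.
print(fragment_sizes(1800))  # [1500, 348]
```

In this example only the first fragment reaches 1500 bytes, so it is the one at risk when the path cannot carry a full-size packet, while the 348-byte fragment still gets through.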

To narrow down the issue, ping the Partner Gateway's local handoff IP address from the PE with a 1500-byte packet to check whether 1500 bytes can pass between the PE and the Partner Gateway.

If the 1500-byte ping fails, the root cause is found. Decrease the ping size by 1 byte at a time until the ping succeeds to find the largest working size, which is 1496 in this example.
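The step-down procedure above can be sketched as a small loop. This is only an outline under stated assumptions: `largest_working_size` and `simulated_probe` are hypothetical names, and the probe here simulates a path that passes packets up to 1496 bytes instead of running a real ping, because the ping syntax depends on the PE platform.

```python
def largest_working_size(probe, start=1500, floor=1):
    """Step down from `start` one byte at a time.

    `probe(size)` must return True when a ping of `size` bytes succeeds.
    Returns the first (largest) size that passes, or None if none does.
    """
    for size in range(start, floor - 1, -1):
        if probe(size):
            return size
    return None


def simulated_probe(size, limit=1496):
    # Hypothetical stand-in for a real ping from the PE to the handoff IP.
    return size <= limit


print(largest_working_size(simulated_probe))  # 1496
```

In practice the probe would wrap whatever ping command the PE provides, with fragmentation disabled so that the packet size is tested as sent.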

Because the customer uses QinQ (dual VLAN), the second VLAN tag adds 4 more bytes to the frame, which can cause the issue. Switching to a single 802.1Q VLAN tag frees those 4 bytes and lets the 1500-byte IP fragment pass.
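The 4-byte difference can be checked with simple frame-size arithmetic. The sketch below uses standard Ethernet field sizes (14-byte header, 4-byte FCS, 4 bytes per VLAN tag). The value `ASSUMED_MAX_FRAME` is an assumption, not a measured limit: it models a path between the PE and the Partner Gateway that accepts a single-tagged frame carrying a 1500-byte IP packet and nothing larger, which matches the 1496 figure in this example.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType
FCS = 4          # frame check sequence
VLAN_TAG = 4     # each 802.1Q tag adds 4 bytes


def frame_size(ip_len, vlan_tags):
    """Ethernet frame size for an IP packet of ip_len bytes."""
    return ETH_HEADER + VLAN_TAG * vlan_tags + ip_len + FCS


def largest_ip_packet(max_frame, vlan_tags):
    """Largest IP packet that fits in a frame of max_frame bytes."""
    return max_frame - ETH_HEADER - VLAN_TAG * vlan_tags - FCS


# Assumption: the path carries at most a single-tagged 1500-byte IP packet.
ASSUMED_MAX_FRAME = frame_size(1500, vlan_tags=1)

print(frame_size(1500, vlan_tags=1))                    # 1522
print(frame_size(1500, vlan_tags=2))                    # 1526
print(largest_ip_packet(ASSUMED_MAX_FRAME, vlan_tags=2))  # 1496
```

Under that assumption, a QinQ frame carrying a 1500-byte fragment is 4 bytes over the limit, and the largest IP packet that still fits with two tags is 1496 bytes.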

Resolution

The customer can try the single-VLAN 802.1Q tag type on the handoff to free the 4 bytes taken by the second VLAN tag.

Additional Information

If the largest working size is lower than 1496, switching to 802.1Q cannot fix the issue, and more troubleshooting is needed between the PE and the Partner Gateway. If the gateway is deployed on ESXi, involve the vSphere support team in the investigation.