Unable to connect between server and client VMs over NSX VPN tunnel (IPSec/L2VPN)

Article ID: 419969

Products

VMware NSX

Issue/Introduction

  • Applications such as SQL experience communication issues between the client and server over an NSX VPN, even though the initial TCP handshake between the two succeeds.
  • Other VMs connected over an NSX IPSec or L2VPN tunnel also experience connectivity issues, even though the TCP handshake and port connectivity tests succeed.
  • Even though ping and port connectivity tests between the client and server VMs succeed, the application on the server still reports the client as unreachable or disconnected.
  • Packet captures on both the server and client side may show that the connection is established initially, but shortly afterwards a high number of retransmissions appear, indicating packet loss in the datapath.
  • ICMP tests with small packets succeed, but packets of 1500 bytes or larger are dropped when sent with fragmentation disallowed (Don't Fragment bit set).
  • Traffic between the server and the client traverses a WAN or another MTU-restricted datapath.
  • Interface statistics for the associated NSX Edge, Tier-0 (T0), and Tier-1 (T1) gateways show no drops on the VPN path (IPsec/L2VPN) or the corresponding interfaces.
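When reproducing the ICMP symptom above, the ping payload size must account for headers: a 1500-byte IPv4 packet carries at most 1472 bytes of ICMP payload (1500 minus the 20-byte IPv4 header and the 8-byte ICMP header). On Linux, `ping -M do -s 1472 <ip>` sends such a packet with fragmentation prohibited; on Windows the equivalent is `ping -f -l 1472 <ip>`. The short Python sketch below only performs that arithmetic for a few candidate MTUs; it assumes IPv4 without header options and is not an NSX tool.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header

def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet of `mtu` bytes."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError("MTU too small for the IPv4 and ICMP headers")
    return payload

# Payload sizes to pass to ping (-s on Linux, -l on Windows) for a few candidate MTUs.
for mtu in (1500, 1400, 1300):
    print(mtu, icmp_payload_for_mtu(mtu))
```

Stepping the payload down until the ping succeeds gives a rough idea of the usable MTU of the path.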

Environment

VMware NSX

Cause

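The packets are too large to pass through the WAN or other datapath beyond NSX, even though they pass through the NSX VPN normally. The VPN encapsulation adds headers to each packet, so a packet that fits within NSX can exceed the MTU of the path beyond the Edge.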

To confirm this, take packet captures at several points on the Edge along the VPN traffic path to verify that the VM traffic passes normally and to check the size of the packets. If packets leave the NSX Tier-0 larger than the datapath MTU (the exact value depends on the datapath), they are dropped outside of NSX. Example capture points include:

  • Ingress to the VPN
    • This could be either the Tier-0 or Tier-1 gateway, depending on the VPN configuration
  • If the VPN is on a Tier-1 (T1), the uplink between the T1 and T0 gateways
  • The egress from the T0 gateway to the WAN or next hop in the datapath

Because NSX VPN traffic is encapsulated, additional filters will likely be needed in the capture commands to isolate the desired traffic. Take care not to over-filter, however, or the relevant packets may be missed.

For example, the image below shows packets from the server leaving the T0 gateway uplink interface toward the WAN as expected; however, at 1582 bytes they are too large to traverse the WAN, so they are dropped before reaching the client, leading to a timeout.
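The arithmetic behind an example like this can be sketched in Python. The figures are assumptions for illustration only: that 1582 bytes is the outer IP packet length (some capture tools report the frame length, which also includes a 14-byte Ethernet header), that the inner packet was a full-size 1500-byte packet, and that the WAN carries at most 1400 bytes. The actual overhead depends on the VPN type and cipher settings, so derive it from your own captures.

```python
def tunnel_overhead(outer_len: int, inner_len: int) -> int:
    """Bytes added by encapsulation: outer packet length minus inner packet length."""
    if outer_len < inner_len:
        raise ValueError("outer packet cannot be smaller than the inner packet")
    return outer_len - inner_len

def excess_over_mtu(packet_len: int, path_mtu: int) -> int:
    """Bytes by which a packet exceeds the path MTU (0 if it fits)."""
    return max(0, packet_len - path_mtu)

def max_inner_packet(path_mtu: int, overhead: int) -> int:
    """Largest inner packet that still fits the path once the tunnel overhead is added."""
    return path_mtu - overhead

# Hypothetical figures (see the assumptions above).
overhead = tunnel_overhead(1582, 1500)
print("overhead:", overhead)                                      # 82
print("excess on a 1400-byte WAN:", excess_over_mtu(1582, 1400))  # 182
print("largest inner packet:", max_inner_packet(1400, overhead))  # 1318
```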

NOTE: Depending on the volume of traffic captured, take care to avoid impacting the Edge node itself. If additional assistance is required, open a Broadcom Support case; see Creating and managing Broadcom support request (SR) cases.

The capture above was taken with the following command, which filters for ESP (Encapsulating Security Payload) traffic between the given source and destination IP addresses:

start capture interface ########-####-####-####-############ direction dual expression esp and host ###.###.###.### and host ###.###.###.###
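If a capture is saved to a file and copied off the Edge (the packet-capture article referenced below covers the capture options), a quick way to spot oversized packets is to scan the frame lengths. The Python sketch below is a hypothetical helper, not part of NSX: it assumes a classic libpcap file (pcapng is not handled) and compares each frame's original length, which includes any link-layer header, against a limit you choose, for example 1514 for a 1500-byte IP MTU on untagged Ethernet.

```python
import struct
from typing import List, Tuple

def frame_lengths(data: bytes) -> List[int]:
    """Return the original length of every frame in a classic libpcap capture."""
    if len(data) < 24:
        raise ValueError("data too short for a pcap global header")
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # written little-endian (microsecond or nanosecond timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # written big-endian
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    lengths = []
    offset = 24  # skip the global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec/nsec, captured length, original length.
        _sec, _frac, incl_len, orig_len = struct.unpack_from(endian + "IIII", data, offset)
        lengths.append(orig_len)
        offset += 16 + incl_len  # jump over the record header and the captured bytes
    return lengths

def oversized(lengths: List[int], limit: int) -> List[Tuple[int, int]]:
    """(index, length) for every frame longer than `limit` bytes."""
    return [(i, n) for i, n in enumerate(lengths) if n > limit]
```

For example, read the capture with `open(path, "rb").read()`, pass the bytes to `frame_lengths`, and call `oversized(..., 1514)` to list every frame above 1514 bytes along with its position in the capture.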

For more information about NSX packet captures on an NSX Edge node, see the following: Troubleshooting NSX using Packet Captures

Resolution

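Configure TCP MSS clamping on the NSX VPN tunnel in the Advanced Settings of the IPsec or L2VPN session. This forces the TCP endpoints to use a smaller segment size, so the encapsulated packets fit within the datapath MTU and communication succeeds.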
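TCP MSS clamping acts on the MSS option carried in TCP SYN packets: a device that clamps rewrites the advertised value down to the configured limit (it never raises it), so both endpoints send segments small enough to fit once the tunnel headers are added. The Python sketch below illustrates that behavior and the usual arithmetic for a starting value. It assumes IPv4 with 20-byte IP and TCP headers, and the 82-byte tunnel overhead in the example is a hypothetical figure; it does not describe NSX internals.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options

def mss_for_path(path_mtu: int, tunnel_overhead: int) -> int:
    """Largest MSS whose segments, plus IP/TCP headers and tunnel overhead, fit path_mtu."""
    mss = path_mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER
    if mss <= 0:
        raise ValueError("path MTU too small for the given overhead")
    return mss

def clamp_syn_mss(advertised_mss: int, clamp_value: int) -> int:
    """Rewrite a SYN's MSS option the way a clamping device does: lower it, never raise it."""
    return min(advertised_mss, clamp_value)

clamp = mss_for_path(1500, 82)      # hypothetical 82-byte overhead gives 1378
print(clamp)
print(clamp_syn_mss(1460, clamp))   # a typical 1460 advertisement is lowered to 1378
print(clamp_syn_mss(1200, clamp))   # a smaller advertisement is left unchanged: 1200
```

Treat a value computed this way as a starting point for the tuning described in the note below.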

IPSec Session Advanced properties:

L2VPN Server Session Advanced properties:

NOTE: The exact MSS Value and Direction settings depend on the environment. Some tuning may be required to find the largest value that still allows traffic to flow.
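One way to approach that tuning is a binary search over candidate MSS values, testing application traffic after each change. The Python sketch below assumes a hypothetical `works(value)` check (in practice, a manual test after updating the session's MSS setting) and assumes that if a value works, every smaller value also works. Searching a hypothetical range of 1000 to 1460 this way needs at most about ten test rounds instead of trying every value.

```python
from typing import Callable

def largest_working_value(low: int, high: int, works: Callable[[int], bool]) -> int:
    """Binary search for the largest value in [low, high] for which works(value) is True.

    Assumes works() is monotonic: if a value works, every smaller value also works.
    Raises ValueError if even `low` fails.
    """
    if not works(low):
        raise ValueError(f"traffic fails even at {low}")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            low = mid        # mid works: the answer is mid or higher
        else:
            high = mid - 1   # mid fails: the answer is below mid
    return low

# Hypothetical stand-in for a real traffic test: pretend anything up to 1378 gets through.
simulated_limit = 1378
print(largest_working_value(1000, 1460, lambda mss: mss <= simulated_limit))  # 1378
```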

After making the change, confirm in the packet captures that the packet size has been reduced:

Additional Information

See Understanding TCP MSS Clamping for more information about this setting.