Troubleshooting NSX L2 VPN

Article ID: 393587

Products

VMware NSX

Issue/Introduction

L2 VPN (Layer 2 Virtual Private Network) connectivity securely extends Layer 2 networks across datacenters.

Route-based IPsec is used as the transport tunnel for L2 traffic.

An NSX-T L2VPN managed server and managed client can stretch both VLAN and overlay (VNI) segments, whereas an NSX-T Autonomous Edge can stretch only VLAN segments.

 

An example of how L2VPN can be implemented:

The L2 packet sent by the source VM reaches the source NSX-T Edge.

Inside the source NSX-T Edge, the L2 packet is VLAN-tagged (VLAN ID = Tunnel ID). The VLAN-tagged packet is encapsulated in a GRE header and forwarded to the VTI (Virtual Tunnel Interface).

The GRE-encapsulated packet is encrypted and sent over the tunnel to the peer.

On the peer NSX-T Edge, the packet is decrypted and the GRE header is removed. The VLAN tag is stripped from the packet and the original L2 packet is forwarded to the segment mapped to that Tunnel ID.
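The tagging and encapsulation order described above can be sketched in a few lines of Python. This is only an illustration of the layering, not the Edge datapath: the helper names are hypothetical, the GRE protocol type 0x6558 (Transparent Ethernet Bridging) is an assumption about how the Ethernet frame is carried, and the IPsec encryption step is left out.

```python
import struct
from typing import Tuple

ETHERTYPE_8021Q = 0x8100  # IEEE 802.1Q tag protocol identifier
GRE_PROTO_TEB = 0x6558    # Transparent Ethernet Bridging (assumed payload type)


def add_vlan_tag(frame: bytes, tunnel_id: int) -> bytes:
    """Insert an 802.1Q tag whose VLAN ID equals the L2VPN Tunnel ID."""
    if not 1 <= tunnel_id <= 4094:
        raise ValueError("VLAN ID must be in the range 1..4094")
    # Priority and DEI bits are left at 0, so the TCI is just the VLAN ID.
    tag = struct.pack("!HH", ETHERTYPE_8021Q, tunnel_id)
    # The tag sits right after the 6-byte destination and source MACs.
    return frame[:12] + tag + frame[12:]


def gre_encapsulate(tagged_frame: bytes) -> bytes:
    """Prepend a basic 4-byte GRE header (no checksum, key, or sequence)."""
    return struct.pack("!HH", 0, GRE_PROTO_TEB) + tagged_frame


def gre_decapsulate(packet: bytes) -> bytes:
    """Remove the basic 4-byte GRE header added by gre_encapsulate()."""
    return packet[4:]


def strip_vlan_tag(tagged_frame: bytes) -> Tuple[int, bytes]:
    """Return (VLAN ID, original frame) for an 802.1Q-tagged frame."""
    tpid, tci = struct.unpack("!HH", tagged_frame[12:16])
    if tpid != ETHERTYPE_8021Q:
        raise ValueError("frame is not 802.1Q tagged")
    return tci & 0x0FFF, tagged_frame[:12] + tagged_frame[16:]
```

The peer side reverses the steps in the opposite order: `gre_decapsulate()` first, then `strip_vlan_tag()`, which yields the Tunnel ID used to pick the destination segment.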

Environment

  • VMware NSX
  • VMware NSX-T Datacenter
  • VMware NSX-T Autonomous Edge

Resolution

Scenario A: L2VPN Tunnel is DOWN but IPsec Session is UP

Step 1: Find the Peer GRE IP

Run the following command to get the session details:

nsx-edge> get l2vpn sessions config

Example output:

DISPLAY_NAME: L2VPN-Session-1
ENABLED: True
ID: <L2VPN-Session-UUID>
L2VPN_SERVICE_ID: <Service-UUID>
TRANSPORT_TUNNELS:
   IPSEC_VPN_SESSION_ID: <IPSEC-Session-UUID>
   VTI: <VTI-UUID>
    TUNNEL_ENCAPSULATION:
       LOCAL_ENDPOINT_IP:
           IPV4: <Local-IP>
       PEER_ENDPOINT_IP:
           IPV4: <Peer-IP>
       PROTOCOL: GRE

Note: Record the PEER_ENDPOINT_IP and the VTI UUID. Both are needed in Step 2.
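If the command output is saved to a text file, the two values can be pulled out with a small script. The sketch below assumes the output has the same layout as the example above; the function name and the returned dictionary keys are hypothetical.

```python
from typing import Dict, Optional


def parse_l2vpn_session(text: str) -> Dict[str, Optional[str]]:
    """Extract the VTI UUID and GRE endpoint IPs from pasted CLI output.

    Assumes the layout of the example output: a "VTI:" line, and an
    "IPV4:" line under each LOCAL_ENDPOINT_IP / PEER_ENDPOINT_IP heading.
    """
    result: Dict[str, Optional[str]] = {
        "vti": None,
        "local_ip": None,
        "peer_ip": None,
    }
    section = None  # which endpoint the next IPV4 line belongs to
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("VTI:"):
            result["vti"] = line.split(":", 1)[1].strip()
        elif line.startswith("LOCAL_ENDPOINT_IP"):
            section = "local_ip"
        elif line.startswith("PEER_ENDPOINT_IP"):
            section = "peer_ip"
        elif line.startswith("IPV4:") and section is not None:
            result[section] = line.split(":", 1)[1].strip()
            section = None
    return result
```

A value that stays `None` means the corresponding line was not found, which usually indicates the output format differs from the example.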

Step 2: Check Routing Table in T0 Logical Router

List the logical routers and find the UUID of the T0 Service Router (SR) that hosts the IPsec VPN:

nsx-edge> get logical-router

Then inspect the forwarding table of that T0 SR:

nsx-edge> get logical-router <T0-SR-UUID> forwarding

Example output:

IPv4 Forwarding Table
IP Prefix          Gateway IP       Type      UUID             Gateway MAC
.....
<Local-IP>                         route   <Next-Hop-UUID>
<Peer-IP>                          route   <Next-Hop-UUID>
  • Ensure the VTI UUID is the same as the Next Hop UUID for the peer GRE IP.

  • If the route is missing or incorrect, there’s a static routing issue.
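The comparison between the VTI UUID and the next hop can be scripted against saved forwarding-table output. This is a rough sketch under stated assumptions: each route line starts with the IP prefix and contains the next-hop UUID somewhere on the same line, as in the example above. The function names are hypothetical, and only a route whose prefix address equals the peer IP is considered, so a covering subnet or default route is not detected.

```python
import re
from typing import Optional

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"
)


def next_hop_uuid(forwarding_output: str, peer_ip: str) -> Optional[str]:
    """Return the next-hop UUID on the route line for peer_ip, if any."""
    for line in forwarding_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Accept both "192.0.2.2" and "192.0.2.2/32" style prefixes.
        if fields[0].split("/")[0] != peer_ip:
            continue
        match = UUID_RE.search(line)
        if match:
            return match.group(0).lower()
    return None


def route_matches_vti(forwarding_output: str, peer_ip: str,
                      vti_uuid: str) -> bool:
    """True when the peer GRE IP is routed via the given VTI UUID."""
    hop = next_hop_uuid(forwarding_output, peer_ip)
    return hop is not None and hop == vti_uuid.lower()
```

A `False` result covers both cases called out above: the route is missing, or it points to a next hop other than the VTI.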

 

Scenario B: L2VPN Tunnel is UP but Workloads Can't Communicate

Troubleshooting L2VPN Pipeline

  1. Tunnel ID Mismatch:
    • Ensure the Tunnel ID is the same on both client and server sides. A mismatch causes dropped packets after GRE decapsulation.
    • Check the GRE tunnel stats:
      • nsx-edge> get tunnel-port <UUID> stats
      • Look for an increment in the "No-Match" counter.
    • Remedy: Ensure the Tunnel ID is identical on both ends.
  2. Error Counters on Stretched Segment:
    • Check for error counters:
      • Get logical switch UUID: nsx-edge> get l2vpn session <L2VPN-Session-UUID> logical-switch
      • Find the logical-switch UUID that matches its Tunnel ID (VLAN ID)
      • nsx-edge> get l2vpn session <L2VPN-Session-UUID> logical-switch <Switch-UUID> stats
    • Common errors:
      • Malformed: Packet format unrecognized.
        • Remedy: Investigate the packet format.
      • No-Match: Tunnel ID mismatch.
        • Remedy: Ensure Tunnel IDs match.
      • No-Linked-Port: GRE tunnel port is not linked.
        • Remedy: Check NSX agent logs for errors.
  3. Packet Flow Verification:
    • For egress traffic, packets should traverse: Logical Switch port → GRE tap interface → VTI interface.
    • Use the following commands to do packet captures from above interfaces/port and check if there are missing packets:
      • Logical Switch Port:
        • nsx-edge> get l2vpn session <L2VPN-Session-UUID> logical-switch
        • Look for Switch-Port UUID for desired Tunnel ID
        • nsx-edge> start capture interface <Switch-port-uuid> direction dual
      • GRE tap interface:
        • nsx-edge> get logical-router <logical-router UUID> interfaces 
        • Look for interface UUID with Mode as "gretap"
        • nsx-edge> start capture interface <GRE-tap-interface-UUID> direction dual
      • Virtual Tunnel Interface (VTI):
        • nsx-edge> get logical-router <logical-router UUID> interfaces 
        • Look for interface UUID with Mode as "vti"
        • nsx-edge> start capture interface <VTI-interface-UUID> direction dual
      • For details on how to capture packets on an NSX Edge, refer to the NSX packet capture documentation
    • Verify if packets are seen on both GRE tap and VTI.
    • If GRE packets are seen on the tap interface but not on the VTI interface, ensure the peer GRE IP has a route in the output of get logical-router <T0-SR-UUID> forwarding.
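Because the checks above look for counters that increment, it helps to read the stats twice while test traffic is running and compare the two readings. The sketch below assumes the counter values have already been copied into dictionaries by hand; it does not parse the CLI output, whose exact layout is not shown in this article. The function name is hypothetical, and the remedies are the ones listed above.

```python
from typing import Dict, List

# Error counters and remedies as listed in this article.
REMEDIES = {
    "Malformed": "Investigate the packet format.",
    "No-Match": "Ensure the Tunnel ID is identical on both ends.",
    "No-Linked-Port": "Check NSX agent logs for errors.",
}


def incremented_errors(before: Dict[str, int],
                       after: Dict[str, int]) -> List[str]:
    """List error counters that grew between two readings, with remedy."""
    findings = []
    for name, remedy in REMEDIES.items():
        # A counter absent from a reading is treated as 0.
        delta = after.get(name, 0) - before.get(name, 0)
        if delta > 0:
            findings.append(f"{name} +{delta}: {remedy}")
    return findings
```

An empty list means none of the three counters moved between the readings, which points the investigation toward the packet captures or toward the checks outside the L2VPN pipeline.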

 

Troubleshooting beyond the L2VPN pipeline

A. VM Connectivity (VLAN/Overlay Stretching)

  • Ensure VMs are connected to the correct port group that is stretched over the L2VPN session.

  • Verify the correct interface (MAC/IP) of the VM is connected.

B. Verify Virtual Port Group Configurations (VLAN Stretching)

  1. Check VLAN ID Configuration:
    • Ensure correct VLAN ID is configured on access port groups for VMs.
  2. Edge Downlink Configuration:
    • The port group to which the Edge downlink is connected must be set as a trunk to support multiple VLANs.
  3. Stretched Portgroup Configuration
    • Depending on the ESXi host version and the type of virtual switch, configure the stretched port group with one of the following options:
      • Enabling Promiscuous mode
        • This method has the greatest performance impact because packets are sent to every VM attached to the port group
        • Ensure "Forged Transmits" is also enabled
        • It is recommended to enable "Net.ReversePathFwdCheckPromisc" in the ESXi host's advanced settings
      • Creating a Sink Port
        • This configures a port on the virtual switch to receive any frames destined for MAC addresses that the virtual switch does not know
        • The below command is only available when using a VDS
        • net-dvs --enableSink 1 -p <Port ID> <VDS Name>
        • Ensure "Forged Transmits" is also enabled
      • Enabling MAC learning
        • This method has the best performance and security 
        • This option exists only in vSphere 6.7 and later and applies only to a VDS (not to a VSS)
        • Starting with vSphere 8, MAC learning can be enabled directly on the VDS through the vSphere UI
        • For vSphere versions earlier than 8, contact Broadcom Support for assistance with enabling MAC learning
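The constraints above can be summarized as a small decision helper. This is a sketch of the article's rules only, with a hypothetical function name: MAC learning is offered for a VDS on vSphere 6.7 or later, a sink port for any VDS, and promiscuous mode as the fallback. Placing the sink port between the other two in the preference order is an assumption; the article only states that MAC learning performs best and promiscuous mode has the greatest impact.

```python
from typing import List, Tuple


def stretched_portgroup_options(vsphere_version: Tuple[int, ...],
                                switch_type: str) -> List[str]:
    """Return the applicable options, most preferred first.

    vsphere_version is a tuple such as (6, 7) or (8, 0);
    switch_type is "VDS" or "VSS".
    """
    switch = switch_type.upper()
    if switch not in ("VDS", "VSS"):
        raise ValueError("switch_type must be 'VDS' or 'VSS'")

    options = []
    if switch == "VDS" and vsphere_version >= (6, 7):
        options.append("MAC learning")
    if switch == "VDS":
        # The net-dvs sink command is only available on a VDS.
        options.append("Sink port (with Forged Transmits)")
    # Promiscuous mode has the greatest performance impact, so it is last.
    options.append("Promiscuous mode (with Forged Transmits)")
    return options
```

For example, calling it with `(7, 0)` and `"VSS"` leaves promiscuous mode as the only entry, which matches the note that MAC learning and the sink port command do not apply to a VSS.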

 

Additional Information

If you need to open a support ticket with Broadcom Support for NSX, provide the following:

  • NSX Edge log bundles for all Edges in the Edge Cluster containing the T0 or T1 where the IPsec VPN is configured
  • Ensure the log date range covers the full date of the event(s) being investigated. When in doubt, retrieve logs for all time.
  • NSX Manager log bundles
  • ESXi host log bundles for all hosts where the affected Edge VMs are running
  • Text of any error messages seen in NSX GUI or command lines pertinent to the investigation
  • The configuration and logs from the device on the other end of the IPsec VPN