Troubleshooting NSX-T L2 VPN

Article ID: 377752

Products

VMware NSX

Issue/Introduction

When troubleshooting L2 VPN, there are a few things to check and consider. This article examines the different areas to verify, validate, and troubleshoot for L2 VPN.

Environment

VMware NSX

Cause

The following are the most common reasons why L2 VPN tunnels go down or stop processing traffic:

  • Physical Link down between endpoints
  • Authentication issues
  • Firmware on remote endpoint needs upgrade/reboot
  • Problematic Edge Node

Resolution

When troubleshooting an issue, consider the following pointers:

  • L2VPN tunnel status depends on
    • Associated IPsec session status.
    • Presence of static route to peer GRE endpoint.
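
The two dependencies above suggest a simple order of checks. The following is a minimal sketch of that order in Python; the function name and the returned messages are illustrative assumptions, not part of any NSX tooling.

```python
def next_troubleshooting_step(ipsec_session_up: bool, gre_route_present: bool) -> str:
    """Return the next area to check, in the order this article follows.

    Both inputs are values the operator gathers manually (UI or NSX CLI);
    this helper is hypothetical and only encodes the ordering.
    """
    if not ipsec_session_up:
        # The L2VPN tunnel cannot come up without its IPsec session.
        return "Troubleshoot the IPsec session first"
    if not gre_route_present:
        # IPsec is up, so look at the static route to the peer GRE endpoint.
        return "Check the static route to the peer GRE endpoint"
    # Both prerequisites are met; move on to the stretched segment.
    return "Check the stretched segment and tunnel ports"


print(next_troubleshooting_step(ipsec_session_up=True, gre_route_present=False))
```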

 

  • Verify basic connectivity between endpoints
    • Can the endpoints ping each other?
    • Is the MTU configured properly?
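
When testing the MTU with ping and the don't-fragment option, the largest ICMP payload that fits is the MTU minus the IPv4 and ICMP headers. The sketch below shows that arithmetic; it assumes IPv4 without header options, and the function name is hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu < overhead:
        raise ValueError("MTU is too small to carry an ICMP echo over IPv4")
    return mtu - overhead


print(max_ping_payload(1500))  # 1472
```

If a ping with this payload size and don't-fragment set fails while smaller sizes succeed, a device on the path likely has a lower MTU than expected.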

 

  • Remember, T1s are running inside Edge appliances:
    • Verify the health of the Edge VM and the ESXi host on which it is running
    • When collecting logs, always collect log bundles from the Edge(s) in question, their supporting host(s), and NSX-T Manager

 

  • The L2VPN session configures an IPsec VPN session automatically.
    • Go to the IPsec VPN (or L3 VPN) tab in the UI and ensure that the IPsec VPN session is up/success.

 

If the IPsec session is down, troubleshoot the IPsec session first:

    • IPsec VPN is supported on both Tier-0 and Tier-1 gateways, but the gateway must be in active/standby HA mode.
    • Check whether this is a new configuration or one that was working before.
    • If it was working before, verify what has changed since then.
    • At what Phase is it failing?
    • If failing at phase 1, check the IKE profiles config at both sides.
    • If failing at phase 2, check the IPSEC Profiles at both sides.

 

  • Is it Policy or Route Based VPN?
    • If Policy Based:
      • DNAT is not supported on tier-0 or tier-1 gateways where policy-based IPsec VPN are configured.
      • The local and peer networks provided in the session must be configured symmetrically at both endpoints.
      • Check the edge size and max number of tunnels supported.
    • If Route Based:
      • Dynamic routing over the VTI is supported with BGP only.
      • Dynamic routing for VTI is not supported on VPN that is based on Tier-1 gateways.
      • Load balancer over IPSec VPN is not supported for route-based VPN terminated on Tier-1 gateways.
  • Verify the IPsec tunnel status from the edge UI. If the status is DOWN, validate the local endpoint and the profiles.
    • Local endpoints: Validate the local and remote peer IPs.
    • Profiles: Validate that the configuration of the following profiles matches on both sides:
      • IKE Profiles: The IKE version, encryption algorithm, digest algorithm, and Diffie-Hellman group.
      • IPSec Profiles: The encryption algorithm, digest algorithm, and Diffie-Hellman group, and whether perfect forward secrecy is enabled.
      • DPD Profiles: The Dead Peer Detection timer.

 

  • CLI Validation:
    • get ipsecvpn session summary  → Obtain the session ID and quickly review the status.
    • get ipsecvpn session sessionid <session_id> → Review the local and remote peers and the down reason.
    • get ipsecvpn ikesa <session_id> → Review the algorithm configuration / IPsec Phase 1 (ISAKMP).
    • get ipsecvpn sad <policy_id> || get ipsecvpn sad <UUID> → Review the SPIs.
    • get ipsecvpn ipsecsa  → Review IPsec tunnel Phase 2.
    • get ipsecvpn ipsecsa session-id <session_id>  → Review IPsec SA information.
    • get ipsecvpn tunnel stats  → Review IPSEC VPN statistics
    • get ipsecvpn config peer-endpoint  → Review IKE config
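
To keep the per-session checks in a consistent order, the commands above can be generated for a given session ID and pasted into the Edge CLI. This is a minimal sketch: the helper name is hypothetical, the session ID shown is a made-up example, and the `sad` command is left out because it takes a policy ID or UUID rather than a session ID.

```python
from typing import List


def ipsec_cli_checks(session_id: str) -> List[str]:
    """Build the ordered list of Edge CLI commands listed in this article."""
    return [
        "get ipsecvpn session summary",
        f"get ipsecvpn session sessionid {session_id}",
        f"get ipsecvpn ikesa {session_id}",
        "get ipsecvpn ipsecsa",
        f"get ipsecvpn ipsecsa session-id {session_id}",
        "get ipsecvpn tunnel stats",
        "get ipsecvpn config peer-endpoint",
    ]


# "8193" is an illustrative session ID, not taken from a real system.
for command in ipsec_cli_checks("8193"):
    print(command)
```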

 

If the IPsec session is up and L2 VPN session has a problem, troubleshoot the L2 VPN session as follows:

 

  • L2VPN Tunnel down:
    • If the IPsec session is UP but the overall L2VPN status is DOWN, check for a static route to the peer GRE endpoint using the following NSX CLI commands:
      • Find the GRE peer IP:
        • nsx-edge> get l2vpn sessions config
          • VTI:
          • PEER_ENDPOINT_IP:
      • Take note of the VTI interface ID and the peer endpoint IP.
      • Check the routing table on the T0 logical router:
        • nsx-edge> get logical-router <T0 Service Router UUID> forwarding
      • The VTI UUID should be the next hop for the peer.
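
If the output of get l2vpn sessions config has been saved to a file or log bundle, the VTI and PEER_ENDPOINT_IP fields can be extracted with a short script. The sketch below assumes the output contains lines of the form "KEY: value"; the exact output layout may differ between NSX versions, and the sample text and values are invented for illustration.

```python
def parse_fields(cli_output: str, wanted=("VTI", "PEER_ENDPOINT_IP")) -> dict:
    """Collect 'KEY: value' pairs for the wanted keys from saved CLI output."""
    found = {}
    for line in cli_output.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in wanted:
            found[key] = value.strip()
    return found


# Hypothetical sample; real output has more fields and may be laid out differently.
sample = """\
VTI: 11111111-2222-3333-4444-555555555555
PEER_ENDPOINT_IP: 192.0.2.10
"""

print(parse_fields(sample))
```

The extracted peer IP is the destination to look for in the forwarding table, and the VTI value is the next hop it should point to.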

 

If the L2VPN tunnel is UP but workloads are not communicating, check the following possible causes:

    • Tunnel ID mismatch
      • get l2vpn session
      • get tunnel-port ###-##
    • Error counters of the stretched segment
      • get l2vpn session <session UUID> logical-switch <logical-switch UUID> stats
    • Egress traffic on the expected port
      • Logical switchport
        • get l2vpn session <session UUID> logical-switch
      • GRE tap interface
        • get logical-router <logical-router UUID> interfaces
      • VTI interface
        • get l2vpn sessions config

Additional Information

  • Logs
    • Check Edge /var/log/syslog and grep for IKE and edge node: edge/ipsec-tunnel
    • If necessary (and it usually is), enable debug logging, recreate the symptom, and pull the log bundles mentioned above.
  • Packet capture on available interfaces
    • start capture interface <uuid>
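
When reviewing an extracted log bundle offline, the syslog check above can be reproduced with a small filter. This is a minimal sketch equivalent to a case-sensitive grep for IKE; the function name and the sample log lines are illustrative assumptions.

```python
from typing import Iterable, List


def filter_lines(lines: Iterable[str], keyword: str = "IKE") -> List[str]:
    """Return the lines that contain the keyword (case-sensitive, like grep)."""
    return [line.rstrip("\n") for line in lines if keyword in line]


# Hypothetical sample lines; real Edge syslog entries look different.
sample_log = [
    "2024-01-01T00:00:00Z edge01 NSX 1234 VPN [IKE] example message one\n",
    "2024-01-01T00:00:01Z edge01 NSX 1234 ROUTING example message two\n",
]

for match in filter_lines(sample_log):
    print(match)

# To scan a real file, pass an open file handle instead of a list:
# with open("/var/log/syslog", errors="replace") as handle:
#     matches = filter_lines(handle)
```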

Troubleshooting based on DOWN REASON

If the session status is Negotiating, the Edge has initiated an IKE SA request for the session, but SA negotiation is not complete.

 

Common down reasons for a session, with their meaning and potential remedies, are listed below:

  • Session disabled
    • Meaning: The admin has disabled the session.
    • Remedy: Enable the session.
  • Peer not reachable
    • Meaning: Authentication, DPD timeout.
    • Remedy: Check network connectivity to the peer.
  • Configuration Failed
    • Meaning: Configuration of the session failed within IKED.
    • Remedy: Check the configuration failed reason, which can be seen using edge-appctl ike.ctl session/get. The issue is most likely an unsupported configuration that was sent to IKED even though the MP allows it. Check with IPSec team - [email protected]
  • Authentication Failure
    • Meaning: The Edge failed to authenticate the peer during IKE SA setup.
    • Remedy: Check for a mismatch in IDs or authentication credentials (pre-shared key value or certificate). For remote peers behind NAT, you may want to validate: remote peer local ID = NSX-T → VPN → Advanced Tunnel Parameters → Remote Private IP.
  • Negotiation not started
    • Meaning: IKE negotiation was not started for this session.
    • Remedy: Check whether the session is configured as Responder or On-Demand.
      • If the session is configured as Responder, IKE negotiation must be started from the peer side only.
      • If the session is configured as On-Demand, the datapath triggers SA negotiation on receipt of packets matching the IPSec policy (provided that there is no SA). Initiate a ping for traffic matching the outbound IPsec rules corresponding to the session to see if negotiation starts.
  • SR state is not Active
    • Meaning: IKED identifies that the SR is not in ACTIVE state. Sessions are not realized unless the SR is in Active state.
    • Remedy: If the HA status for the SR is not Active, fix the problem in HA. If HA reports the SR as Active but IKED still reports this down reason, it is most likely a bug in IKED.
  • TS unacceptable
    • Meaning: IPSec SA setup has failed due to a mismatch in the policy rule definition between the gateways for the tunnel configuration.
    • Remedy: Check the local and remote network configuration on both gateways.
  • Peer not responding
    • Meaning: No response was received from the peer for requests sent to establish the IKE SA. DPD timeout.
    • Remedy: If the peer is actually UP, this is most likely an issue with routing (either on the Edge or on the network connected to the uplink).
      • Ping from the Tier-0 SR VRF context to the peer gateway IP to check connectivity.
      • If the ping is not working, check for a route entry to reach the peer gateway (either using the default route from the uplink interface or using the peer gateway network prefix reachable over the uplink).
      • If the ping is working, IKE packets may be reaching the peer, but the peer may not be responding due to misconfiguration of IPsec. Check the VPN configuration at the peer gateway. Also check for any firewall/NAT between the Edge and the peer gateway; this may require configuration changes at the Edge.
  • Peer sent delete
    • Meaning: The peer has deleted the IKE SA and sent a message to the Edge to delete the SA.
    • Remedy: Check why the peer sent the Delete. In most such cases, the Edge is not configured to initiate the tunnel and is therefore waiting for the tunnel to be initiated from the peer side.
  • No proposal chosen
    • Meaning: The peer responded with a "No proposal chosen" failure message in response to a request sent from the Edge. The algorithms in phase 1/2 are not consistent between the local and peer configurations.
    • Remedy: Check that the crypto algorithms configured under the IKE profile associated with the session match the configuration on the peer.
  • IPSec service not active
    • Meaning: The status of the VPN service used for the session is not active.
    • Remedy: Check the IPsec service admin status.
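
For quick reference during triage, some of the session down reasons above can be kept in a lookup table. The sketch below condenses a few of the remedies from this article into a Python dictionary; the helper name, the subset chosen, and the fallback message are assumptions made for illustration.

```python
# Condensed from the session down reasons in this article (subset only).
SESSION_DOWN_REMEDIES = {
    "session disabled": "Enable the session.",
    "peer not reachable": "Check network connectivity to the peer.",
    "authentication failure": "Check for a mismatch in IDs or auth credentials.",
    "ts unacceptable": "Check local and remote network configuration on both gateways.",
    "no proposal chosen": "Check that the IKE profile crypto algorithms match the peer.",
    "ipsec service not active": "Check the IPsec service admin status.",
}


def remedy_for(down_reason: str) -> str:
    """Look up a remedy, ignoring case and surrounding whitespace."""
    key = down_reason.strip().lower()
    return SESSION_DOWN_REMEDIES.get(key, "No remedy listed; review the session details.")


print(remedy_for("TS unacceptable"))
```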

 

Common down reasons for a tunnel, with their meaning and potential remedies, are listed below:

  • IKE SA Down
    • Meaning: The IKE session corresponding to this policy rule is down, so the tunnel is down.
    • Remedy: Troubleshoot the reason for the session being in the Down state.
  • No Proposal chosen
    • Meaning: The crypto algorithms configured for the IPSec SA do not match those on the peer.
    • Remedy: Compare the algorithms in the tunnel profile associated with the session against the corresponding configuration at the peer.
  • Selector Mismatch
    • Meaning: IPSec SA negotiation failed because of a mismatch between the policy rules configured at the Edge and the corresponding configuration at the peer gateway.
    • Remedy: Check for matching subnets on both gateways.
  • Negotiation not started
    • Meaning: IPSec SA negotiation was not started for this session.
    • Remedy: Either the IKE SA is not established or there is no traffic matching the IPSec SP.
  • Peer sent delete
    • Meaning: The peer has deleted the IPSec SA and sent a message to the Edge to delete the SA.
    • Remedy: Check why the peer sent the Delete. In most such cases, the Edge is not configured to initiate the tunnel and is therefore waiting for the tunnel to be initiated from the peer side.
  • Phase-1 failed
    • Meaning: Phase 1 negotiation has failed.
  • No IKE peers
    • Meaning: All IKE peers are dead; no peer is left to try the connection.
    • Remedy: Check peer connectivity and whether the peer is up.

If you are contacting Broadcom support about this issue, please provide the following:

  • State of the VPN connection reported on the peer device
  • Whether you are able to ping the peer device
  • How long the session has been reported down, and whether it has ever worked
  • State of the physical network

Handling Log Bundles for offline review with Broadcom support