Troubleshooting NSX IPSEC VPN

Article ID: 379731

Updated On:

Products

VMware NSX

Issue/Introduction

This article provides troubleshooting details for IPsec VPN on VMware NSX-T. IPsec VPN services are critical for establishing secure connectivity between NSX-T environments and remote sites.

Functional Prerequisites:

  • High Availability (HA) Mode: The Gateway must be configured in Active/Standby HA mode. Active/Active is not supported for IPsec VPN services.

Environment

VMware NSX

Resolution

Supported Modes

Policy Based IPsec VPN (PBVPN)

Policy Based VPNs tunnel traffic based on configured local and remote networks.

  • Configuration: Requires defining specific Local Networks and Remote Networks. Each combination of a local and a remote network forms a policy for the VPN.
  • Troubleshooting Focus: Ensure the Local and Remote Networks configuration is mirrored on the remote VPN gateway. Mismatches here are the most common cause of Phase 2 failures.
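
The mirroring requirement can be checked mechanically before a session is brought up. The following is a minimal Python sketch, not an NSX tool: the function name and the four input lists are hypothetical, and the CIDR strings are assumed to be copied by hand from the NSX session and from the peer device.

```python
from ipaddress import ip_network


def mirror_mismatches(nsx_local, nsx_remote, peer_local, peer_remote):
    """Return human-readable mismatches between the two gateways.

    For a policy-based VPN, the peer's remote networks should equal the NSX
    local networks, and the peer's local networks should equal the NSX
    remote networks. All inputs are lists of CIDR strings (hypothetical
    inputs, gathered manually from both devices).
    """
    def norm(cidrs):
        # Normalise so that "10.1.0.5/24" and "10.1.0.0/24" compare equal.
        return {ip_network(c, strict=False) for c in cidrs}

    problems = []
    for label, ours, theirs in (
        ("NSX local vs peer remote", norm(nsx_local), norm(peer_remote)),
        ("NSX remote vs peer local", norm(nsx_remote), norm(peer_local)),
    ):
        for net in sorted(ours - theirs, key=str):
            problems.append(f"{label}: {net} is missing on the peer")
        for net in sorted(theirs - ours, key=str):
            problems.append(f"{label}: {net} is missing on NSX")
    return problems
```

An empty list means the two sides mirror each other. A differing prefix length (for example /24 on one side and /16 on the other) is reported as two entries, one for each side.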

Route Based IPsec VPN (RBVPN)

Route Based VPNs use the forwarding table to identify traffic to be sent through the IPsec tunnel. The forwarding entries can be static routes or routes learned dynamically over BGP.

  • Configuration: Any forwarding entry that points to a Virtual Tunnel Interface (VTI) uses RBVPN.

  • Troubleshooting Focus: Check VTI IP connectivity and BGP neighbor states (if dynamic routing is configured). 

    To check VTI IP connectivity, ping the remote VTI IP as follows:

    nsx-edge(tier0_sr[1])> ping X.X.X.X source Y.Y.Y.Y

PING X.X.X.X (169.2.2.3) from Y.Y.Y.Y: 56 data bytes
64 bytes from X.X.X.X: icmp_seq=0 ttl=64 time=5.146 ms
64 bytes from X.X.X.X: icmp_seq=1 ttl=64 time=3.964 ms
64 bytes from X.X.X.X: icmp_seq=2 ttl=64 time=3.747 ms
64 bytes from X.X.X.X: icmp_seq=3 ttl=64 time=4.235 ms
64 bytes from X.X.X.X: icmp_seq=4 ttl=64 time=3.692 ms
^C
--- X.X.X.X ping statistics ---
6 packets transmitted, 5 packets received, 16.7% packet loss
round-trip min/avg/max/stddev = 3.692/4.157/5.146/0.530 ms

nsx-edge(tier0_sr[1])>
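
When ping output is collected from several Edges, the statistics line can be parsed rather than read by eye. This is a small Python sketch that assumes the statistics line has the same wording as the sample above; the function name is hypothetical.

```python
import re


def packet_loss(ping_output):
    """Return (sent, received, loss_percent) from ping output, or None.

    Assumes a statistics line of the form shown in the sample above:
    "6 packets transmitted, 5 packets received, 16.7% packet loss".
    """
    match = re.search(r"(\d+) packets transmitted, (\d+) packets received", ping_output)
    if match is None:
        return None
    sent, received = int(match.group(1)), int(match.group(2))
    # Recompute the loss from the counters instead of trusting the printed value.
    loss = 100.0 * (sent - received) / sent if sent else 0.0
    return sent, received, round(loss, 1)
```

In the sample, the ping was interrupted with ^C, so the single missing reply is likely the packet that was still in flight rather than a sign of a tunnel problem.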

To check BGP neighbor states:

nsx-edge> get gateways
Gateway
UUID                                 VRF    Gateway-ID  Name                Type                          Ports  Neighbors
736a80e3-####-####-####-bb########## 0      0                               TUNNEL                        3      2/5000
1999abc1-####-####-####-9b########## 1      4           SR-T0-Server-A      SERVICE_ROUTER_TIER0          9      1/50000
1bd3cb82-####-####-####-82########## 3      2           DR-T0-Server-A      DISTRIBUTED_ROUTER_TIER0      6      2/50000
1298f615-####-####-####-11########## 5      9           SR-T1-Server-B      SERVICE_ROUTER_TIER1          7      2/50000
d955c6ca-####-####-####-2c########## 6      8           DR-T1-Server-B      DISTRIBUTED_ROUTER_TIER1      4      0/50000
59ad774a-####-####-####-6e########## 7      16          SR-VRF-test_vrf     VRF_SERVICE_ROUTER_TIER0      4      0/50000
b08bac8e-####-####-####-28########## 8      14          DR-VRF-test_vrf     VRF_DISTRIBUTED_ROUTER_TIER0  3      0/50000

nsx-edge> vrf 1
nsx-edge(tier0_sr[1])> get bgp neighbor summary
BFD States: NC - Not configured, DC - Disconnected
            DW - Down, IN - Init, UP - Up
BGP Peer Type: * - Dynamic
BGP summary information for VRF default for address-family: ipv4Unicast
Router ID: A.A.A.A  Local AS: ####

Neighbor            AS          State Up/DownTime  BFD InMsgs  OutMsgs InPfx  OutPfx

B.B.B.B             ####        Estab 4d06h46m     NC  531919  532064  5      0
C.C.C.C             ####        Estab 00:01:26     NC  366     454     7      10

nsx-edge(tier0_sr[1])>
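
For RBVPN with BGP, every neighbor reached over the VTI should be in the Estab state. The sketch below, in Python, flags neighbors in any other state. It assumes the column layout of the sample output above (Neighbor, AS, State as the first three columns); the function name is hypothetical, and the layout may differ between releases.

```python
def non_established_neighbors(summary_text):
    """Return [(neighbor, state)] for BGP neighbors that are not Established.

    Parses text captured from "get bgp neighbor summary". Assumes the table
    starts after a header line beginning with "Neighbor" and that the state
    is the third whitespace-separated column, as in the sample output.
    """
    flagged = []
    in_table = False
    for line in summary_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines inside and around the table
        if fields[0] == "Neighbor":
            in_table = True  # header row: data rows follow
            continue
        # Data rows have at least Neighbor, AS and State; a trailing CLI
        # prompt has a single field and is ignored.
        if in_table and len(fields) >= 3 and fields[2] != "Estab":
            flagged.append((fields[0], fields[2]))
    return flagged
```

An empty result means all listed neighbors are established. It does not confirm that the expected neighbor is configured at all, so compare the neighbor list against the remote VTI IP as well.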

Troubleshooting Scenarios

Each VPN session has a Download Config feature that helps with basic configuration troubleshooting. It effectively generates a "cheat sheet" of exactly what the remote device must be configured with to bring up the VPN tunnel with the NSX-T Edge.

This is particularly helpful for the following down reasons:

  • No proposal chosen
  • Config mismatch
  • TS unacceptable
  • Authentication failed

Accessing the Feature:

  1. Navigate to Networking > VPN > IPsec Sessions.

  2. Select the specific VPN Session.

  3. Click the Download Config button.

  4. This will download a text file containing the configuration parameters for Peer Device.

 

Here are some specific scenarios along with checklist details to help troubleshoot. 

Session Status as Down

If the session status is "Down," the IKE (Phase 1) negotiation has failed completely. See screenshot below:

Checklist for specific down reasons:

  1. Peer not responding:

      Verify basic reachability (ping) between the Local Endpoint IP and Remote Gateway IP. Ensure UDP ports 500 (IKE) and 4500 (NAT-T) and IP protocol 50 (ESP) are open in the underlay (physical) firewalls.

  2. No proposal chosen / config mismatch:

      Verify that the Pre-Shared Key (PSK), IKE Version (v1/v2), Encryption, Digest, and Diffie-Hellman (DH) groups match exactly on the remote VPN gateway.

  3. Authentication failed:

      If using NAT-T or certificates, ensure the Local ID and Remote ID settings identify the peers correctly.

      For PSK-based authentication, ensure the PSK matches exactly on both sides.

  4. TS unacceptable:

      Check the local and remote network configuration on both sides in case of PBVPN.
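
For the "No proposal chosen" and "Config mismatch" reasons, comparing the two sides parameter by parameter is usually faster than reading both configurations. The following Python sketch assumes the values have been copied by hand from the Download Config file and from the peer device; the function name and the dictionary keys in the usage note are hypothetical and are not the field names used in the downloaded file.

```python
def proposal_mismatches(nsx_params, peer_params):
    """Return {parameter: (nsx_value, peer_value)} for every difference.

    Both arguments are plain dicts of parameter name to value. A parameter
    that is present on only one side is reported with None for the other.
    Values are compared exactly, which is what a PSK comparison requires.
    """
    mismatches = {}
    for key in sorted(set(nsx_params) | set(peer_params)):
        ours = nsx_params.get(key)
        theirs = peer_params.get(key)
        if ours != theirs:
            mismatches[key] = (ours, theirs)
    return mismatches
```

For example, calling it with {"ike_version": "IKEv2", "dh_group": "14"} for NSX and {"ike_version": "IKEv2", "dh_group": "2"} for the peer reports only the dh_group entry, which points directly at the parameter to correct.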

Session Status as Degraded

"Degraded" means that the session is up but one or more tunnels are in the "Down" state, affecting only the traffic flows that use those tunnels.

Checklist for specific down reasons:

  1. No proposal chosen: Verify that Encryption, Digest, and Diffie-Hellman (DH) groups match exactly on both sides.
  2. TS unacceptable: Check the local and remote network configuration on both sides in case of PBVPN.
  3. Invalid syntax / Invalid spi: These down reasons suggest that an invalid, unexpected, or malformed payload has been received.

Session is UP but traffic through the tunnel is not working

The tunnel appears "Success/UP" in the UI, but data cannot pass.

Checklist:

  1. Firewall Rules:

    • Check the Gateway Firewall rule configuration. Ensure traffic is allowed in both directions (In/Out).

  2. Routing (RBVPN):

    • Verify that routes exist over the VTI for the desired destination IPs.

    • Either static routes or BGP should be configured over the VTI.

    • The following Edge CLI commands can be used in the VRF context to get routing details:
      • get route

      • get bgp neighbor summary

    • In the output of the get route command, ensure the next hop for the destination points to the VTI interface.

  3. MTU/MSS:

    • Large packets might be dropped when fragmentation is needed. Check the MTU on the uplinks and the path MTU (PMTU).

    • Check tunnel traffic counters for any relevant drops.
    • Try clamping TCP MSS on the VPN profile (e.g., 1350 bytes).
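
The relationship between path MTU, encapsulation overhead, and TCP MSS is simple arithmetic. The Python sketch below assumes IPv4 and TCP headers of 20 bytes each with no options; the IPsec overhead is an input because it depends on the cipher, digest, and whether NAT-T is in use, and this article does not state a value for it. The function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def clamped_mss(path_mtu, ipsec_overhead):
    """Largest TCP MSS whose encapsulated packet still fits in path_mtu."""
    mss = path_mtu - ipsec_overhead - IPV4_HEADER - TCP_HEADER
    if mss <= 0:
        raise ValueError("path MTU is too small for the given overhead")
    return mss


def overhead_budget(path_mtu, mss):
    """Bytes left for IPsec encapsulation when a given MSS is clamped."""
    return path_mtu - (mss + IPV4_HEADER + TCP_HEADER)
```

Under these assumptions, the 1350-byte example leaves overhead_budget(1500, 1350) = 110 bytes for encapsulation on a 1500-byte path. If the underlay path MTU is lower than 1500, the MSS has to be reduced accordingly.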

Intermittent Failures

The tunnel comes up, stays for a while, and then drops or restarts.

Checklist:

  1. Traffic through the tunnel
    1. Some devices tear down the session if no traffic is flowing through the tunnel.
    2. Check the tunnel counters using the CLI command "get ipsecvpn tunnel stats".
  2. NAT configuration

    1. Check if any NAT rules are configured for subnets over VPN.
  3. HA VIP as IPsec Local Endpoint (LEP) IP

    1. If HA VIP is used as LEP, then there may be a known issue.
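
To see whether a tunnel was idle before it dropped, take two readings of the tunnel counters some minutes apart and compare them. The Python sketch below works on values noted by hand from "get ipsecvpn tunnel stats"; the dictionaries, their keys, and the function name are hypothetical, and the sketch does not parse the CLI output itself.

```python
def idle_tunnels(before, after):
    """Return tunnel names whose packet counter did not advance.

    Each argument maps a tunnel name (any label you choose) to the total
    packet count read from the CLI at one point in time. A tunnel that
    appears only in the second reading is not reported as idle.
    """
    return sorted(
        name for name, count in after.items()
        if name in before and count == before[name]
    )
```

A tunnel that shows up here shortly before each drop is a candidate for the idle-teardown behavior described in item 1.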

 

Useful CLIs

Note: These commands must be run on the NSX Edge Node where the VPN service is active.

1. General VPN Status

# Check IPsec VPN service and session status details

get ipsecvpn service

get ipsecvpn session <options>

2. Configuration Details

# Check configuration pushed to the Edge

get ipsecvpn config <options>

3. IKE (Phase 1) Diagnostics

# Check IKE Security Associations (SAs)

get ipsecvpn ikesa <options>

4. IPsec (Phase 2) Diagnostics

# Check IPsec SAs (Tunnel status) in control plane

get ipsecvpn ipsecsa <options>

5. Packet counters

# Check Packet Counters (verify if traffic is hitting the tunnel) 

get ipsecvpn tunnel stats

6. IPSEC SAs

# Check IPsec SAs in Datapath 

get ipsecvpn sad

Logs/Alarms

The primary logs for IPsec VPN troubleshooting are located on the NSX Edge Node.

  • Main Log File:

    • /var/log/syslog 

  • Parsing Tips:

    • /opt/vmware/nsx-opsagent/bin/syslog_filter.sh iked → For VPN logs
    • /opt/vmware/nsx-opsagent/bin/syslog_filter.sh datapathd → For Datapath logs
  • VPN Alarms: 
    • The following VPN-related alarms are available:
      • IPsec Policy Based Session Down → This alarm indicates that a particular PBVPN session is down, along with details regarding the "Down Reason".
      • IPsec Policy Based Tunnel Down → This alarm indicates that one or more tunnels of a particular session are down.
      • IPsec Route Based Session Down → This alarm indicates that a particular RBVPN session is down, along with details regarding the "Down Reason".
      • IPsec Route Based Tunnel Down → This alarm indicates that the tunnel of a particular RBVPN session is down, along with details regarding the "Down Reason".
    • Details of the above alarms can be checked on the Alarms dashboard in the UI.
    • When such a VPN alarm is raised, an entry appears on the Alarms dashboard with details of the Edge Node reporting the alarm.
    • Only the Active Edge Node reports VPN alarm events.

Interaction of Firewall and NAT with VPN

This section outlines how Firewall and NAT configurations interact with both Policy-Based and Route-Based IPsec VPN implementations.

1. Firewall Configuration

General Traffic Processing

If a Gateway Firewall is enabled on the gateway hosting the VPN configuration, packets will only be processed for IPsec VPN if the firewall policy explicitly permits them.

Rules for VPN Traffic Types

  • Policy-Based VPN: Ensure that firewall policies do not drop IP packets that match the Local and Remote Network definitions of the IPsec VPN session.

  • Route-Based VPN: Ensure that traffic destined for the Remote Network is permitted. This applies when a firewall policy is active over the Route-Based VPN interface (VTI).

Rules for IPsec Control Packets (IKE)

For an IPsec VPN to establish, the gateway firewall must allow specific control packets between the Local Endpoint IP (Source) and the Remote Endpoint IP (Destination), and vice versa.

Required Protocols: UDP Port 500 and UDP Port 4500 (for IKE and NAT-T).

Provisioning Behavior:

  • Local Gateway: If the Gateway Firewall is enabled, the policy to allow IKE traffic is auto-provisioned when the IPsec VPN session is created.
  • Intermediate Nodes: If a firewall exists on any transit network node or gateway between the VPN endpoints, you must manually ensure that policies allow UDP 500/4500 traffic.
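
When reviewing an intermediate firewall, it helps to test the IKE flows in both directions against the rule list. The following Python sketch uses a deliberately simplified, hypothetical rule model (ordered list, first match wins, implicit drop at the end); it is not the NSX or any vendor rule format, and real firewalls may evaluate rules differently.

```python
from ipaddress import ip_address, ip_network

IKE_PORTS = (500, 4500)  # UDP ports for IKE and NAT-T, as listed above


def first_match_action(rules, src, dst, port):
    """Return the action of the first rule matching a UDP flow.

    Each rule is a dict with hypothetical keys: "src" and "dst" (CIDR
    strings), "ports" (a set of UDP ports, or None for any) and "action".
    """
    for rule in rules:
        if (ip_address(src) in ip_network(rule["src"])
                and ip_address(dst) in ip_network(rule["dst"])
                and (rule["ports"] is None or port in rule["ports"])):
            return rule["action"]
    return "drop"  # assumption: implicit default drop


def blocked_ike_flows(rules, local_endpoint, remote_endpoint):
    """Return (src, dst, port) for every IKE flow that is not allowed."""
    blocked = []
    for src, dst in ((local_endpoint, remote_endpoint),
                     (remote_endpoint, local_endpoint)):
        for port in IKE_PORTS:
            if first_match_action(rules, src, dst, port) != "allow":
                blocked.append((src, dst, port))
    return blocked
```

An empty result means UDP 500 and 4500 are permitted in both directions under this model. ESP (IP protocol 50) is not covered by the sketch and should be checked separately when NAT-T is not in use.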

2. NAT Configuration

NAT Rules for Subnets over VPN

Policy-Based VPN

  • Default Priority (VPN over NAT): By default, VPN processing takes precedence over NAT. If outbound traffic matches both an SNAT rule and an IPsec Local/Remote network pair, the NAT rule is bypassed, and the traffic is encapsulated for the VPN.

  • Configuring NAT Before VPN: If your architecture requires address translation before traffic enters the IPsec tunnel, you must configure the IPsec Local networks using the post-NAT (translated) IP addresses, rather than the original source IPs.
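
The default precedence for policy-based VPN can be expressed as a small decision function. This Python sketch models only the behavior described above (a VPN match bypasses SNAT); the function name and inputs are hypothetical, and it is not a representation of the Edge datapath.

```python
from ipaddress import ip_address, ip_network


def outbound_treatment(src, dst, vpn_pairs, snat_sources):
    """Classify an outbound packet as "vpn", "snat" or "route".

    vpn_pairs is a list of (local_cidr, remote_cidr) tuples taken from the
    IPsec session; snat_sources is a list of source CIDRs that have an SNAT
    rule. Both are hypothetical inputs entered by hand.
    """
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    matches_vpn = any(
        src_ip in ip_network(local) and dst_ip in ip_network(remote)
        for local, remote in vpn_pairs
    )
    if matches_vpn:
        return "vpn"  # VPN takes precedence, so the SNAT rule is bypassed
    if any(src_ip in ip_network(net) for net in snat_sources):
        return "snat"
    return "route"
```

With a pair ("10.1.0.0/24", "192.168.1.0/24") and an SNAT rule for 10.1.0.0/24, traffic from 10.1.0.5 to 192.168.1.9 is classified as "vpn", while traffic from the same source to any other destination is classified as "snat". If translation is required inside the tunnel, the local network in the pair has to be the translated range instead.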

Route-Based VPN

  • VTI Routing: Traffic routed to a VTI (via static routes or BGP) is automatically processed for IPsec encapsulation.

  • VTI SNAT: If SNAT is applied to the VTI, BGP advertises the translated IP to the peer, not the original source IP.

NAT Rules for IPsec Endpoint IPs (NAT Traversal)

IPsec VPN supports NAT Traversal (NAT-T), allowing VPN endpoints to communicate even if one or both reside behind a NAT device.

  • NAT-T Compatibility: Supported on upstream nodes (e.g. an upstream Tier-0 Gateway).

  • Limitation: NAT cannot be applied to VPN Endpoint IPs on the gateway terminating the IPsec session (e.g., a Tier-1 Gateway).

  • Workaround: Create a high-priority "No NAT" rule on the terminating gateway (Source = Local Endpoint IP, Destination = Peer IP).

  • Roadmap: A future release will fully automate the creation and lifecycle management of this "No NAT" rule.

Known Issues / Limitations:

 



Additional Information

If you are contacting Broadcom support about this issue, please provide the following:

  • NSX Edge log bundles for all Edges in the Edge Cluster containing the T0 or T1 where the IPSEC VPN is configured

  • Ensure log date range covers the full date of the event(s) being investigated. When in doubt, retrieve logs for all time.

  • NSX Manager log bundles

  • ESXi host log bundles for all hosts where the affected Edge VMs are running

  • Text of any error messages seen in NSX GUI or command lines pertinent to the investigation

  • The configuration and logs from the device on the other end of the IPSEC VPN