TEP tunnels are down between ESXi Transport Node and Edge Transport Node. (Alarm: Event type: Faulty TEP)

Article ID: 405174

Products

VMware NSX

Issue/Introduction

When workloads are moved to an ESXi Transport Node, the TEP (Tunnel Endpoint) status shows the tunnels towards the Edge as down, while TEP tunnels to other ESXi Transport Nodes work fine. In this environment, the Host TEPs and Edge TEPs are on different VLANs.

  • TEP tunnel failures between ESXi hosts and Edge nodes 

  • NSX Manager displays "Faulty TEP" alarms

  • Host-to-Edge TEP tunnels fail while Host-to-Host tunnels remain operational
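The symptom pattern can be expressed as a small check: every tunnel towards an Edge TEP is down while tunnels towards other host TEPs are up. The Python sketch below is illustrative only; the `Tunnel` record, the `matches_faulty_tep_pattern` helper, the "UP"/"DOWN" status strings and the IP addresses are hypothetical and do not correspond to an NSX API. In practice, the tunnel list would be read from the host's TEP tunnel view.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Tunnel:
    # Hypothetical record for one row of a host's TEP tunnel status view.
    remote_tep_ip: str
    status: str  # assumed to be "UP" or "DOWN"


def matches_faulty_tep_pattern(tunnels: Iterable[Tunnel], edge_tep_ips: Set[str]) -> bool:
    """True if every tunnel to an Edge TEP is down while at least one
    tunnel to another (host) TEP is up."""
    tunnels = list(tunnels)
    edge = [t for t in tunnels if t.remote_tep_ip in edge_tep_ips]
    hosts = [t for t in tunnels if t.remote_tep_ip not in edge_tep_ips]
    edge_all_down = bool(edge) and all(t.status.upper() == "DOWN" for t in edge)
    host_some_up = any(t.status.upper() == "UP" for t in hosts)
    return edge_all_down and host_some_up


# Illustrative addresses only: host TEPs in 172.16.10.0/24, Edge TEPs in 172.16.20.0/24.
tunnels = [
    Tunnel("172.16.10.12", "UP"),
    Tunnel("172.16.20.21", "DOWN"),
    Tunnel("172.16.20.22", "DOWN"),
]
print(matches_faulty_tep_pattern(tunnels, {"172.16.20.21", "172.16.20.22"}))  # True
```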

Validation Steps:

  1. Navigate to NSX Manager UI → Fabric → Hosts
  2. Check the TEP tunnel status and confirm that the tunnels are down on the affected host.
  3. Test connectivity from the ESXi Transport Node TEP to the Edge TEP with the following command. In this scenario, the test fails.
    • Log in to the affected host as root.
    • Run the command below:

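vmkping -S vxlan -s 1472 -I vmkX -c 100 <Edge TEP IP>

    • -S is the network stack of the TEP interfaces, which is vxlan by default.
    • -s is the ICMP payload size in bytes, not the MTU. The largest payload that fits in one unfragmented packet is the MTU minus 28 bytes (20-byte IPv4 header plus 8-byte ICMP header): for example, 1472 for an MTU of 1500, 1572 for an MTU of 1600, and 8972 for an MTU of 9000.
    • -I is the TEP VMkernel adapter assigned to the vxlan network stack. Run esxcfg-vmknic -l to find the vmkX adapters assigned as TEP interfaces on the ESXi host.
    • -c is the number of echo requests to send.
    • Add -d to set the Don't Fragment bit, so that packets larger than the path MTU are dropped instead of fragmented. This makes an MTU mismatch visible.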
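The payload sizes follow from simple header arithmetic, and the TEP interfaces can be picked out of the esxcfg-vmknic -l listing by their NetStack column. The Python sketch below assumes that the interface name is the first column and the NetStack is the last column of each row; the helper names (`max_payload`, `tep_vmknics`, `vmkping_command`) and the sample output are hypothetical, and the generated command uses only vmkping flags that appear in this article.

```python
from typing import List

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_payload(mtu: int) -> int:
    """Largest vmkping -s value that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def tep_vmknics(esxcfg_vmknic_output: str) -> List[str]:
    """Return the vmk interfaces whose NetStack column is 'vxlan'.

    Assumes one row per interface and IP family, with the interface name
    in the first column and the NetStack in the last column.
    """
    found: List[str] = []
    for line in esxcfg_vmknic_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("vmk") and fields[-1] == "vxlan":
            if fields[0] not in found:  # IPv4 and IPv6 rows share a name
                found.append(fields[0])
    return found


def vmkping_command(vmk: str, edge_tep_ip: str, mtu: int,
                    count: int = 100, dont_fragment: bool = True) -> str:
    """Build a vmkping command line in the same form as the article's examples."""
    parts = ["vmkping", "-S", "vxlan", "-s", str(max_payload(mtu)),
             "-I", vmk, "-c", str(count)]
    if dont_fragment:
        parts.append("-d")
    parts.append(edge_tep_ip)
    return " ".join(parts)


# Illustrative esxcfg-vmknic -l style output (hypothetical values).
sample = """\
Interface  Port Group/DVPort/Opaque Network  IP Family IP Address    Netmask        Broadcast      MAC Address        MTU   TSO MSS  Enabled Type   NetStack
vmk0       Management Network                IPv4      192.168.1.10  255.255.255.0  192.168.1.255  00:50:56:00:00:01  1500  65535    true    STATIC defaultTcpipStack
vmk10      10                                IPv4      172.16.10.11  255.255.255.0  172.16.10.255  00:50:56:00:00:02  1600  65535    true    STATIC vxlan
"""
for vmk in tep_vmknics(sample):
    print(vmkping_command(vmk, "<Edge TEP IP>", mtu=1600))
# prints: vmkping -S vxlan -s 1572 -I vmk10 -c 100 -d <Edge TEP IP>
```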

  • Next, run a packet capture on the source host while vmkping is running, to confirm that the ICMP echo requests from the ESXi TEP are sent out through the expected uplink. The same traffic should then be received on the Edge side.
    • To identify the vmnic that the TEP VMkernel interface uses, run nsxdp-cli vswitch instance list on the source ESXi host as root.

pktcap-uw --uplink <vmnic#> --capture UplinkSndKernel,UplinkRcvKernel -o - | tcpdump-uw -enr - | grep <Edge-TEP-IP>

  • At the same time, log in as root to the destination ESXi host where the Edge VM is deployed, and run the packet capture on the vmnic that currently carries the Edge TEP traffic.
    • To identify the vmnic that carries the Edge TEP interface, follow the steps below:
      • Log in to the Edge node as admin
      • Run get gateway
      • Find the entry whose Type is TUNNEL and note its VRF ID
      • Run vrf <VRF-ID>, replacing <VRF-ID> with the VRF ID from the previous step
      • Run get interfaces and note the IP address, MAC address and interface ID of the TEP interface
    • Log in to the ESXi host as root.
    • If the ESXi host is part of the NSX-prepared cluster, run nsxdp-cli vswitch instance list to find the Edge interface MAC and its corresponding vmnic.
    • If the ESXi host is NOT part of an NSX-prepared cluster, run esxcli network vm list
      • Note the Edge VM name and its World ID, then run esxcli network vm port list -w <Edge-VM-World-ID>
      • Find the port whose MAC address matches the Edge TEP interface and note its corresponding vmnic.
    • Run the command below on that vmnic to check whether the incoming traffic arrives at the destination ESXi host where the Edge node is deployed.

pktcap-uw --uplink <vmnic#> --capture UplinkSndKernel,UplinkRcvKernel -o - | tcpdump-uw -enr - | grep <Source-ESXi-TEP-IP>
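Matching the Edge TEP MAC address to a vmnic, and checking each capture for the echo requests, can also be scripted. This Python sketch assumes that esxcli network vm port list prints one "Key: Value" line per field, with each port starting at "Port ID:" and its uplink under "Team Uplink:", and that tcpdump-uw prints IPv4 ICMP lines containing "<src> > <dst>: ICMP echo request". Both formats are assumptions to verify against real output; the helper names and sample text are hypothetical.

```python
import re
from typing import Dict, List, Optional


def uplink_for_mac(port_list_output: str, mac: str) -> Optional[str]:
    """Return the 'Team Uplink' of the port whose 'MAC Address' matches mac.

    Assumes one 'Key: Value' line per field and that each port starts
    with a 'Port ID:' line.
    """
    ports: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in port_list_output.splitlines():
        key, sep, value = line.strip().partition(":")  # split at the first colon only
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Port ID" and current:  # a new port begins
            ports.append(current)
            current = {}
        current[key] = value
    if current:
        ports.append(current)
    for port in ports:
        if port.get("MAC Address", "").lower() == mac.lower():
            return port.get("Team Uplink") or None
    return None


def count_echo_requests(tcpdump_text: str, src_ip: str, dst_ip: str) -> int:
    """Count ICMP echo requests from src_ip to dst_ip in tcpdump-uw -enr text."""
    # The lookbehind stops "2.16.10.11" from matching inside "172.16.10.11".
    pattern = re.compile(
        r"(?<![\d.])" + re.escape(src_ip) + r" > " + re.escape(dst_ip)
        + r": ICMP echo request"
    )
    return sum(1 for line in tcpdump_text.splitlines() if pattern.search(line))


# Hypothetical sample text in the assumed formats.
ports_sample = """\
   Port ID: 67108881
   vSwitch: DSwitch-01
   Portgroup: Edge-Mgmt
   MAC Address: 00:50:56:aa:00:01
   Team Uplink: vmnic0
   Port ID: 67108882
   vSwitch: DSwitch-01
   Portgroup: Edge-TEP
   MAC Address: 00:50:56:aa:00:02
   Team Uplink: vmnic2
"""
capture_sample = (
    "10:00:00.000001 00:50:56:aa:00:10 > 00:50:56:bb:00:20, ethertype IPv4 (0x0800), "
    "length 1514: 172.16.10.11 > 172.16.20.21: ICMP echo request, id 1, seq 0, length 1480\n"
)
print(uplink_for_mac(ports_sample, "00:50:56:AA:00:02"))  # vmnic2
print(count_echo_requests(capture_sample, "172.16.10.11", "172.16.20.21"))  # 1
```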

In this scenario, while vmkping is running on the source host, the ICMP echo requests are not received on the uplinks of the host where the Edge VM runs. This indicates that the packets are lost in the physical network.
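The conclusion above comes from comparing what the two captures saw. A minimal sketch of that decision is shown below, taking as input hypothetical counts of echo requests seen on the source uplink and on the Edge host uplink. Only the "lost in physical network" outcome is the one described in this article; the other labels and the `locate_loss` name are illustrative additions.

```python
def locate_loss(requests_at_source_uplink: int, requests_at_edge_host_uplink: int) -> str:
    """Classify where the echo requests disappear, based on two capture counts."""
    if requests_at_source_uplink == 0:
        # Nothing left the source host, so the physical network is not yet implicated.
        return "not sent: requests never left the source host uplink"
    if requests_at_edge_host_uplink == 0:
        # Sent by the source host but never seen on the Edge host: the case in this article.
        return "lost in physical network: sent by source host, never reached Edge host"
    if requests_at_edge_host_uplink < requests_at_source_uplink:
        return "partial loss in physical network"
    return "reached Edge host uplink: look beyond the physical network"


print(locate_loss(100, 0))
```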

Environment

VMware NSX

Cause

  • The issue is caused by a Layer 3 inter-VLAN routing failure in the physical network infrastructure between the Host TEP and Edge TEP networks.
  • When Host and Edge TEPs are configured on separate VLANs (the recommended practice), traffic between them must be routed, so inter-VLAN routing must work in the physical network. Host-to-Host tunnels are not affected when the Host TEPs share a VLAN, because that traffic is not routed.
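Whether two TEPs need a routed path follows from their addresses and prefix lengths: if either TEP sees the other as outside its own subnet, the traffic must go through a gateway, and each network's gateway must sit inside that network. This Python sketch uses the standard ipaddress module; the function names and addresses are hypothetical.

```python
import ipaddress


def needs_routing(host_tep: str, edge_tep: str) -> bool:
    """True if the two TEPs (CIDR notation) are not on the same subnet,
    so traffic between them must go through a Layer 3 gateway."""
    host = ipaddress.ip_interface(host_tep)
    edge = ipaddress.ip_interface(edge_tep)
    return edge.ip not in host.network or host.ip not in edge.network


def gateway_on_link(tep: str, gateway: str) -> bool:
    """True if the configured gateway lies inside the TEP's own subnet."""
    return ipaddress.ip_address(gateway) in ipaddress.ip_interface(tep).network


# Illustrative values: Host TEP VLAN 172.16.10.0/24, Edge TEP VLAN 172.16.20.0/24.
print(needs_routing("172.16.10.11/24", "172.16.20.21/24"))  # True
print(gateway_on_link("172.16.10.11/24", "172.16.10.1"))    # True
print(gateway_on_link("172.16.20.21/24", "172.16.10.1"))    # False
```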

Resolution

Configure Inter-VLAN Routing in the physical environment

Fix the inter-VLAN routing between the Host TEP and Edge TEP networks on the physical network device(s) where the VLAN gateways are configured.

Steps:

  1. Identify Network Infrastructure:
    • Locate the physical switch/router handling inter-VLAN routing
    • Confirm VLAN assignments for Host TEP and Edge TEP networks
  2. Configure Routing:
    • Add routing entries between Host TEP VLAN and Edge TEP VLAN
    • Verify gateway configurations for both networks
    • Test connectivity using physical network diagnostic tools
  3. Validate Resolution:
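    • Run vmkping from the affected ESXi hosts, for example: vmkping -S vxlan -s 1572 -I vmk10 -c 100 -d <Edge TEP IP>. A payload of 1572 matches a TEP MTU of 1600; test with both a smaller payload and the jumbo payload for the configured MTU (for example, 8972 for an MTU of 9000).
    • Monitor the TEP tunnel status in the NSX Manager UI and confirm that the tunnels come up once the routing issue is resolved.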
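Running the test with two payload sizes helps separate a routing failure from an MTU problem: if even the small payload fails, the path itself is broken, while a small-payload pass combined with a jumbo-payload failure (with -d set) points to an MTU mismatch. The Python sketch below assumes the vmkping summary line has the form "N packets transmitted, M packets received, ..."; the helper names, messages and sample output are hypothetical.

```python
import re

# Assumed format of the vmkping summary line.
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


def loss_percent(vmkping_output: str) -> float:
    """Packet loss computed from the sent/received counts in the summary line."""
    match = SUMMARY.search(vmkping_output)
    if match is None:
        raise ValueError("no vmkping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    return 100.0 if sent == 0 else 100.0 * (sent - received) / sent


def interpret(small_loss: float, jumbo_loss: float) -> str:
    """Read the small-payload and jumbo-payload results together."""
    if small_loss == 100.0:
        return "no reachability: inter-VLAN routing is still broken"
    if small_loss == 0.0 and jumbo_loss == 100.0:
        return "routing works, but the path does not pass the jumbo size unfragmented"
    if small_loss == 0.0 and jumbo_loss == 0.0:
        return "routing and MTU both OK"
    return "partial loss: repeat the test and check the physical path"


small = "100 packets transmitted, 100 packets received, 0% packet loss"
jumbo = "100 packets transmitted, 0 packets received, 100% packet loss"
print(interpret(loss_percent(small), loss_percent(jumbo)))
# prints: routing works, but the path does not pass the jumbo size unfragmented
```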

Expected Result: After the routing fix is implemented, TEP tunnels between the Host TEPs and Edge TEPs establish successfully when workloads are placed on the affected hosts.

Additional Information