TEP tunnels are down between ESXi Transport Node and Edge Transport Node. (Alarm: Event type: Faulty TEP)
Article ID: 405174
Products
VMware NSX
Issue/Introduction
When workloads are moved to an ESXi transport node, the TEP (Tunnel Endpoint) status shows tunnels towards the Edge as down, while TEP tunnels to other ESXi transport nodes work fine. (The Host and Edge TEPs are on different VLAN networks.)
TEP tunnel failures between ESXi hosts and Edge nodes
NSX Manager displays "Faulty TEP" alarms
Host-to-Edge TEP tunnels fail while Host-to-Host tunnels remain operational
Validation Steps:
Navigate to NSX Manager UI → Fabric → Hosts
Verify that the TEP tunnel status shows tunnels down on the affected host
Test connectivity from the ESXi transport node TEP to the Edge TEP using the following command. In this scenario the test fails:
vmkping -S vxlan -s <packet size> -I vmkX -d <Edge TEP IP>
Here -S is the network stack of the TEP VMkernel interfaces, which is vxlan by default.
-s is the ICMP payload size in bytes, not the MTU itself. Leave 28 bytes for the IP and ICMP headers: use 1472 for an MTU of 1500 and 8972 for an MTU of 9000.
-I is the TEP VMkernel adapter assigned to the vxlan network stack. Run esxcfg-vmknic -l to find the VMkernel adapter (vmkX) assigned as a TEP interface on the ESXi host.
-d sets the Don't Fragment (DF) bit so that the packet is not fragmented along the path.
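The relationship between the MTU and the -s value can be sketched as a small shell helper. This is an illustration only: the function names max_icmp_payload and build_vmkping are hypothetical, and the helper prints the vmkping command rather than running it. It assumes IPv4 with a 20-byte IP header and an 8-byte ICMP header.

```shell
#!/bin/sh
# IPv4 header (20 bytes) + ICMP header (8 bytes) sit on top of the -s payload.
ICMP_OVERHEAD=28

# Print the largest ICMP payload that fits in the given MTU without fragmentation.
max_icmp_payload() {
    mtu=$1
    echo $((mtu - ICMP_OVERHEAD))
}

# Print (do not run) the vmkping command for an MTU, a TEP vmk interface and a target IP.
# The interface name and IP address are placeholders supplied by the caller.
build_vmkping() {
    mtu=$1; vmk=$2; target=$3
    echo "vmkping -S vxlan -s $(max_icmp_payload "$mtu") -I $vmk -d $target"
}
```

For example, build_vmkping 1500 vmk10 192.0.2.10 prints vmkping -S vxlan -s 1472 -I vmk10 -d 192.0.2.10, where vmk10 and 192.0.2.10 stand in for the real TEP interface and Edge TEP IP.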
Further, a packet capture can be taken while vmkping is running on the source host. The ESXi TEP ICMP echo requests leave through the related uplinks, and the same traffic should be seen arriving on the Edge side.
To identify the NIC adapter that the TEP VMkernel interface is using, log in to the source ESXi host as root and run the command nsxdp-cli vswitch instance list.
At the same time, log in as root on the destination ESXi host where the Edge is deployed and run the packet capture on the NIC adapter that currently carries the Edge TEP IP.
To identify the NIC adapter corresponding to the Edge TEP interface, follow the steps below:
Login to the Edge Node as user admin
Run get gateway
Find the entry whose Type is TUNNEL and note its VRF ID
Run vrf <VRF ID>, using the VRF ID noted in the previous step
Run get interfaces and make a note of the IP, MAC and Interface ID
Login to the ESXi host as user root
If the ESXi host is a part of the NSX cluster, then run the command nsxdp-cli vswitch instance list to validate the Edge interface MAC and its corresponding VMNIC
If the ESXi host is NOT a part of the NSX cluster, then run the command esxcli network vm list
Make a note of the Edge VM name and the world ID it is associated with. Run the command esxcli network vm port list -w <Edge-VM-World-ID>
Verify the MAC address and corresponding VMNIC that is associated with the TEP IP interface.
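The MAC-to-VMNIC lookup in the last step can be scripted roughly as follows. This is a sketch under assumptions: the helper name uplink_for_mac is hypothetical, and it assumes that each port in the esxcli network vm port list output carries a "MAC Address:" line followed by a "Team Uplink:" line. Check the field names against the actual output on your ESXi build before relying on it.

```shell
#!/bin/sh
# Read `esxcli network vm port list -w <Edge-VM-World-ID>` output on stdin and
# print the "Team Uplink" value of the port whose "MAC Address" matches $1.
# Assumption: within each port entry, the MAC Address line precedes Team Uplink.
uplink_for_mac() {
    awk -v want="$1" '
        BEGIN { want = tolower(want) }            # compare MACs case-insensitively
        /MAC Address:/ { mac = tolower($NF) }     # remember the MAC of the current port
        /Team Uplink:/ { if (mac == want) print $NF }
    '
}
```

On the host this would be used as esxcli network vm port list -w <Edge-VM-World-ID> | uplink_for_mac <Edge TEP MAC>, with the MAC taken from the get interfaces output on the Edge.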
Run a packet capture on that VMNIC to verify whether the incoming traffic arrives on the destination ESXi host where the Edge node is deployed.
In this scenario, while vmkping is running on the source host, the ICMP echo requests never reach the uplinks of the host where the Edge VM resides, which indicates that the packets are being lost in the physical network.
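The article does not state the exact capture command, so the following is only one possible form, built on the pktcap-uw and tcpdump-uw tools that ship with ESXi. The helper name build_capture is hypothetical, the capture points and filters shown (UplinkSndKernel, UplinkRcvKernel, --proto 0x01, --ip) are assumptions to confirm with pktcap-uw -h on your build, and the helper prints the command instead of running it.

```shell
#!/bin/sh
# Print a capture command for ICMP traffic to or from a TEP IP on one uplink.
# direction: "tx" for the source host (frames leaving through the uplink),
#            "rx" for the Edge host (frames arriving on the uplink).
build_capture() {
    vmnic=$1; direction=$2; tep_ip=$3
    case "$direction" in
        tx) point=UplinkSndKernel ;;
        rx) point=UplinkRcvKernel ;;
        *)  echo "direction must be tx or rx" >&2; return 1 ;;
    esac
    # --proto 0x01 limits the capture to ICMP; --ip matches source or destination.
    echo "pktcap-uw --uplink $vmnic --capture $point --proto 0x01 --ip $tep_ip -o - | tcpdump-uw -enr -"
}
```

Running the tx form on the source host and the rx form on the Edge host at the same time shows whether echo requests that leave the source uplink ever arrive at the destination uplink.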
Environment
VMware NSX
Cause
The issue is caused by a Layer 3 inter-VLAN routing failure in the physical network infrastructure between the Host TEP and Edge TEP networks.
When Host and Edge TEPs are configured on separate VLANs (recommended practice), proper inter-VLAN routing must be established in the physical network.
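To make the dependency on routing concrete, the sketch below checks whether two TEP addresses fall in the same subnet. When they do not, traffic between them has to pass through a gateway in the physical network. The function names and the example addresses are hypothetical, and the prefix length must be taken from the actual TEP IP pool configuration.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1              # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the network number (as an integer) of address $1 with prefix length $2.
network_of() {
    ip=$(ip_to_int "$1")
    mask=$(( (4294967295 << (32 - $2)) & 4294967295 ))
    echo $(( ip & mask ))
}

# Succeed if both addresses are in the same subnet for prefix length $3.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}
```

For instance, same_subnet 172.16.10.11 172.16.20.11 24 fails, so a Host TEP at the first address can only reach an Edge TEP at the second through a routed hop. On the ESXi host, the gateway used by the TEP network stack can be listed with esxcli network ip route ipv4 list -N vxlan.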
Resolution
Configure Inter-VLAN Routing in the physical environment
Fix the inter-VLAN routing between Host TEP and Edge TEP networks on the physical network infrastructure where VLAN gateways are configured.
Steps:
Identify Network Infrastructure:
Locate the physical switch/router handling inter-VLAN routing
Confirm VLAN assignments for Host TEP and Edge TEP networks
Configure Routing:
Add routing entries between Host TEP VLAN and Edge TEP VLAN
Verify gateway configurations for both networks
Test connectivity using physical network diagnostic tools
Validate Resolution:
Execute vmkping tests from the affected ESXi hosts: vmkping -S vxlan -s 1572 -I vmk10 -c 100 -d <Edge TEP IP> (repeat the test with payload sizes for both the lower MTU and the jumbo MTU)
Monitor TEP tunnel status in NSX Manager UI to see if it comes up after the routing issue is resolved
Expected Result: After the routing fix is implemented, TEP tunnels should establish successfully between Host TEPs and Edge TEPs when workloads are placed on the affected hosts
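When the validation test is run with -c 100, the result can be checked from the summary line. The sketch below assumes vmkping prints a standard ping-style summary ending in "N% packet loss"; the helper names packet_loss_pct and tep_reachable are hypothetical.

```shell
#!/bin/sh
# Read ping-style output on stdin and print the packet-loss percentage.
packet_loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Succeed only if the summary on stdin reports 0% packet loss.
tep_reachable() {
    [ "$(packet_loss_pct)" = "0" ]
}
```

A possible use is vmkping -S vxlan -s 1572 -I vmk10 -c 100 -d <Edge TEP IP> | tep_reachable && echo "TEP path OK", repeated for each payload size being validated.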