In a new deployment, Host and Edge Tunnel Endpoint (TEP) tunnels intermittently drop or enter a "Down" state. This behavior is observed specifically when new Avi Service Engines (SEs) are created, and it results in a loss of connectivity for all tunnels on the affected ESXi hosts.
Symptoms include:
ESXi TEPs are unable to ping their own Default Gateway.
External connections cannot reach the Host TEP gateway.
Tunnels toward Edge Nodes fail due to a lack of L3 reachability.
VMware NSX, VCF 9.x
The issue is caused by a physical network misconfiguration where required Layer 3 (L3) VLANs for the TEP networks are missing from the trunk port configuration on the physical switches connected to the ESXi hosts. This prevents the TEPs from communicating with their gateways, leading to tunnel failures when new network load (such as Avi SE creation) occurs.
To resolve this issue, you must ensure the physical network infrastructure correctly supports the TEP VLANs:
Verify Trunk Configuration: Coordinate with your Network Engineering team to ensure that all VLANs associated with the TEP IP pools are explicitly added to the trunk configuration of the physical switch ports connected to the ESXi hosts.
Enable L3 Routing: Ensure that routing is correctly enabled between the different TEP and Edge segments on the physical network.
Verify Connectivity: From the ESXi host CLI, verify that the TEP can ping its local gateway:
vmkping -S vxlan -I vmkX <Gateway_IP>
(Note: vmkX is a placeholder; use the VMkernel interface assigned to your TEP, and replace <Gateway_IP> with that TEP's default gateway.)
No host reboots are required; connectivity is restored automatically once the network path is open.
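When a host has more than one TEP, the gateway check can be repeated for each interface. The following is a minimal sketch, not a supported tool: check_tep and PING_BIN are hypothetical names introduced here for illustration, and vmk10, vmk11, and 192.0.2.1 are placeholder values. It assumes the ESXi shell, where vmkping accepts -S (netstack), -I (interface), and -c (count).

```shell
#!/bin/sh
# Sketch: ping the TEP default gateway from a TEP VMkernel interface.
# PING_BIN is a hypothetical override so the logic can be dry-run off-host;
# on an ESXi host it defaults to vmkping.
PING_BIN="${PING_BIN:-vmkping}"

# check_tep <vmk> <gateway>
# Sends 3 pings over the vxlan netstack and prints OK or FAIL for the pair.
# Returns non-zero when the gateway does not answer.
check_tep() {
    vmk="$1"
    gw="$2"
    if "$PING_BIN" -S vxlan -I "$vmk" -c 3 "$gw" >/dev/null 2>&1; then
        echo "OK   $vmk -> $gw"
    else
        echo "FAIL $vmk -> $gw"
        return 1
    fi
}

# Example calls (placeholder values; substitute your own TEP vmk and gateway):
# check_tep vmk10 192.0.2.1
# check_tep vmk11 192.0.2.1
```

A FAIL line for a TEP interface is consistent with the TEP VLAN missing from the switch trunk, but it does not prove it; confirm the trunk configuration with your Network Engineering team.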