VTEP faulty Alarm

Article ID: 322448

Products

VMware NSX

Issue/Introduction

Title: Alarm for Faulty VTEP.
Event ID: tep_health.faulty_tep
Added in release: 4.1.0/ M22
Alarm Description

  • Purpose: Reports a VTEP as faulty when it has no IP address assigned or when all BFD sessions from the VTEP are down.
  • Impact: Overlay VMs using this local VTEP will experience a network outage.
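
The alarm condition above can be read as a simple predicate. The following is a minimal Python sketch of that reading, not NSX's actual implementation. The VtepStatus class and its fields are hypothetical names introduced for illustration, and treating a VTEP with no BFD sessions at all as "not all down" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VtepStatus:
    """Hypothetical snapshot of one local VTEP (not an NSX data model)."""
    ip_address: Optional[str]  # None or "" when no IP has been assigned
    bfd_sessions_up: List[bool] = field(default_factory=list)  # one flag per BFD session


def is_faulty(vtep: VtepStatus) -> bool:
    """Return True when the VTEP matches either condition in the alarm description."""
    no_ip = not vtep.ip_address
    # Assumption: with zero BFD sessions there is nothing that can be "down",
    # so an empty list does not count as "all sessions down".
    all_bfd_down = bool(vtep.bfd_sessions_up) and not any(vtep.bfd_sessions_up)
    return no_ip or all_bfd_down
```

For example, a VTEP with an IP address and at least one BFD session up is not flagged, while a VTEP with an IP address but every session down is.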

Environment

VMware NSX

Resolution

  1. If all BFD sessions from a local VTEP are down:
    • Check the underlay configuration for packet forwarding issues at the Top of Rack (ToR) switch and at every next hop involved in underlay routing.
       
  2. If the local VTEP has no IP address and its provisioning type is DHCP:
    • Confirm that the DHCP server is configured correctly and that its address pool is not exhausted.
    • Confirm that the TEP VLAN is configured on the physical switch:
      • Log in to the ESXi host over SSH.
      • Identify the vmk interface that is failing to obtain a DHCP address:
        esxcfg-vmknic -l
      • Identify the vmnic assigned to that vmk interface with esxtop:
        Run esxtop and press n for the network view, then check which vmnic is assigned to the vmk interface in question.
      • Capture one packet on that vmnic to confirm that the VLAN is allowed on the physical switch trunk:
        pktcap-uw --uplink vmnic# --vlan <TEP VLAN ID> -c 1
      • If a packet is captured, the physical switch is forwarding traffic on the TEP VLAN. If no packet is captured, the physical switch is not forwarding traffic on that VLAN, and the network team needs to investigate the switch port configuration.

  3. Check for physical NIC (pNIC) firmware issues and upgrade the pNIC firmware to the latest version.
       
  4. After correcting any underlay issues, verify the local VTEP state with the following API call:

    GET https://<nsx-manager-ip>/api/v1/transport-nodes/<node-id>/network/interfaces?source=realtime

    The response should report the local VTEP state as NORMAL.
    Sample output:
    {
       interfaceId: vmk10,
       linkStatus: UP,
       adminStatus: UP,
       mtu: 1600,
       interfaceAlias: [{
          broadcastAddress: 133.###.###.255,
          ipAddress: {
             ipv4: 2239043120
          },
          ipConfiguration: STATIC,
          netmask: 255.255.255.0,
          macAddress: 00:50:56:66:67:a6
       }],
       state: NORMAL
    }
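
The verification in step 4 could be scripted roughly as follows. This is a sketch under stated assumptions. It works on an already-parsed response and makes no network call. The key names ("state", "ipAddress", "ipv4") are taken from the sample output above and may differ in a real response. The function names are hypothetical. The sketch also assumes that the integer ipv4 value is the address as a big-endian 32-bit number, which is consistent with the leading octet of the broadcast address in the sample.

```python
import ipaddress
from typing import Any, Dict


def interfaces_url(manager: str, node_id: str) -> str:
    """Build the realtime interfaces URL quoted in step 4."""
    return (
        f"https://{manager}/api/v1/transport-nodes/{node_id}"
        "/network/interfaces?source=realtime"
    )


def vtep_is_normal(interface: Dict[str, Any]) -> bool:
    """True when the interface entry reports state NORMAL (key name assumed)."""
    return interface.get("state") == "NORMAL"


def decode_ipv4(value: int) -> str:
    """Convert an integer IPv4 value to dotted-quad notation."""
    return str(ipaddress.IPv4Address(value))


# Illustrative entry shaped like the sample output; the address is made up.
example = {
    "interfaceId": "vmk10",
    "interfaceAlias": [{"ipAddress": {"ipv4": 3232235777}}],
    "state": "NORMAL",
}

print(vtep_is_normal(example))
print(decode_ipv4(example["interfaceAlias"][0]["ipAddress"]["ipv4"]))  # 192.168.1.1
```

Any entry whose state is not NORMAL would point back to steps 1 through 3.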
  • Workaround: Enable the VTEP HA feature to fail over VMs to a healthy VTEP.
  • Maintenance window required for remediation? Yes  
  • API reference: API Reference 
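
The VLAN check in step 2 can also be sketched as a small helper that assembles the pktcap-uw command line and interprets the result. This is an illustrative sketch, not a VMware tool. The function names are hypothetical, the vmnic name check assumes the "vmnic#" naming used above, and the 1 to 4094 range is the standard 802.1Q range for tagged VLAN IDs.

```python
def build_capture_command(vmnic: str, vlan_id: int, count: int = 1) -> str:
    """Return the pktcap-uw command from step 2 for the given uplink and TEP VLAN."""
    # Assumption: uplinks follow the "vmnic<number>" naming shown in the article.
    if not (vmnic.startswith("vmnic") and vmnic[5:].isdigit()):
        raise ValueError(f"unexpected uplink name: {vmnic!r}")
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    if count < 1:
        raise ValueError("packet count must be at least 1")
    return f"pktcap-uw --uplink {vmnic} --vlan {vlan_id} -c {count}"


def interpret_capture(packets_captured: int) -> str:
    """Map the capture result to the conclusion described in step 2."""
    if packets_captured > 0:
        return "Physical switch is forwarding traffic on the TEP VLAN."
    return (
        "Physical switch is not forwarding traffic on the TEP VLAN; "
        "the network team should review the switch port configuration."
    )


print(build_capture_command("vmnic2", 100))
```

The generated command is meant to be run on the ESXi host itself. The helper only assembles the string and does not execute anything.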

Additional Information

API Guide: API Guide
Admin Guide: Admin Guide