VTEP faulty Alarm

Article ID: 322448

Products

VMware NSX

Issue/Introduction

Title: Alarm for Faulty VTEP.
Event ID: tep_health.faulty_tep
Added in release: 4.1.0/ M22
Alarm Description

  • Purpose: Reports a faulty VTEP. The VTEP either has no IP address assigned or all BFD sessions from the VTEP are down.
  • Impact: Overlay VMs that use this local VTEP lose network connectivity.

Environment

VMware NSX

Resolution

  1. If all BFD sessions from a local VTEP are down:
    • Check the underlay for packet forwarding issues at the Top of Rack (ToR) switch and at every next hop involved in underlay routing.

  2. If the local VTEP has no IP address and its provisioning type is DHCP:
    • Confirm that the DHCP server configuration is correct and that the DHCP pool is not exhausted.
    • Confirm that the TEP VLAN is configured on the physical switch:
      • Log in to the ESXi host over SSH.
      • Identify the vmk interface that is failing to obtain a DHCP address:
        esxcfg-vmknic -l
      • Identify the vmnic assigned to that vmk interface with esxtop:
        Run esxtop, press n to open the network view, and note which vmnic is assigned to the vmk interface in question.
      • Capture one packet on that vmnic to confirm that the TEP VLAN is allowed on the physical switch trunk:
        pktcap-uw --uplink vmnic# --vlan <TEP VLAN ID> -c 1
      • If a packet is captured, the physical switch is forwarding traffic on the TEP VLAN. If no packet is captured, the physical switch is not forwarding traffic on that VLAN, and the network team needs to investigate the switchport configuration.

  3. Check for physical NIC firmware issues and upgrade the physical NIC (pNIC) firmware to the latest version.
       
  4. After correcting any underlay issues, verify the local VTEP state with the following API call:

    GET https://<nsx-manager-ip>/api/v1/transport-nodes/<node-id>/network/interfaces?source=realtime

    The response should show the local VTEP state as NORMAL.
    Sample output:
    {
       interfaceId: vmk10,
       linkStatus: UP,
       adminStatus: UP,
       mtu: 1600,
       interfaceAlias: [{
          broadcastAddress: 133.###.###.255,
          ipAddress: {
             ipv4: 22########
          },
          ipConfiguration: STATIC,
          netmask: 255.255.255.0,
          macAddress: 00:50:56:##:##:##
       }],
       state: NORMAL
    }
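
    The state check in step 4 can be scripted. The following is a minimal Python sketch, not an official NSX client. It assumes the response has already been fetched and parsed into a list of dictionaries that use the same field names as the sample output above (interfaceId, state). The function names are hypothetical, and authentication, TLS handling, and the exact wrapper around the interface list are left out because they depend on the environment.

```python
from typing import Dict, List, Tuple


def build_interfaces_url(manager: str, node_id: str) -> str:
    """Return the realtime interfaces URL from step 4 for one transport node."""
    return (
        f"https://{manager}/api/v1/transport-nodes/{node_id}"
        "/network/interfaces?source=realtime"
    )


def non_normal_interfaces(interfaces: List[Dict]) -> List[Tuple[str, str]]:
    """Return (interfaceId, state) for every interface whose state is not NORMAL.

    Assumption: entries look like the sample output (keys `interfaceId`
    and `state`). Entries without a `state` key are skipped, on the
    assumption that they are not VTEP interfaces.
    """
    flagged = []
    for entry in interfaces:
        state = entry.get("state")
        if state is not None and state != "NORMAL":
            flagged.append((entry.get("interfaceId", "<unknown>"), state))
    return flagged
```

    An empty list from non_normal_interfaces means every interface that reports a state shows NORMAL, which is the condition step 4 asks for.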
  • Workaround: Enable the VTEP HA feature to fail over VMs to a healthy VTEP.
  • Maintenance window required for remediation: Yes
  • API reference: API Reference 
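
For the VLAN check in step 2 of the resolution, the capture command can be assembled and validated before it is run on many hosts. The sketch below is a hypothetical Python helper and only builds the argument list; it uses just the pktcap-uw options shown in this article (--uplink, --vlan, -c). The vmnic name pattern is an assumption about how uplinks are named on the host, and the VLAN range check reflects the valid 802.1Q VLAN IDs 1 through 4094.

```python
import re
from typing import List


def build_capture_command(vmnic: str, vlan_id: int) -> List[str]:
    """Build the one-packet pktcap-uw capture command from step 2.

    Assumption: uplinks are named vmnic<N>. Adjust the pattern if the
    host uses different uplink names.
    """
    if not re.fullmatch(r"vmnic\d+", vmnic):
        raise ValueError(f"unexpected uplink name: {vmnic!r}")
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range (1-4094): {vlan_id}")
    # -c 1 stops after a single packet, as in the article's command.
    return ["pktcap-uw", "--uplink", vmnic, "--vlan", str(vlan_id), "-c", "1"]
```

Joining the returned list with spaces gives the same command line shown in step 2, with the vmnic number and TEP VLAN ID filled in.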

Additional Information

API Guide: API Guide
Admin Guide: Admin Guide