VTEP HA activated alarm

Article ID: 330568

Products

VMware NSX
VMware NSX-T Data Center

Issue/Introduction

Title: Alarm for VTEP HA Activated.
Event ID: tep_health.tep_ha_activated

Alarm Description

  • Purpose: VTEP HA was activated on a faulty VTEP. If at least one healthy VTEP was available, overlay workloads on the faulty VTEP were migrated to a healthy VTEP. If no healthy VTEP was available, overlay workloads were not failed over and face a network outage.
  • Impact: If a healthy VTEP was available, overlay workloads have failed over to it and the overlay traffic outage is resolved.
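As a rough illustration of the failover behavior described above, the decision can be modeled as a small function. This is a simplified sketch, not the NSX implementation: it assumes a hypothetical mapping of VTEP names to a healthy/unhealthy flag and, when several healthy VTEPs exist, simply picks the first name in sorted order.

```python
from typing import Dict, List, Optional


def pick_failover_target(vteps: Dict[str, bool], faulty: str) -> Optional[str]:
    """Return a healthy VTEP to move overlay workloads to, or None.

    `vteps` is a hypothetical map of VTEP name (e.g. "vmk10") -> healthy flag.
    This models the alarm text only; it is not how NSX selects a VTEP.
    """
    healthy: List[str] = sorted(
        name for name, ok in vteps.items() if ok and name != faulty
    )
    # No healthy VTEP left: workloads are not failed over and face an outage.
    return healthy[0] if healthy else None


if __name__ == "__main__":
    print(pick_failover_target({"vmk10": True, "vmk11": False}, "vmk11"))  # vmk10
    print(pick_failover_target({"vmk11": False}, "vmk11"))  # None
```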

In the affected host's /var/log/nsx-syslog.log, messages similar to the following appear:

Wa(180) cfgAgent[2147528]: NSX 2147528 - [nsx@6876 comp="nsx-controller" subcomp="cfgAgent" s2comp="nsx-monitoring" entId="########-####-####-####-########4d11" tid="F3DF7700" level="warn" eventState="On" eventFeatureName="tep_health" eventSev="warning" eventType="faulty_tep"] TEP:vmk11 of VDS:vDS-Name at Transport node:########-####-####-####-########4d89. Overlay workloads using this TEP will face network outage. Reason: all BFD tunnels from TEP are down.
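To pick these messages out of a busy log, the file can be scanned for faulty_tep events. The sketch below assumes the message layout shown in the sample line (structured data ending in eventType="faulty_tep"], followed by "TEP:<vmk> of VDS:<name> at Transport node:<id>." and "Reason: <text>"); other NSX versions may format the line differently, and the script name in the usage comment is hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, Iterator

# Pattern derived from the sample faulty_tep log line (assumed format).
FAULTY_TEP_RE = re.compile(
    r'eventType="faulty_tep"\].*?'
    r'TEP:(?P<tep>\S+) of VDS:(?P<vds>.+?) at Transport node:(?P<node>[^\s.]+)\.'
    r'.*?Reason:\s*(?P<reason>.*?)\.?\s*$'
)


def faulty_tep_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the TEP, VDS, transport node ID and reason from each faulty_tep line."""
    for line in lines:
        match = FAULTY_TEP_RE.search(line)
        if match:
            yield match.groupdict()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python scan_tep.py /var/log/nsx-syslog.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for event in faulty_tep_events(log):
            print(event)
```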

Environment

VMware NSX
VMware NSX-T Data Center

Resolution

Steps to resolve
For 4.1.0 and higher

Recommended Action:

If the faulty_tep alarm shows the reason for failure as:

  • "all BFD sessions from a local VTEP is down":
    • Check the underlay configuration for packet-forwarding issues at the top-of-rack (TOR) switch and at every next hop involved in underlay routing.
  • If the local VTEP has no IP address:
    • If the IP provisioning type selected for the local VTEP is DHCP, verify that the DHCP server is configured correctly and that its address pool is not exhausted. Also check for pNIC firmware issues and upgrade the pNIC firmware to the latest version.
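The recommended action above amounts to a lookup from the alarm's reason text to a set of checks. The sketch below is illustrative only: the function name and the keyword matching are assumptions based on the reason strings quoted in this article, and it reads the pNIC firmware check as applying to any no-IP case.

```python
from typing import List


def recommended_checks(reason: str, provisioning: str = "") -> List[str]:
    """Map a faulty_tep reason string to the checks recommended in this article.

    `provisioning` is the local VTEP's IP provisioning type (e.g. "dhcp").
    Keyword matching is an assumption; adjust it to the exact alarm text.
    """
    text = reason.lower()
    if "bfd" in text and "down" in text:
        return ["Check underlay packet forwarding at the TOR and every next hop"]
    if "no ip" in text:
        checks: List[str] = []
        if provisioning.lower() == "dhcp":
            checks.append("Verify DHCP server configuration and pool capacity")
        checks.append("Check pNIC firmware and upgrade to the latest version")
        return checks
    # Unknown reason: no mapped checks.
    return []
```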

After fixing the underlay issue, and once manual or automatic recovery has completed for the 'BFD down' case, check the local VTEP state with the following API:

GET: https://<nsx-manager-ip>/api/v1/transport-nodes/<node-id>/network/interfaces?source=realtime
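A minimal Python sketch of this call, using only the standard library. It assumes HTTP basic authentication with an NSX Manager account; the function names and the lab-only option to skip TLS verification (for self-signed certificates) are illustrative, and the manager address, node ID, and credentials are placeholders you supply.

```python
import base64
import json
import ssl
import urllib.parse
import urllib.request


def interfaces_url(manager: str, node_id: str) -> str:
    """Build the realtime interfaces URL for a transport node."""
    node = urllib.parse.quote(node_id, safe="")
    return (f"https://{manager}/api/v1/transport-nodes/{node}"
            "/network/interfaces?source=realtime")


def get_interfaces(manager: str, node_id: str, user: str, password: str,
                   verify_tls: bool = True) -> dict:
    """GET the interfaces API and return the decoded JSON body."""
    req = urllib.request.Request(interfaces_url(manager, node_id))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")  # assumed basic auth
    req.add_header("Accept", "application/json")
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Lab-only: accept self-signed certificates.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        return json.load(resp)
```

For example, get_interfaces("<nsx-manager-ip>", "<node-id>", "admin", password) returns the decoded JSON body of the response.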

Note: The local VTEP state should be NORMAL.
Sample output:

{
  "interfaceId": "vmk10",
  "linkStatus": "UP",
  "adminStatus": "UP",
  "mtu": 1600,
  "interfaceAlias": [{
    "broadcastAddress": "192.168.1.255",
    "ipAddress": {
      "ipv4": 2239043120
    },
    "ipConfiguration": "STATIC",
    "netmask": "255.255.255.0",
    "macAddress": "00:50:##:##:##:a6"
  }],
  "state": "NORMAL"
}
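The check in the note above can be automated by listing interfaces whose state is not NORMAL. This sketch assumes the response is either a single interface object shaped like the sample output or a list response carrying a results array, and it skips interfaces with no state field on the assumption that only TEP interfaces report one; verify both assumptions against your NSX version.

```python
from typing import Any, Dict, List


def non_normal_interfaces(body: Dict[str, Any]) -> List[str]:
    """Return IDs of interfaces whose `state` is present and not NORMAL.

    Accepts a single interface object (as in the sample output) or a
    list response with a `results` array (assumed shape).
    """
    interfaces = body.get("results", [body])
    return [
        iface.get("interfaceId", "?")
        for iface in interfaces
        # Interfaces without a state field are assumed not to be TEPs.
        if "state" in iface and iface["state"] != "NORMAL"
    ]
```

An empty list means every interface that reports a state is NORMAL.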

Maintenance window required for remediation? Yes 

Additional Information