VTEP HA activated alarm

Article ID: 330568

Updated On:

Products

VMware NSX

Issue/Introduction

Title: Alarm for VTEP HA Activated.
Event ID: tep_health.tep_ha_activated

Alarm Description

  • Purpose: VTEP HA was activated on a faulty VTEP, and the overlay workloads on the faulty VTEP were migrated to a healthy VTEP if at least one healthy VTEP was available. If no healthy VTEP was available, the overlay workloads were not failed over and face a network outage.
  • Impact: If a healthy VTEP was available, the overlay workloads have failed over to it and the overlay traffic outage is resolved.

Environment

VMware NSX

Resolution

Steps to resolve
For 4.1.0 and higher

Recommended Action:

If the faulty_tep alarm shows the reason for failure as:

  • "all BFD sessions from a local VTEP is down":
    • Check the underlay configuration for packet forwarding issues at the TOR switch and at every next hop involved in underlay routing.
  • Local VTEP has no IP address:
    • If the provisioning type selected for the local VTEP is DHCP, check that the DHCP server is configured correctly and that its address pool is not exhausted. Also check for pNIC firmware issues and upgrade the pNIC firmware to the latest version.

After fixing the underlay issue, and once manual or automatic recovery has completed for the 'bfd down' case, check the local VTEP state with the following API:

GET: https://<nsx-manager-ip>/api/v1/transport-nodes/<node-id>/network/interfaces?source=realtime

Note: The local VTEP state should be NORMAL.
Sample output:

{
  interfaceId: vmk10,
  linkStatus: UP,
  adminStatus: UP,
  mtu: 1600,
  interfaceAlias: [{
    broadcastAddress: 192.168.1.255,
    ipAddress: {
      ipv4: 2239043120
    },
    ipConfiguration: STATIC,
    netmask: 255.255.255.0,
    macAddress: 00:50:##:##:##:a6
  }],
  state: NORMAL
}
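
The state check above could be scripted roughly as follows. This is a minimal sketch, not an official tool. It assumes HTTP basic authentication against the NSX Manager, and it assumes the JSON response uses the interfaceId and state field names shown in the sample output, returned either as a single entry, a list of entries, or wrapped in a "results" list. The helper names build_url, non_normal_interfaces and fetch_interfaces are hypothetical; adjust the field names to match what your NSX version actually returns.

```python
import base64
import json
import ssl
import urllib.request


def build_url(manager: str, node_id: str) -> str:
    """Return the realtime interface-status URL quoted in this article."""
    return (f"https://{manager}/api/v1/transport-nodes/{node_id}"
            "/network/interfaces?source=realtime")


def non_normal_interfaces(payload):
    """Return (interface id, state) pairs whose state is not NORMAL.

    Assumption: each entry carries 'interfaceId' and 'state' keys as in the
    sample output; entries without a 'state' key are skipped.
    """
    # Assumption: the body is a single entry, a list of entries,
    # or an object wrapping the entries in a 'results' list.
    if isinstance(payload, dict):
        entries = payload.get("results", [payload])
    else:
        entries = payload
    return [(e.get("interfaceId"), e["state"])
            for e in entries
            if "state" in e and e["state"] != "NORMAL"]


def fetch_interfaces(manager, node_id, user, password, verify_tls=True):
    """GET the interface list using HTTP basic authentication (assumed)."""
    request = urllib.request.Request(build_url(manager, node_id))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    context = ssl.create_default_context()
    if not verify_tls:
        # Lab use only: skip certificate checks for self-signed managers.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

Passing the result of fetch_interfaces to non_normal_interfaces and getting back an empty list would suggest that every interface reporting a state is NORMAL; any pairs returned identify interfaces that still need attention.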

Maintenance window required for remediation? Yes 

Additional Information