VTEP Autorecover failure Alarm

Article ID: 322536

Products

VMware NSX

Issue/Introduction

Title: Alarm for Autorecover failure of VTEP.
Event ID: tep_health.tep_autorecover_failure

Alarm Description

  • Purpose: This alarm indicates that autorecovery was attempted on a faulty VTEP and failed because all BFD sessions from that local VTEP are still down.
  • Impact: Overlay VMs using this local VTEP will experience a network outage.

 

Environment

VMware NSX-T Data Center

Resolution

Steps to resolve
For 4.1.0 and higher

Recommended Action:

  1. Check the underlay configuration for packet forwarding issues at the ToR switch and at every next hop involved in underlay routing.
  2. Check for pNIC firmware issues and upgrade the firmware to the latest version.

After fixing the underlay issue, wait for the next autorecovery attempt or invoke manual recovery through the API:

POST https://'nsx-mgr'/policy/api/v1/infra/sites/'site-id'/enforcement-points/'enforcementpoint-id'/host-transport-nodes/'host-transport-node-id'/vteps/actions

{
  "type": "TransportNodeVTEPRecoveryRequest",
  "device_name": "vmk10"
}
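
The manual recovery call above could be prepared from a script along these lines. This is a minimal sketch using only the Python standard library, not an official client. The function name, the example host and IDs, and the use of HTTP basic authentication are assumptions for illustration. The URL path and the body fields are taken from the request shown above.

```python
import base64
import json
import urllib.request


def build_vtep_recovery_request(nsx_mgr, site_id, ep_id, node_id,
                                device_name, user, password):
    """Build (but do not send) the manual VTEP recovery POST request.

    All arguments are placeholders supplied by the caller; basic
    authentication is an assumption and may differ in your environment.
    """
    url = (
        f"https://{nsx_mgr}/policy/api/v1/infra/sites/{site_id}"
        f"/enforcement-points/{ep_id}"
        f"/host-transport-nodes/{node_id}/vteps/actions"
    )
    # Body fields as given in the article's request payload.
    body = json.dumps({
        "type": "TransportNodeVTEPRecoveryRequest",
        "device_name": device_name,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical values, for illustration only.
req = build_vtep_recovery_request(
    "nsx.example.com", "default", "default", "node-1",
    "vmk10", "admin", "secret",
)
print(req.get_method(), req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen`. That step is left out here because it needs a reachable NSX Manager and a trusted certificate.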



Then check the local VTEP state through the API, which should report the state as NORMAL:

GET https://'nsx-manager'/api/v1/transport-nodes/'node-id'/network/interfaces?source=realtime

Sample output:

{
  "interfaceId": "vmk10",
  "linkStatus": "UP",
  "adminStatus": "UP",
  "mtu": 1600,
  "interfaceAlias": [{
    "broadcastAddress": "192.168.22.255",
    "ipAddress": {
      "ipv4": 2239043120
    },
    "ipConfiguration": "STATIC",
    "netmask": "255.255.255.0",
    "macAddress": "##:##:##:##:##:a6"
  }],
  "state": "NORMAL"
}
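
The state check on the sample output above can be automated with a small helper. This is a hedged sketch: the function names are hypothetical, and the handling of a wrapper object with a "results" list is an assumption about how multiple interfaces might be returned. The "interfaceId" and "state" fields come from the sample output.

```python
import json


def find_interface(payload, interface_id):
    """Return the interface entry whose interfaceId matches, or None.

    Accepts a single interface object (as in the sample output), a list
    of such objects, or, as an assumption, a wrapper with a "results" list.
    """
    if isinstance(payload, dict):
        entries = payload.get("results", [payload])
    else:
        entries = payload
    for entry in entries:
        if entry.get("interfaceId") == interface_id:
            return entry
    return None


def vtep_is_normal(payload, interface_id):
    """True only if the interface is present and its state is NORMAL."""
    entry = find_interface(payload, interface_id)
    return entry is not None and entry.get("state") == "NORMAL"


# Trimmed copy of the sample output, parsed as JSON.
sample = json.loads(
    '{"interfaceId": "vmk10", "linkStatus": "UP", "adminStatus": "UP",'
    ' "mtu": 1600, "state": "NORMAL"}'
)
print(vtep_is_normal(sample, "vmk10"))  # prints True
```

A missing interface is treated as not recovered, so the check fails safely if the device name is mistyped.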

Is there a workaround? No

Maintenance window required for remediation? Yes
 

API reference

Additional Information