Title: Alarm for Autorecover failure of VTEP.
Event ID: tep_health.tep_autorecover_failure
Added in release: 4.1.0/ M22
Alarm Description
- Purpose: This alarm indicates that an autorecovery was attempted on a faulty VTEP and failed because all BFD sessions from that local VTEP are still down.
- Impact: Overlay VMs using this local VTEP will experience a network outage.
Resolution:
1. Check the underlay configuration for packet forwarding issues at the ToR switch and at every next hop involved in underlay routing.
2. Check for pNIC firmware issues and upgrade the firmware to the latest version.
After fixing the underlay issue, wait for the next autorecovery attempt or invoke manual recovery through the API: POST https://'nsx-mgr'/policy/api/v1/infra/sites/'site-id'/enforcement-points/'enforcementpoint-id'/host-transport-nodes/'host-transport-node-id'/vteps/actions
{
  "type": "TransportNodeVTEPRecoveryRequest",
  "device_name": "vmk10"
}
Then check the local VTEP state through the API: GET https://'nsx-manager'/api/v1/transport-nodes/'node-id'/network/interfaces?source=realtime. It should show the local VTEP state as NORMAL.
Sample output:
{
  "interfaceId": "vmk10",
  "linkStatus": "UP",
  "adminStatus": "UP",
  "mtu": 1600,
  "interfaceAlias": [{
    "broadcastAddress": "133.117.22.255",
    "ipAddress": {
      "ipv4": 2239043120
    },
    "ipConfiguration": "STATIC",
    "netmask": "255.255.255.0",
    "macAddress": "00:50:56:66:67:a6"
  }],
  "state": "NORMAL"
}
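The manual recovery call and the follow-up state check can be sketched in Python as below. This is a minimal illustration, not an official client: the manager hostname, site ID, enforcement point ID, and node ID are hypothetical placeholders, authentication and TLS handling are omitted, and the request is only built, not sent. The helper vtep_summary assumes one interface entry shaped like the sample output above.

```python
import ipaddress
import json
import urllib.request


def build_recovery_request(manager, site_id, ep_id, node_id, device_name):
    """Build (but do not send) the manual VTEP recovery POST request."""
    url = (
        f"https://{manager}/policy/api/v1/infra/sites/{site_id}"
        f"/enforcement-points/{ep_id}/host-transport-nodes/{node_id}"
        "/vteps/actions"
    )
    body = {"type": "TransportNodeVTEPRecoveryRequest", "device_name": device_name}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def vtep_summary(interface):
    """Return (interface id, dotted IPv4 addresses, recovered?) for one entry."""
    addresses = [
        # The API reports the IPv4 address as a 32-bit integer.
        str(ipaddress.IPv4Address(alias["ipAddress"]["ipv4"]))
        for alias in interface.get("interfaceAlias", [])
    ]
    return interface["interfaceId"], addresses, interface.get("state") == "NORMAL"


# Interface entry taken from the sample output (trimmed to the fields used here).
sample = {
    "interfaceId": "vmk10",
    "interfaceAlias": [{"ipAddress": {"ipv4": 2239043120}}],
    "state": "NORMAL",
}

# All identifiers below are placeholders, not values from a real deployment.
req = build_recovery_request("nsx-mgr.example.com", "site-1", "ep-1", "host-1", "vmk10")
print(req.get_method(), req.full_url)
print(vtep_summary(sample))
```

In the sample output the integer 2239043120 decodes to 133.117.22.48, which is consistent with the reported broadcast address 133.117.22.255 and netmask 255.255.255.0. A state other than NORMAL after the recovery call suggests the underlay issue is still present.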