VMware NSX-T Data Center
VMware NSX
In a multi-TEP configuration, the Edge maps the traffic for each overlay segment to an individual TEP.
A TEP will be considered to have failed when there is a link down event on the network interface it is mapped to.
A tunnel/BFD state change to down does not, by itself, trigger a TEP failover.
Consider a two-TEP configuration:
Bare Metal Edge
Edge VM
Bare Metal Edge and Edge VM
Note: for both Bare Metal Edge and Edge VM, there is a corner case in which a TEP is considered up because its associated uplink is up, even though the TEP's tunnels are down.
This condition can result in traffic being blackholed for any segments mapped to that TEP.
NSX 4.2.1 introduced Group TEP High Availability for Edge nodes, based on BFD session state, which handles this TEP failure scenario. The affected TEP Group is marked as down and the other TEP Group handles the traffic. See the Release Notes for details.
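The difference between the two behaviours can be sketched as a small decision model. This is an illustration only, not NSX code: the `Tep` class, its fields, and both function names are hypothetical, and a TEP's tunnel state is reduced to a single boolean.

```python
from dataclasses import dataclass


@dataclass
class Tep:
    """Hypothetical, simplified view of one Edge TEP."""
    name: str
    link_up: bool      # state of the network interface the TEP is mapped to
    tunnels_up: bool   # False when the TEP's tunnels/BFD sessions are down


def tep_usable(tep: Tep, bfd_based_ha: bool) -> bool:
    """Return True if the Edge keeps sending segment traffic through this TEP."""
    if not tep.link_up:
        # A link down event always fails the TEP.
        return False
    if bfd_based_ha:
        # With Group TEP High Availability enabled (NSX 4.2.1 and above),
        # tunnel/BFD state also counts.
        return tep.tunnels_up
    # Default behaviour: tunnel/BFD down alone does not trigger a failover.
    return True


def is_blackholing(tep: Tep, bfd_based_ha: bool) -> bool:
    """Traffic is blackholed when a TEP is still in use but its tunnels are down."""
    return tep_usable(tep, bfd_based_ha) and not tep.tunnels_up
```

For a TEP whose uplink is up but whose tunnels are down, `is_blackholing(tep, bfd_based_ha=False)` is true, while with `bfd_based_ha=True` the TEP is treated as down and traffic can move to the other TEP Group.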
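Group TEP High Availability is enabled via the API. Retrieve the current configuration with GET, then send it back with PUT, setting enable_tep_grouping_on_edge to true: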
GET /policy/api/v1/infra/connectivity-global-config
{
// ...
"global_replication_mode_enabled": false,
"is_inherited": false,
"site_infos": [],
"tep_group_config": {
"enable_tep_grouping_on_edge": false <-------------
},
"resource_type": "GlobalConfig",
"id": "global-config",
"display_name": "default",
"path": "/infra/global-config",
// ...
}
PUT /policy/api/v1/infra/connectivity-global-config
{
// ...
"global_replication_mode_enabled": false,
"is_inherited": false,
"site_infos": [],
"tep_group_config": {
"enable_tep_grouping_on_edge": true <-------------
},
"resource_type": "GlobalConfig",
"id": "global-config",
"display_name": "default",
"path": "/infra/global-config",
// ...
}
This API enables both TEP Grouping and High Availability for versions NSX 4.2.1 and above.
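The GET and PUT calls above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library. It assumes HTTP basic authentication against the NSX Manager and sends back the full body returned by the GET with only the flag changed. The function names, the `manager`, `user`, and `password` parameters, and the optional `context` (an `ssl.SSLContext` for certificate handling) are illustrative and not part of the original article.

```python
import base64
import copy
import json
import urllib.request

CONFIG_PATH = "/policy/api/v1/infra/connectivity-global-config"


def with_tep_grouping(config: dict, enabled: bool = True) -> dict:
    """Return a copy of the global config with the Edge TEP grouping flag set."""
    updated = copy.deepcopy(config)
    updated.setdefault("tep_group_config", {})["enable_tep_grouping_on_edge"] = enabled
    return updated


def _request(manager, method, user, password, body=None, context=None):
    """Send one request to the global config endpoint and decode the JSON reply."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        f"https://{manager}{CONFIG_PATH}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Basic {token}",  # assumption: basic auth is in use
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, context=context) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else {}


def enable_tep_grouping(manager, user, password, context=None):
    """GET the current config and PUT it back with TEP grouping enabled."""
    current = _request(manager, "GET", user, password, context=context)
    if current.get("tep_group_config", {}).get("enable_tep_grouping_on_edge"):
        return current  # already enabled, nothing to change
    return _request(manager, "PUT", user, password,
                    body=with_tep_grouping(current), context=context)
```

A call such as `enable_tep_grouping("nsx-manager.example.com", "admin", "<password>")` would perform the change; the hostname here is a placeholder. Keeping `with_tep_grouping` separate from the network calls makes it possible to check the modified body before anything is sent.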
This is known behaviour of NSX and it is working as designed.
NSX raises alarms for tunnel/BFD down events. These should always be investigated and resolved to ensure a fully functional environment.
For further troubleshooting assistance, please visit Troubleshooting NSX Edge High Availability.
If you are contacting Broadcom support about this issue, please provide the following:
Handling Log Bundles for offline review with Broadcom support