NSX T0 High Availability failover causes a prolonged data plane outage on traffic traversing the Edges

Article ID: 377932

Updated On:

Products

VMware NSX

Issue/Introduction

  • The environment was recently migrated from NSX-V to NSX-T using the Migration Coordinator.
  • A Full Migration with the Bring Your Own Topology (BYOT) option was attempted.
  • The Migration Coordinator workflow was not completed; it was stopped at the L3-L7 Check-realization step.
  • Host and Edge migration was not completed via the Migration Coordinator.
  • ESXi hosts were manually configured for NSX-T.
  • Edge logs show that the routing domain is down:

/var/log/nsx-event.log
[Timestamp] [Edge] NSX 1 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="INFO" eventId="vmwNSXClusterNodeStatus"] {"event_state":2,"event_external_reason":"Edge node status changed: Up (Routing Down)

  • The routing domain on the Edges shows that VXLAN encapsulation is still in use:

nsx-edge> get routing-domain
Routing Domain
UUID        : [UUID]
Replication Tunnels
    Tunnel      : [UUID]
    IFUID       : [IFUID]
    Local       : [Local IP]
    Remote      : [Remote IP]
    ENCAP       : VXLAN
    MTEP        : False

  • The outage lasts up to 10 minutes, and traffic then recovers automatically.
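
The VXLAN symptom above can be checked across several Edges by scanning saved `get routing-domain` output for tunnels whose ENCAP field is VXLAN. The following is a minimal sketch only: it assumes the output has already been captured as text in the layout shown above, and the function names `parse_tunnels` and `vxlan_tunnels` are hypothetical helpers, not part of any NSX tooling.

```python
from typing import Dict, List

# Sample text copied from the article, with its placeholders left as-is.
SAMPLE = """\
Routing Domain
UUID        : [UUID]
Replication Tunnels
    Tunnel      : [UUID]
    IFUID       : [IFUID]
    Local       : [Local IP]
    Remote      : [Remote IP]
    ENCAP       : VXLAN
    MTEP        : False
"""


def parse_tunnels(cli_output: str) -> List[Dict[str, str]]:
    """Group the 'key : value' lines that follow each 'Tunnel' line."""
    tunnels: List[Dict[str, str]] = []
    current = None
    for line in cli_output.splitlines():
        if ":" not in line:
            continue  # section headings such as "Replication Tunnels"
        key, _, value = line.partition(":")  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key == "Tunnel":
            current = {"Tunnel": value}
            tunnels.append(current)
        elif current is not None:
            current[key] = value
    return tunnels


def vxlan_tunnels(cli_output: str) -> List[Dict[str, str]]:
    """Return only the tunnels that still report VXLAN encapsulation."""
    return [t for t in parse_tunnels(cli_output)
            if t.get("ENCAP", "").upper() == "VXLAN"]


if __name__ == "__main__":
    for tunnel in vxlan_tunnels(SAMPLE):
        print("Tunnel", tunnel["Tunnel"], "still uses VXLAN")
```

Any tunnel returned by `vxlan_tunnels` indicates that the Edge matches the symptom described in this article.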

Environment

VMware NSX
VMware NSX-T Data Center

Cause

Completing the Host and Edge migration outside the Migration Coordinator is not supported with Full Migration BYOT. For Full Migration BYOT, the migration of Hosts and Edges should be completed via the Migration Coordinator. This unsupported workflow results in the Edges retaining the VXLAN encapsulation.

Resolution

This is an unsupported workflow.

Workaround:

  1. In the NSX-T UI, on the "System->Fabric" tab, check each host transport node (TN) to see whether it uses an uplink profile whose name ends with "VXLAN_". For each such TN, change it to use the uplink profile whose name has the same prefix but does not end with "VXLAN_".
  2. In the NSX-T UI, on the "Networking->Segments" tab, check each overlay segment to see whether it uses Head End Replication. For each such segment, change it to Hierarchical Two-Tier replication.
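
In larger environments, the two checks above may be easier to run against an exported inventory than by clicking through the UI. The sketch below only illustrates the filtering logic. The record layout (`name`, `uplink_profile`, `replication_mode`), the sample names, and the helper functions are all hypothetical, and the values `SOURCE` (Head End) and `MTEP` (Hierarchical Two-Tier) are assumed to match the segment `replication_mode` field of the NSX Policy API; verify them against the API reference for your version.

```python
from typing import Dict, Iterable, List, Tuple

VXLAN_SUFFIX = "VXLAN_"  # suffix named in workaround step 1
HEAD_END = "SOURCE"      # assumed API value for Head End Replication
TWO_TIER = "MTEP"        # assumed API value for Hierarchical Two-Tier


def flag_transport_nodes(
    nodes: Iterable[Dict[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (node, current profile, shared prefix) for each TN to change."""
    flagged = []
    for node in nodes:
        profile = node["uplink_profile"]
        if profile.endswith(VXLAN_SUFFIX):
            # The prefix identifies the profile to switch to; the exact
            # replacement name must still be confirmed by the operator.
            prefix = profile[: -len(VXLAN_SUFFIX)]
            flagged.append((node["name"], profile, prefix))
    return flagged


def flag_segments(segments: Iterable[Dict[str, str]]) -> List[str]:
    """Return the names of overlay segments still using Head End Replication."""
    return [s["name"] for s in segments if s["replication_mode"] == HEAD_END]


# Hypothetical inventory, for illustration only.
NODES = [
    {"name": "host-01", "uplink_profile": "uplink-a_VXLAN_"},
    {"name": "host-02", "uplink_profile": "uplink-a"},
]
SEGMENTS = [
    {"name": "seg-web", "replication_mode": HEAD_END},
    {"name": "seg-db", "replication_mode": TWO_TIER},
]

if __name__ == "__main__":
    for name, profile, prefix in flag_transport_nodes(NODES):
        print(f"{name}: replace {profile!r} with the profile sharing prefix {prefix!r}")
    for name in flag_segments(SEGMENTS):
        print(f"{name}: change replication to Hierarchical Two-Tier")
```

The sketch only reports what needs to change; the changes themselves are still made as described in steps 1 and 2.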