Edge HA Failover for T0 component service interface

Article ID: 317770

Updated On:

Products

VMware NSX Networking

Issue/Introduction

Symptoms:
When an Edge HA failover is triggered by rebooting the Active edge while HA is set to "Preemptive" mode, traffic is sent to the Standby node about 10 minutes after traffic has resumed on the rebooted edge. That traffic is then blackholed.

Consider Edge-01 as Active and Edge-02 as Standby:

1. When Edge-01 is rebooted, a failover occurs and Edge-02 takes over the Active role.
2. Once Edge-01 resumes service, it reclaims the Active role and Edge-02 falls back to Standby.
3. However, after 10 minutes, Edge-02 sends a unicast ARP to its neighbor, which has Edge-01's MAC address in its ARP table. This causes the neighbor (the TOR) to update its ARP table with Edge-02's MAC address, which in turn results in traffic being blackholed.



Environment

VMware NSX-T Data Center

Cause

The Standby edge node sends a unicast ARP to its neighbors when the ARP aging timer expires, in order to refresh the ARP entry. As a result, the neighbor device records the Standby edge's MAC address in its ARP table, and traffic is blackholed.
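
The sequence described above can be illustrated with a small toy model of the neighbor's ARP table. This is only a sketch for reasoning about the symptom, not NSX code: the IP address, the MAC addresses, and the function name `simulate_failover` are hypothetical, and the model assumes the neighbor overwrites its ARP entry with the sender MAC of whichever ARP packet it received last.

```python
# Toy model: the TOR's ARP table maps the service interface IP to one MAC.
# All addresses below are made-up placeholders for illustration.
SERVICE_IP = "192.0.2.1"
EDGE01_MAC = "00:50:56:00:00:01"
EDGE02_MAC = "00:50:56:00:00:02"
MACS = {"Edge-01": EDGE01_MAC, "Edge-02": EDGE02_MAC}


def simulate_failover():
    """Replay the failover timeline and report when traffic is blackholed.

    Returns a list of (description, mac_in_tor_arp_table, blackholed) tuples.
    """
    # Each event: (description, edge whose ARP the TOR receives, Active edge)
    events = [
        ("Edge-01 is Active", "Edge-01", "Edge-01"),
        ("Edge-01 reboots, Edge-02 takes over", "Edge-02", "Edge-02"),
        ("Edge-01 returns and preempts", "Edge-01", "Edge-01"),
        ("10 minutes later, Standby Edge-02 sends a unicast ARP", "Edge-02", "Edge-01"),
    ]
    arp_table = {}
    timeline = []
    for description, arp_sender, active_edge in events:
        # Assumption: the TOR learns the sender's MAC from the ARP it receives.
        arp_table[SERVICE_IP] = MACS[arp_sender]
        # Traffic is blackholed when the TOR forwards to a non-Active edge.
        blackholed = arp_table[SERVICE_IP] != MACS[active_edge]
        timeline.append((description, arp_table[SERVICE_IP], blackholed))
    return timeline


if __name__ == "__main__":
    for description, mac, blackholed in simulate_failover():
        print(f"{description}: TOR entry = {mac}, blackholed = {blackholed}")
```

In this model only the last event leaves the TOR pointing at the Standby edge while Edge-01 is Active, which matches the symptom of traffic being dropped roughly 10 minutes after the rebooted edge resumes service.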

Resolution

This issue is resolved in NSX-T 3.0 and NSX-T 2.5.2.

Workaround:
Use HA in Non-Preemptive mode.
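
The article does not say how to apply the workaround. As a hedged sketch, the helper below only builds (and does not send) a request that would set a Tier-0 gateway to Non-Preemptive mode. The path `/policy/api/v1/infra/tier-0s/<tier-0-id>` and the `failover_mode` field with values `PREEMPTIVE` and `NON_PREEMPTIVE` are assumptions based on the NSX-T Policy API and should be checked against the API guide for your version. The function name and the gateway ID `my-tier0` are hypothetical.

```python
import json

# Assumed values for the Tier-0 "failover_mode" field; verify for your version.
VALID_MODES = ("PREEMPTIVE", "NON_PREEMPTIVE")


def build_failover_mode_patch(tier0_id, mode="NON_PREEMPTIVE"):
    """Build (but do not send) a PATCH request description for a Tier-0 gateway."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported failover mode: {mode}")
    return {
        "method": "PATCH",
        # Assumed Policy API path; confirm in the NSX-T API reference.
        "path": f"/policy/api/v1/infra/tier-0s/{tier0_id}",
        "body": {"failover_mode": mode},
    }


if __name__ == "__main__":
    # "my-tier0" is a placeholder gateway ID.
    print(json.dumps(build_failover_mode_patch("my-tier0"), indent=2))
```

Keeping the request as plain data makes it easy to review before sending it with whichever HTTP client and credentials are used in your environment. The same setting can also be changed from the NSX Manager UI.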