L2 bridge causes L2 loop in case of split brain

Article ID: 383264

Products

VMware NSX

Issue/Introduction

  • You have configured an L2 bridge on two Edge nodes in Active/Standby mode.
  • You see symptoms caused by an L2 loop, including:
    • Broadcast storms
    • Switch ports placed in errdisable state because of BPDUs
    • MAC addresses learned on unexpected ports
  • Although both Edge nodes are running, the syslog of one Edge node records that the HA peer node became unreachable and that the L2 bridge changed from standby to active.

    <Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="dp" level="INFO"] Process DP BFD state update
    <Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="INFO"] HA tunnel <Local IP>:<Peer IP> state changed from Concat Path Down to Unreachable
    <Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="dp" level="INFO"] Process DP BFD state update done
    <Timestamp> <Hostname> NSX 3557 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="appha-peer-pkt" tname="dp-bfd-mon4" level="INFO"] Last BFD down in HA transport <Peer node UUID>
    <Timestamp> <Hostname> NSX 3557 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="appha-peer-pkt" tname="dp-bfd-mon4" level="INFO"] app-channel over HA transport <Peer node UUID>: state 2->0
    <Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="dp" level="INFO"] Process DP BFD state update
    <Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="tunnel" level="INFO"] Tunnel <Local IP>:<Peer IP>(geneve) state updated from up to down
    <Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-fsm" level="INFO"] HA state Active, processing event BFD State Updated reason Updated
    <Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="INFO"] HA tunnel <Local IP>:<Peer IP> state changed from Concat Path Down to Unreachable
    <Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="WARN"] Node <Peer node UUID> status changed from Up (Routing Down) to Unreachable

    <Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="app-ha-bridge" level="INFO"] bridge <BridgeEndpoint UUID> attached to VNI lswitch <Logical Switch UUID> state changed from Standby to Active

  • Eventually the L2 bridge returns to standby on one of the nodes, but for a short period of time the L2 bridge is active on both nodes.
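As a rough aid for spotting this pattern in collected syslog, the following Python sketch scans log lines for the two key messages quoted above: the peer node becoming Unreachable and the bridge changing from Standby to Active. The regular expressions are derived only from the sample messages in this article, and the function name `find_failover_events` is hypothetical; actual log lines may differ between releases, so treat this as a starting point rather than a supported tool.

```python
import re

# Patterns derived from the sample messages in this article (assumption:
# real UUIDs contain no whitespace and the message wording is unchanged).
BRIDGE_RE = re.compile(
    r's2comp="app-ha-bridge".*?bridge (?P<bridge>\S+) attached to VNI lswitch '
    r'(?P<lswitch>\S+) state changed from (?P<old>\w+) to (?P<new>\w+)'
)
PEER_RE = re.compile(
    r's2comp="ha-cluster".*?Node (?P<node>\S+) status changed from '
    r'(?P<old>.+?) to Unreachable\s*$'
)


def find_failover_events(lines):
    """Return (event_name, identifier) tuples in the order they appear."""
    events = []
    for line in lines:
        match = BRIDGE_RE.search(line)
        if match and match.group("old") == "Standby" and match.group("new") == "Active":
            events.append(("bridge_activated", match.group("bridge")))
            continue
        match = PEER_RE.search(line)
        if match:
            events.append(("peer_unreachable", match.group("node")))
    return events
```

A "peer_unreachable" event followed shortly by a "bridge_activated" event on a node whose peer never actually went down is consistent with the split brain described in this article.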

Environment

VMware NSX-T

VMware NSX

Cause

An L2 bridge split brain causes the L2 loop.
When the L2 bridge becomes active on both Edge nodes at the same time, the VLAN and overlay networks are bridged at two points, which forms an L2 loop.

L2 bridge split brain is usually seen when there are infrastructure issues.
For example:

  • Temporary network disruption.
  • Storage becomes unresponsive or very slow.
  • CPU and/or memory contention on ESXi.

In such situations, an Edge node can wrongly conclude that its peer is down because BFD packets stop arriving, and its L2 bridge can change from standby to active.
However, the peer is not actually down and eventually becomes reachable again.
As soon as the Edge nodes detect that the peer is active, the L2 bridge returns to standby on one of them,
but for a short period of time the L2 bridge is active on both nodes, and the L2 loop forms during that period.
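To estimate how long the loop could have existed, the bridge state changes logged by each Edge node can be compared. The sketch below is a minimal illustration, assuming you have already extracted a time-sorted list of (timestamp, state) pairs per node from syslog and that the clocks of both nodes are synchronized. The function names and the input layout are hypothetical.

```python
from datetime import datetime


def active_intervals(transitions, end):
    """Turn sorted (timestamp, state) pairs into (start, stop) Active intervals.

    An interval still open after the last transition is closed at `end`.
    """
    intervals = []
    start = None
    for timestamp, state in transitions:
        if state == "Active" and start is None:
            start = timestamp
        elif state != "Active" and start is not None:
            intervals.append((start, timestamp))
            start = None
    if start is not None:
        intervals.append((start, end))
    return intervals


def dual_active_windows(node_a, node_b, end):
    """Return the time windows during which both nodes report Active."""
    windows = []
    for a_start, a_stop in active_intervals(node_a, end):
        for b_start, b_stop in active_intervals(node_b, end):
            overlap_start = max(a_start, b_start)
            overlap_stop = min(a_stop, b_stop)
            if overlap_start < overlap_stop:
                windows.append((overlap_start, overlap_stop))
    return sorted(windows)
```

Any non-empty result is a window in which both bridges were forwarding, which should line up with the time of the broadcast storm or errdisable events seen on the physical switches.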

Resolution

To prevent the issue, make sure the infrastructure is resilient.
For example:

  • Make sure network and storage paths are redundant.
  • Be careful not to disrupt all the network and storage paths when you perform maintenance on your infrastructure.
  • CPU and/or memory contention can be mitigated by resource reservation.

If you plan maintenance and expect such a situation, placing one of the Edge nodes in maintenance mode beforehand can avoid split brain.

Tuning the NSX Edge cluster profile with a larger BFD Probe Interval and BFD Declare Dead Multiple also helps mitigate the risk of split brain.
However, it lengthens the downtime when an Edge node actually fails.
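The trade-off can be made concrete with a little arithmetic. As a simplified model, assuming the peer is declared dead after BFD Declare Dead Multiple consecutive probes are missed, the approximate detection time is the probe interval multiplied by that multiple. The helper name below is hypothetical, the values are illustrative only and are not recommendations, and real detection time can vary with timer jitter.

```python
def bfd_detection_time_ms(probe_interval_ms, declare_dead_multiple):
    """Approximate time before a silent peer is declared dead (simplified model)."""
    if probe_interval_ms <= 0 or declare_dead_multiple <= 0:
        raise ValueError("interval and multiple must be positive")
    return probe_interval_ms * declare_dead_multiple


# Illustrative values only (assumptions, not product defaults or advice).
short_timers = bfd_detection_time_ms(500, 3)    # 1500 ms
long_timers = bfd_detection_time_ms(1000, 5)    # 5000 ms
print(short_timers, long_timers)
```

With the longer timers in this example, a disruption of up to roughly 5 seconds no longer triggers a failover, but a real Edge node failure also takes roughly 5 seconds to detect instead of 1.5 seconds.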

Additional Information

Add an NSX Edge Cluster Profile
https://techdocs.broadcom.com/us/en/vmware-cis/nsx/vmware-nsx/4-2/installation-guide/transport-zones-and-transport-nodes/configuring-profiles/add-an-edge-cluster-profile.html