<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="dp" level="INFO"] Process DP BFD state update
<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="INFO"] HA tunnel <Local IP>:<Peer IP> state changed from Concat Path Down to Unreachable
<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="dp" level="INFO"] Process DP BFD state update done
<Timestamp> <Hostname> NSX 3557 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="appha-peer-pkt" tname="dp-bfd-mon4" level="INFO"] Last BFD down in HA transport <Peer node UUID>
<Timestamp> <Hostname> NSX 3557 FABRIC [nsx@6876 comp="nsx-edge" subcomp="datapathd" s2comp="appha-peer-pkt" tname="dp-bfd-mon4" level="INFO"] app-channel over HA transport <Peer node UUID>: state 2->0
<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="dp" level="INFO"] Process DP BFD state update
<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="tunnel" level="INFO"] Tunnel <Local IP>:<Peer IP>(geneve) state updated from up to down
<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-fsm" level="INFO"] HA state Active, processing event BFD State Updated reason Updated
<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="INFO"] HA tunnel <Local IP>:<Peer IP> state changed from Concat Path Down to Unreachable
<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="ha-cluster" level="WARN"] Node <Peer node UUID> status changed from Up (Routing Down) to Unreachable
<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="app-ha-bridge" level="INFO"] bridge <BridgeEndpoint UUID> attached to VNI lswitch <Logical Switch UUID> state changed from Standby to Active
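To check whether a bridge went from Standby to Active during an event like the one above, the app-ha-bridge messages can be filtered out of the Edge logs. The Python sketch below assumes the message text matches the excerpt exactly; the regular expression and the helper names (`parse_bridge_transition`, `became_active`) are hypothetical and are not part of NSX.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed to match the app-ha-bridge message shown in the log excerpt, e.g.
# "bridge <id> attached to VNI lswitch <id> state changed from Standby to Active".
BRIDGE_RE = re.compile(
    r"bridge (?P<bridge>.+?) attached to VNI lswitch (?P<lswitch>.+?) "
    r"state changed from (?P<old>\w+) to (?P<new>\w+)"
)


class BridgeTransition(NamedTuple):
    bridge: str
    lswitch: str
    old: str
    new: str


def parse_bridge_transition(line: str) -> Optional[BridgeTransition]:
    """Return the bridge state transition found in one log line, or None."""
    m = BRIDGE_RE.search(line)
    if m is None:
        return None
    return BridgeTransition(m["bridge"], m["lswitch"], m["old"], m["new"])


def became_active(lines: Iterable[str]) -> Iterator[BridgeTransition]:
    """Yield every transition in which a bridge moved from Standby to Active."""
    for line in lines:
        t = parse_bridge_transition(line)
        if t is not None and (t.old, t.new) == ("Standby", "Active"):
            yield t


sample = [
    '<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" '
    's2comp="dp" level="INFO"] Process DP BFD state update',
    '<Timestamp> <Hostname> NSX 18 FABRIC [nsx@6876 comp="nsx-edge" subcomp="nsxa" '
    's2comp="app-ha-bridge" level="INFO"] bridge <BridgeEndpoint UUID> attached to '
    'VNI lswitch <Logical Switch UUID> state changed from Standby to Active',
]
for t in became_active(sample):
    print(t.bridge, t.lswitch, t.old, "->", t.new)
```

Running the same filter over the logs of both Edge nodes shows whether each node reported its own Standby to Active transition for the same bridge.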
VMware NSX-T
VMware NSX
An L2 bridge split brain causes an L2 loop.
When the L2 bridge becomes active on two Edge nodes at the same time, the VLAN and overlay networks are bridged on both nodes, which creates an L2 loop.
L2 bridge split brain is usually seen when there are infrastructure issues.
For example:
In such situations, an Edge node can declare its peer down because it stops receiving BFD packets, and the L2 bridge on that node can change from Standby to Active.
However, the peer is not actually down and eventually comes back.
As soon as the Edge nodes detect that the peer is active again, the L2 bridge returns to Standby on one of them.
However, for a short period the L2 bridge is Active on both nodes, and an L2 loop forms during that time.
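The dual-active window described above can be measured from bridge state changes collected on both Edge nodes. The sketch below is a hypothetical helper, not an NSX tool: it assumes you have already extracted (timestamp, new state) pairs for each node, sorted by time, and that the clocks of both nodes are synchronized (for example via NTP) so the timestamps are comparable. The sample timeline is invented for illustration.

```python
from typing import List, Tuple

Event = Tuple[float, str]       # (timestamp in seconds, new bridge state)
Interval = Tuple[float, float]  # (start, end) in seconds


def active_intervals(events: List[Event], end: float) -> List[Interval]:
    """Turn one node's time-sorted state changes into Active intervals.
    An interval still open when the log ends is closed at `end`."""
    intervals: List[Interval] = []
    start = None
    for ts, state in events:
        if state == "Active" and start is None:
            start = ts
        elif state != "Active" and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, end))
    return intervals


def dual_active_windows(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the periods in which the bridge is Active on both nodes."""
    windows: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            windows.append((lo, hi))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return windows


# Hypothetical timeline: node A stays Active; node B loses BFD to A,
# goes Active at t=10 s, then returns to Standby at t=12.5 s once A is seen again.
node_a = active_intervals([(0.0, "Active")], end=60.0)
node_b = active_intervals([(0.0, "Standby"), (10.0, "Active"), (12.5, "Standby")], end=60.0)
print(dual_active_windows(node_a, node_b))  # [(10.0, 12.5)]
```

Any non-empty result marks a period in which an L2 loop could have formed; intervals that only touch at a single instant are not counted.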
To prevent the issue, make sure the underlying infrastructure is resilient.
For example:
If you plan maintenance that you expect to cause such a situation, you can avoid split brain by placing one of the Edge nodes in maintenance mode beforehand.
Tuning the NSX Edge cluster profile with a larger BFD Probe Interval and BFD Declare Dead Multiple also helps mitigate the risk of split brain.
However, this lengthens downtime when an Edge node actually fails, because the failure takes longer to detect.
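The trade-off follows from how BFD failure detection generally works: a peer is declared down after roughly Declare Dead Multiple consecutive probe intervals pass without a BFD packet. Treat this as an approximation of NSX behavior, not an exact specification. The Python sketch below is illustrative only; the function names and the sample values (500 ms x 3 and 1000 ms x 5, plus a 2-second outage) are hypothetical and are not taken from this article.

```python
def bfd_detection_time_ms(probe_interval_ms: int, declare_dead_multiple: int) -> int:
    """Approximate time without BFD packets before the peer is declared down."""
    if probe_interval_ms <= 0 or declare_dead_multiple <= 0:
        raise ValueError("probe interval and multiple must be positive")
    return probe_interval_ms * declare_dead_multiple


def outage_triggers_failover(outage_ms: int, probe_interval_ms: int,
                             declare_dead_multiple: int) -> bool:
    """Approximate check: an outage at least as long as the detection time
    lets the standby node declare its peer dead and become active."""
    return outage_ms >= bfd_detection_time_ms(probe_interval_ms, declare_dead_multiple)


# Hypothetical 2-second infrastructure hiccup evaluated against two profiles.
for interval, multiple in [(500, 3), (1000, 5)]:
    detect = bfd_detection_time_ms(interval, multiple)
    fails_over = outage_triggers_failover(2000, interval, multiple)
    print(f"{interval} ms x {multiple} -> {detect} ms, "
          f"2000 ms outage triggers failover: {fails_over}")
```

With 500 ms x 3 the detection time is 1500 ms, so a 2-second hiccup can trigger failover and a possible split brain. With 1000 ms x 5 the same hiccup is tolerated, but a real Edge node failure now takes about 5 seconds to detect.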
Add an NSX Edge Cluster Profile
https://techdocs.broadcom.com/us/en/vmware-cis/nsx/vmware-nsx/4-2/installation-guide/transport-zones-and-transport-nodes/configuring-profiles/add-an-edge-cluster-profile.html