Split-Brain Condition on Bare Metal NSX-T Edges due to BFD Timeout

Article ID: 422317

Products

VMware NSX

Issue/Introduction

  • A hardware error was reported on the standby Bare Metal Edge node.
  • After the hardware error, the Edge cluster entered a state in which both nodes continually attempted to become the active node.

[nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="bridge-fsm" level="INFO"] ####-####-####-#### event RemoteStateUpdate [Active,Unknown] Active,2372796417 event Remote State Updated reason 'Remote state changed to Active'

  • The NSX-T Bare Metal Edge, configured with bridging, encountered a split-brain scenario.

[nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="bridge-fsm" level="INFO"] ####-####-####-#### split-brain heal pending

  • Because both nodes were forwarding bridged traffic, the same MAC addresses appeared on multiple port channels. The resulting MAC address conflicts caused storm control to activate on the physical switches.
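
The two log lines above can be used to spot this condition in collected Edge logs. The following is a minimal sketch, assuming the logs have been copied off the node as plain text in the same format shown here; the function name `find_split_brain` and the marker table are hypothetical, and the log file location on the Edge is not specified by this article. Note that "Remote state changed to Active" also appears during a normal failover, so the sketch treats it only as supporting evidence, while "split-brain heal pending" is treated as the direct indicator.

```python
import re
from dataclasses import dataclass

# Structured prefix of the nsx-edge log lines quoted in this article.
LOG_RE = re.compile(
    r'\[nsx@\d+ comp="(?P<comp>[^"]+)" subcomp="(?P<subcomp>[^"]+)" '
    r's2comp="(?P<s2comp>[^"]+)" level="(?P<level>[^"]+)"\]\s+(?P<msg>.*)'
)

# Message fragments copied from the example log lines (hypothetical marker table).
# A remote node turning Active also happens in a normal failover, so that
# message is only classified as "supporting" evidence.
MARKERS = {
    "split-brain heal pending": "split-brain",
    "Remote state changed to Active": "supporting",
}


@dataclass
class Finding:
    line_no: int
    kind: str
    message: str


def find_split_brain(lines):
    """Return bridge-fsm log entries whose message contains a known marker."""
    findings = []
    for line_no, line in enumerate(lines, start=1):
        match = LOG_RE.search(line)
        if match is None or match.group("s2comp") != "bridge-fsm":
            continue  # not a bridge state-machine entry
        msg = match.group("msg")
        for marker, kind in MARKERS.items():
            if marker in msg:
                findings.append(Finding(line_no, kind, msg))
    return findings


# The two example lines from this article (identifiers already redacted as ####).
SAMPLE_LINES = """\
[nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="bridge-fsm" level="INFO"] ####-####-####-#### event RemoteStateUpdate [Active,Unknown] Active,2372796417 event Remote State Updated reason 'Remote state changed to Active'
[nsx@6876 comp="nsx-edge" subcomp="nsxa" s2comp="bridge-fsm" level="INFO"] ####-####-####-#### split-brain heal pending
""".splitlines()

if __name__ == "__main__":
    for f in find_split_brain(SAMPLE_LINES):
        print(f"line {f.line_no}: {f.kind}: {f.message}")
```

To scan a real file, pass an open file object instead of `SAMPLE_LINES`, since iterating a text file yields one line at a time.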

Environment

VMware NSX

Cause

In this case, the issue was triggered by a hardware failure that broke the communication path used by BFD (Bidirectional Forwarding Detection). When the standby node stopped receiving BFD keep-alives, it declared its peer down and transitioned to Active to maintain service availability, unaware that the original active node was still functional. Both nodes then operated as Active at the same time, which is the split-brain condition.
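Why a broken path, rather than a failed node, produces two Active nodes can be illustrated with a small simulation. This is a deliberately simplified model, not NSX's actual HA algorithm: the standby declares its peer down after `interval_ms * multiplier` milliseconds without a keep-alive (the BFD detection-time idea from RFC 5880), and the active node is assumed to have no reason to step down. The 1000 ms interval and multiplier of 3 are illustrative assumptions, not values from this environment, and `simulate` and `first_split_brain` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    state: str  # "Active" or "Standby"
    last_rx_ms: int = 0  # time the last BFD keep-alive was received


def simulate(link_fails_at_ms, end_ms, interval_ms=1000, multiplier=3):
    """Return (time_ms, state_a, state_b) samples for a two-node HA pair.

    Both nodes stay healthy; only the path carrying BFD between them fails
    at link_fails_at_ms (assumed values, simplified model).
    """
    detect_ms = interval_ms * multiplier  # time without keep-alives before peer is declared down
    a = Node("edge-a", "Active")
    b = Node("edge-b", "Standby")
    timeline = []
    for now in range(0, end_ms + 1, interval_ms):
        if now < link_fails_at_ms:
            # Path is up: each node receives the other's keep-alive this tick.
            a.last_rx_ms = now
            b.last_rx_ms = now
        for node in (a, b):
            if node.state == "Standby" and now - node.last_rx_ms >= detect_ms:
                node.state = "Active"  # peer declared down, take over
        timeline.append((now, a.state, b.state))
    return timeline


def first_split_brain(timeline):
    """Return the first time both nodes are Active, or None."""
    for now, state_a, state_b in timeline:
        if state_a == state_b == "Active":
            return now
    return None


if __name__ == "__main__":
    for sample in simulate(link_fails_at_ms=5000, end_ms=8000):
        print(sample)
    print("split-brain at", first_split_brain(simulate(5000, 10000)), "ms")
```

With the path failing at 5000 ms, the last keep-alive arrives at 4000 ms, so the standby takes over three intervals later at 7000 ms while `edge-a` is still Active. A real failure of `edge-a` would look identical from the standby's side, which is why BFD loss alone cannot distinguish the two cases.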

Resolution

Repair or replace the failed hardware on the affected Edge node.

Temporary workaround:

  • Isolate the faulty node: power it off or disconnect it from the network so that only the healthy node remains Active.