The alert "Tier0 gateway <GW-UUID> failover from Active to Down, service-router <T0-SR-UUID>" is triggered periodically on NSX bare metal edge nodes
3.2.3.52
Check the syslogs of the bare metal edge nodes for physical link down events around the time the alert was raised, similar to the sample below.
In this sample, interface eth5 and the ethX-mlx interfaces (eth3-mlx and eth1-mlx) went down, and the links continued to flap intermittently.
2025-01-30T00:17:40.473Z baremetalEdge2 kernel - - - [11011423.435458] mlx5_core 0000:b2:00.1 eth5: Link down
2025-01-30T00:17:40.424Z baremetalEdge2 kernel - - - [11011423.456146] mlx5_core 0000:61:00.1 eth3-mlx: Link down
2025-01-30T00:17:40.424Z baremetalEdge2 kernel - - - [11011423.462471] mlx5_core 0000:11:00.1 eth1-mlx: Link down
2025-01-30T00:17:40.451Z baremetalEdge2 kernel - - - [11011423.472241] bond0: (slave eth5): link status definitely down, disabling slave
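To see how often each interface flapped, the link events can be counted per interface. The following is a minimal sketch, not a supported tool: the names `LINK_EVENT`, `find_link_events` and `count_link_downs` are hypothetical, the regular expression is derived only from the sample lines above, and it assumes that "Link up" messages use the same layout as the "Link down" messages.

```python
import re
from collections import Counter

# Matches kernel link-state lines shaped like the samples above, e.g.
#   <timestamp> <host> kernel - - - [<uptime>] mlx5_core <pci> eth5: Link down
# Assumption: "Link up" lines use the same layout as the "Link down" lines.
LINK_EVENT = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+kernel\b.*?\]\s+"
    r"(?P<driver>\S+)\s+(?P<pci>\S+)\s+(?P<iface>[\w.-]+):\s+"
    r"Link (?P<state>down|up)\s*$"
)

# Sample lines copied from the syslog excerpt above.
SAMPLE = [
    "2025-01-30T00:17:40.473Z baremetalEdge2 kernel - - - [11011423.435458] mlx5_core 0000:b2:00.1 eth5: Link down",
    "2025-01-30T00:17:40.424Z baremetalEdge2 kernel - - - [11011423.456146] mlx5_core 0000:61:00.1 eth3-mlx: Link down",
    "2025-01-30T00:17:40.424Z baremetalEdge2 kernel - - - [11011423.462471] mlx5_core 0000:11:00.1 eth1-mlx: Link down",
    "2025-01-30T00:17:40.451Z baremetalEdge2 kernel - - - [11011423.472241] bond0: (slave eth5): link status definitely down, disabling slave",
]


def find_link_events(lines):
    """Return (timestamp, host, interface, state) for each link up/down line."""
    events = []
    for line in lines:
        match = LINK_EVENT.match(line.strip())
        if match:
            events.append(
                (match["ts"], match["host"], match["iface"], match["state"])
            )
    return events


def count_link_downs(lines):
    """Count "Link down" events per interface."""
    return Counter(
        iface for _, _, iface, state in find_link_events(lines) if state == "down"
    )


if __name__ == "__main__":
    for iface, count in sorted(count_link_downs(SAMPLE).items()):
        print(f"{iface}: {count} link down event(s)")
```

To run it against a real log, pass an open file object (or any iterable of lines) instead of `SAMPLE`. A high count on one interface within a short window points to a flapping physical link rather than a single outage.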
If BGP/BFD is configured on the uplinks, also check the syslogs for BFD down events:
2025-01-30T00:18:28.213Z baremetalEdge2 NSX 3303 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX" tid="XXXX" level="ERROR" eventState="On" eventFeatureName="routing" eventSev="error" eventType="bfd_down_on_external_interface"] In router <router UUID>, BFD session for peer <peer IP> is down.
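The BFD down events can be extracted the same way, so that their timestamps can be compared with the link down events. This is a sketch under stated assumptions: `BFD_DOWN` and `find_bfd_down_events` are hypothetical names, the pattern is built only from the sample line above, and the timestamp is assumed to be UTC with fractional seconds, as in the sample.

```python
import re
from datetime import datetime

# Matches the NSX event line shown above: a syslog timestamp and host,
# eventType="bfd_down_on_external_interface", then the message text.
# Assumption: the message wording is exactly as in the sample line.
BFD_DOWN = re.compile(
    r'^(?P<ts>\S+)\s+(?P<host>\S+)\s+NSX\b.*'
    r'eventType="bfd_down_on_external_interface"\]\s+'
    r"In router (?P<router>.+?), BFD session for peer (?P<peer>.+?) is down\."
)

# Sample line copied from the syslog excerpt above (placeholders kept as-is).
SAMPLE = (
    '2025-01-30T00:18:28.213Z baremetalEdge2 NSX 3303 - [nsx@6876 '
    'comp="nsx-edge" s2comp="nsx-monitoring" '
    'entId="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX" tid="XXXX" level="ERROR" '
    'eventState="On" eventFeatureName="routing" eventSev="error" '
    'eventType="bfd_down_on_external_interface"] '
    "In router <router UUID>, BFD session for peer <peer IP> is down."
)


def find_bfd_down_events(lines):
    """Return a dict per BFD down event with time, host, router and peer."""
    events = []
    for line in lines:
        match = BFD_DOWN.match(line.strip())
        if match:
            events.append(
                {
                    # Timestamp format assumed from the sample, e.g.
                    # 2025-01-30T00:18:28.213Z
                    "time": datetime.strptime(
                        match["ts"], "%Y-%m-%dT%H:%M:%S.%fZ"
                    ),
                    "host": match["host"],
                    "router": match["router"],
                    "peer": match["peer"],
                }
            )
    return events


if __name__ == "__main__":
    for event in find_bfd_down_events([SAMPLE]):
        print(event["time"].isoformat(), event["host"], event["router"], event["peer"])
```

In the samples shown here, the BFD down event at 00:18:28 follows the link down events at 00:17:40 on the same node, which is the kind of sequence to look for when matching BFD drops to physical link flaps.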
Verify that the physical links between the bare metal edge nodes and the uplink routers are up and stable, and resolve any connectivity issues found between them.