Tier 0 gateway failover from active to down alert on NSX Bare metal edge nodes

Article ID: 389011

Updated On:

Products

VMware NSX

Issue/Introduction

The alert "tier0 gateway <GW-UUID> failover from Active to Down, service-router <T0-SR-UUID>" is triggered periodically on NSX bare metal edge nodes.

Environment

VMware NSX 3.2.3.52

Cause

The alert can be triggered when the physical uplinks of the bare metal edge node go down or flap. Around the time of the alert, check the syslogs of the bare metal edge nodes for physical link down events such as those shown below.

In the sample below, the interfaces eth5, eth3-mlx and eth1-mlx went down, and the links kept flapping intermittently.

2025-01-30T00:17:40.473Z baremetalEdge2 kernel - - - [11011423.435458] mlx5_core 0000:b2:00.1 eth5: Link down
2025-01-30T00:17:40.424Z baremetalEdge2 kernel - - - [11011423.456146] mlx5_core 0000:61:00.1 eth3-mlx: Link down
2025-01-30T00:17:40.424Z baremetalEdge2 kernel - - - [11011423.462471] mlx5_core 0000:11:00.1 eth1-mlx: Link down
2025-01-30T00:17:40.451Z baremetalEdge2 kernel - - - [11011423.472241] bond0: (slave eth5): link status definitely down, disabling slave
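
When the syslog is large, it can help to filter these events programmatically. The following is a minimal Python sketch, assuming the syslog lines have the same layout as the sample above. The function names `find_link_down_events` and `count_by_interface` are hypothetical helpers written for this illustration and are not part of NSX.

```python
import re
from collections import Counter

# Matches kernel lines shaped like the sample above, for example:
#   <timestamp> <host> kernel - - - [<uptime>] mlx5_core <pci> eth5: Link down
# Assumption: the interface name is the token directly before ": Link down".
LINK_DOWN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+kernel\b.*?\s(?P<iface>\S+): Link down\s*$"
)


def find_link_down_events(lines):
    """Return (timestamp, host, interface) tuples for each 'Link down' line."""
    events = []
    for line in lines:
        match = LINK_DOWN.match(line)
        if match:
            events.append((match.group("ts"), match.group("host"), match.group("iface")))
    return events


def count_by_interface(events):
    """Count link down events per interface; repeated hits suggest flapping."""
    return Counter(iface for _, _, iface in events)


if __name__ == "__main__":
    sample = [
        "2025-01-30T00:17:40.473Z baremetalEdge2 kernel - - - [11011423.435458] mlx5_core 0000:b2:00.1 eth5: Link down",
        "2025-01-30T00:17:40.424Z baremetalEdge2 kernel - - - [11011423.456146] mlx5_core 0000:61:00.1 eth3-mlx: Link down",
        "2025-01-30T00:17:40.451Z baremetalEdge2 kernel - - - [11011423.472241] bond0: (slave eth5): link status definitely down, disabling slave",
    ]
    for ts, host, iface in find_link_down_events(sample):
        print(ts, host, iface)
```

An interface that appears many times in the result of `count_by_interface` within a short period is a candidate for a flapping link.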

If BGP/BFD is configured on the uplinks, also check the syslogs for BFD down events:

2025-01-30T00:18:28.213Z baremetalEdge2 NSX 3303 - [nsx@6876 comp="nsx-edge" s2comp="nsx-monitoring" entId="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXX" tid="XXXX" level="ERROR" eventState="On" eventFeatureName="routing" eventSev="error" eventType="bfd_down_on_external_interface"] In router <router UUID>, BFD session for peer <peer IP> is down.
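
The BFD events can be extracted in a similar way and compared against the link down timestamps. The following Python sketch assumes the structured-data layout shown in the sample above (`key="value"` pairs inside square brackets, followed by the message text) and ISO 8601 UTC timestamps with fractional seconds. `find_bfd_down_events` and `seconds_between` are hypothetical helper names used only for this illustration.

```python
import re
from datetime import datetime

BFD_EVENT = 'eventType="bfd_down_on_external_interface"'
KEY_VALUE = re.compile(r'(\w+)="([^"]*)"')
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed timestamp layout, as in the samples


def find_bfd_down_events(lines):
    """Return one dict per BFD down event found in the given syslog lines."""
    events = []
    for line in lines:
        if BFD_EVENT not in line:
            continue
        parts = line.split(None, 2)  # timestamp, host, remainder
        if len(parts) < 3:
            continue
        fields = dict(KEY_VALUE.findall(line))
        events.append({
            "timestamp": parts[0],
            "host": parts[1],
            "severity": fields.get("eventSev"),
            "state": fields.get("eventState"),
            # Message text is assumed to follow the closing "] " of the structured data.
            "message": line.rsplit("] ", 1)[-1].strip(),
        })
    return events


def seconds_between(earlier, later):
    """Return the time difference in seconds between two syslog timestamps."""
    delta = datetime.strptime(later, TS_FORMAT) - datetime.strptime(earlier, TS_FORMAT)
    return delta.total_seconds()
```

For the samples in this article, `seconds_between("2025-01-30T00:17:40.473Z", "2025-01-30T00:18:28.213Z")` gives roughly 47.7 seconds between the eth5 link down event and the BFD down event. A BFD down event that shortly follows a link down event on the same node is consistent with the uplink failure being the trigger, but this sketch only reports the time gap and does not prove causality.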

Resolution

Verify that the physical links between the bare metal edge nodes and the uplink routers are up and stable, and resolve any connectivity issues found between them.