BFD session on the external interface is down alarm

Article ID: 369183

Products

VMware NSX

Issue/Introduction

Title: Alarm to indicate that the status of the BFD session on the external interface is down.
Event ID: bfd_down_on_external_interface

Alarm description:

  • Purpose: To notify the admin that a configured BFD session on the edge is down.
  • Impact: A BFD session that is down between the edge node interface and the external peer can disrupt traffic within the network.

Environment

VMware NSX-T Data Center
VMware NSX

Resolution

Steps to resolve:
For 3.0.0 and higher

Recommended Action:

  • Check for configuration-related issues:
    • Verify if the source and destination addresses on the edge node are configured correctly.
    • Follow the steps below to verify the configuration on the edge node.
    • Invoke the NSX CLI command get logical-routers.
    • Sample CLI output: 

      Edge1> get logical-routers
      Logical Router
      UUID                                 VRF    LR-ID  Name                              Type                        Ports   Neighbors
      ########-####-####-####-###########   0      0                                        TUNNEL                      4       10/5000
      ########-####-####-####-###########   1      3      SR-tier0                          SERVICE_ROUTER_TIER0        6       0/50000
      ########-####-####-####-###########   3      1      DR-tier0                          DISTRIBUTED_ROUTER_TIER0    6       2/50000

  • Switch to the service router {sr_id} using the NSX CLI command vrf vrf_id_of_service_router.
  • Ensure that BFD is enabled on the BGP neighbor, and check the configured keepalive interval and multiplier in the BFD configuration.
    • Use the following API to retrieve the BGP neighbor on which BFD is enabled: GET /policy/api/v1/global-infra/tier-0s/{tier-0-id}/locale-services/{locale-service-id}/bgp/neighbors.
    • Sample Output:
      GET /policy/api/v1/global-infra/tier-0s/{tier-0-id}/locale-services/{locale-service-id}/bgp/neighbors
      {
          "source_addresses": [
              "##.###.##.##.###",
              "##.###.##.##.###"
          ],
          "neighbor_address": "##.###.##.##.###",
          "remote_as_num": "420",
          "route_filtering": [
              {
                  "enabled": true,
                  "address_family": "IPV4"
              }
          ],
          "keep_alive_time": 1,
          "hold_down_time": 3,
          "bfd": {
              "enabled": true, ---------> Admin State
              "interval": 500, ---------> Keepalive Interval
              "multiple": 3    ---------> Multiplier
          },
          "allow_as_in": false,
          "maximum_hop_limit": 1,
          "password_set": false,
          "enabled": true,
          "resource_type": "BgpNeighborConfig",
          "id": "##.###.##.##.###",
          "display_name": "##.###.##.##.###",
          "path": "/infra/tier-0s/tier0VrfA/locale-services/VRFA_tier0localeservices/bgp/neighbors/##.###.##.##.###",
          "relative_path": "##.###.##.##.###",
          "parent_path": "/infra/tier-0s/tier0VrfA/locale-services/VRFA_tier0localeservices/bgp",
          "unique_id": "########-####-####-####-###########",
          "realization_id": "########-####-####-####-###########",
          "owner_id": "########-####-####-####-###########",
          "marked_for_delete": false,
          "overridden": false,
          "_system_owned": false,
          "_create_time": 1712868927242,
          "_create_user": "admin",
          "_last_modified_time": 1712943952559,
          "_last_modified_user": "admin",
          "_protection": "NOT_PROTECTED",
          "_revision": 1
      }
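
The BFD check on the neighbor output above can be sketched in Python. This is a minimal illustration, not part of NSX: the helper name bfd_settings and the sample address 192.0.2.1 are hypothetical, the input is assumed to be a single BgpNeighborConfig object already parsed from JSON, and the interval is assumed to be in milliseconds. The product of interval and multiplier is only the locally configured value; the effective detection time is negotiated with the peer.

```python
def bfd_settings(neighbor: dict) -> dict:
    """Summarize the "bfd" block of one BgpNeighborConfig object (hypothetical helper)."""
    bfd = neighbor.get("bfd") or {}
    interval_ms = bfd.get("interval")   # Keepalive interval, assumed to be milliseconds
    multiple = bfd.get("multiple")      # Detection multiplier
    configured = None
    if interval_ms is not None and multiple is not None:
        # Product of the locally configured values only; the peer may negotiate
        # a different effective detection time.
        configured = interval_ms * multiple
    return {
        "neighbor_address": neighbor.get("neighbor_address"),
        "bfd_enabled": bool(bfd.get("enabled", False)),   # Admin state
        "interval_ms": interval_ms,
        "multiple": multiple,
        "configured_detection_ms": configured,
    }


# Trimmed-down neighbor object with a placeholder address.
sample = {
    "neighbor_address": "192.0.2.1",
    "bfd": {"enabled": True, "interval": 500, "multiple": 3},
}
summary = bfd_settings(sample)
print(summary)
```

A neighbor whose summary shows bfd_enabled as False, or an interval and multiplier that differ from what the peer expects, is a candidate for the configuration fix described in this section.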
  • Invoke the NSX CLI command get bfd-sessions and verify the local_address, remote_address and destination_port.
  • Check that the local discriminator (Local_discr) is set for transmitted packets and that the remote discriminator (Remote_discr) is populated in the output.
    • If the remote discriminator is not populated, BFD packets may be unable to reach the edge. Follow the Check for connectivity-related issues section below.
    • If the remote discriminator is present and the session remains in the Down state, check the diag code:

      Diag Code | Description | Action
      ----------|-------------|-------
      Control Detection Time Expired | The BFD rx_interval timer expired, and the end reporting the expiration declares the session down. | Check whether the BFD timer is too aggressive for the system load and path traffic load; aggressive timers may cause BFD flaps. The default BFD timer is 500 ms, and the detection time multiplier is 3.
      Neighbor Signaled Session Down | The peer voluntarily brought down the session while the local BFD session was up. | Check the peer BFD configuration.
      Administratively Down | The BFD session on the edge is not enabled. | Enable the BFD session on the edge.

    • Sample CLI output: 
      Edge1(tier0_vrf_sr[7])> get bfd-sessions
      BFD Session
      Dest_port                     : 3784 -----------------------------------> Destination Port
      Diag                          : No Diagnostic
      Encap                         : vlan
      Forwarding                    : last true (current true)
      Interface                     : ########-####-####-####-###########
      Intf_type                     : LR_PORT
      Keep-down                     : false
      Last_admin_down_diag_time     : 2024-04-17 13:15:18
      Last_cp_diag                  : No Diagnostic
      Last_cp_rmt_diag              : No Diagnostic
      Last_cp_rmt_state             : up
      Last_cp_state                 : up
      Last_down_time                : 2024-04-17 13:15:18
      Last_fwd_state                : UP
      Last_local_down_diag          : Neighbor Signaled Session Down ---------> Edge Diag Code
      Last_remote_admin_down_time   : 2024-04-17 13:15:18
      Last_remote_down_diag         : Administratively Down
      Last_up_time                  : 2024-04-17 13:15:19
      Local_address                 : ##.###.##.##.### -----------------------------> Local Address
      Local_discr                   : 673456400 ------------------------------> Local Discriminator
      Min_rx_ttl                    : 255
      Multiplier                    : 3
      Received_remote_diag          : No Diagnostic
      Received_remote_state         : up
      Remote_address                : ##.###.##.##.### ----------------------------> Remote Address
      Remote_admin_down             : false
      Remote_diag                   : No Diagnostic
      Remote_discr                  : 4097 -----------------------------------> Remote Discriminator
      Remote_min_rx_interval        : 1000
      Remote_min_tx_interval        : 1000
      Remote_multiplier             : 3
      Remote_state                  : up
      Router                        : ########-####-####-####-###########
      Router_down                   : false
      Rx_cfg_min                    : 500
      Rx_interval                   : 1000
      Service-link                  : false
      Session_type                  : UPLINK
      State                         : up -------------------------------------> State
      Tx_cfg_min                    : 500 ------------------------------------> Configured Transmit Min Interval
      Tx_interval                   : 1000 -----------------------------------> Transmit Interval
      Type                          : IPv4
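
The checks on the get bfd-sessions output can be sketched as a small Python triage script. It is an illustration under stated assumptions, not an NSX tool: the function names are hypothetical, the output is assumed to keep the "Key : value" layout shown above, an unpopulated remote discriminator is assumed to appear as 0 or empty, and Rx_cfg_min is assumed to be the local required minimum receive interval in milliseconds. The detection time follows the asynchronous-mode rule in RFC 5880 section 6.8.4: the remote multiplier times the larger of the local required receive interval and the remote desired transmit interval.

```python
import re

# Actions taken from the diag code list in this article.
DIAG_ACTIONS = {
    "Control Detection Time Expired": "Check whether the BFD timers are too aggressive for the load.",
    "Neighbor Signaled Session Down": "Check the peer BFD configuration.",
    "Administratively Down": "Enable the BFD session on the edge.",
}


def parse_bfd_session(text: str) -> dict:
    """Parse 'Key : value' lines, dropping any trailing '----> label' annotation."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # Skip the prompt and the "BFD Session" header
        key, _, value = line.partition(":")
        value = re.split(r"\s*-{2,}>", value, maxsplit=1)[0].strip()
        fields[key.strip()] = value
    return fields


def triage(fields: dict) -> str:
    """Return the next step suggested by this article for one session."""
    if fields.get("State", "").lower() == "up":
        return "Session is up."
    # Assumption: an unpopulated remote discriminator shows as 0 or empty.
    if fields.get("Remote_discr", "") in ("", "0"):
        return "Remote discriminator not populated: check connectivity to the peer."
    diag = fields.get("Diag", "")
    return DIAG_ACTIONS.get(diag, f"No action listed in this article for diag code {diag!r}.")


def detection_time_ms(fields: dict) -> int:
    """RFC 5880 async mode: remote multiplier x max(local min RX, remote min TX)."""
    local_rx = int(fields["Rx_cfg_min"])
    remote_tx = int(fields["Remote_min_tx_interval"])
    return int(fields["Remote_multiplier"]) * max(local_rx, remote_tx)


SAMPLE = """\
Diag                          : Control Detection Time Expired
Local_discr                   : 673456400 ------> Local Discriminator
Remote_discr                  : 4097
Remote_min_tx_interval        : 1000
Remote_multiplier             : 3
Rx_cfg_min                    : 500
State                         : down
"""

fields = parse_bfd_session(SAMPLE)
print(triage(fields))
print(detection_time_ms(fields))  # 3 * max(500, 1000) = 3000
```

With the timer values from the sample output (Rx_cfg_min 500, Remote_min_tx_interval 1000, Remote_multiplier 3), the computed detection time is 3000 ms, which is consistent with the Rx_interval of 1000 shown above.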

  • Check for connectivity-related issues:
    • Invoke the NSX CLI command get logical-routers.
      Edge1> get logical-routers
      Logical Router
      UUID                                   VRF    LR-ID  Name                              Type                        Ports   Neighbors
      ########-####-####-####-###########   0      0                                        TUNNEL                      4       10/5000
      ########-####-####-####-###########   1      3      SR-tier0                          SERVICE_ROUTER_TIER0        6       0/50000
      ########-####-####-####-###########   3      1      DR-tier0                          DISTRIBUTED_ROUTER_TIER0    6       2/50000
    • Switch to the service router {sr_id} using the NSX CLI command vrf vrf_id_of_service_router.
    • Invoke the NSX CLI command get route and ensure a valid route exists in the routing table for the peer.
    • Invoke the NSX CLI command ping {peer_address}.
    • If the ping fails:
      • Check the VLAN on the segment/Edge logical uplink and the VLAN on the external peer interface. If the VLAN configuration does not match, ping is expected to fail.
      • Identify the correct VLAN to be configured and ensure it is configured on the edge segment/logical uplink and the interface on the external peer connecting to the edge.
      • To check the VLAN configured on the uplink interface of the edge, use the API GET /policy/api/v1/infra/segments/{segment-id}.

Sample output: GET /policy/api/v1/infra/segments/{segment-id}
{
    "type": "DISCONNECTED",
    "vlan_ids": [
        "5"
    ],
    "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/8fc4a476-c2cc-4d8c-866d-eff780627ea9",
    "advanced_config": {
        "hybrid": false,
        "multicast": true,
        "inter_router": false,
        "local_egress": false,
        "urpf_mode": "STRICT",
        "connectivity": "ON"
    },
    "admin_state": "UP",
    "replication_mode": "MTEP",
    "resource_type": "Segment",
    "id": "<tier0-vrfA-uplink>",
    "display_name": "<tier0-vrfA-uplink>",
    "path": "/infra/segments/<tier0-vrfA-uplink>",
    "relative_path": "<tier0-vrfA-uplink>",
    "parent_path": "/infra",
    "unique_id": "########-####-####-####-###########",
    "realization_id": "########-####-####-####-###########",
    "owner_id": "########-####-####-####-###########",
    "marked_for_delete": false,
    "overridden": false,
    "_system_owned": false,
    "_create_time": 1712865979425,
    "_create_user": "admin",
    "_last_modified_time": 1712865979425,
    "_last_modified_user": "admin",
    "_protection": "NOT_PROTECTED",
    "_revision": 0
}
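
The VLAN comparison can be sketched in Python against the segment output above. This is a minimal illustration: the helper name vlan_matches is hypothetical, the peer VLAN is supplied by the operator from the external device, and the handling of range entries such as "5-10" in vlan_ids is an assumption about how trunk segments are expressed.

```python
def vlan_matches(segment: dict, peer_vlan: int) -> bool:
    """Return True if peer_vlan is covered by the segment's vlan_ids (hypothetical helper)."""
    for entry in segment.get("vlan_ids", []):
        entry = str(entry).strip()
        if "-" in entry:
            # Assumption: a range entry is written as "low-high".
            low, high = entry.split("-", 1)
            if int(low) <= peer_vlan <= int(high):
                return True
        elif int(entry) == peer_vlan:
            return True
    return False


# Trimmed-down segment object using the vlan_ids value from the sample output.
segment = {"vlan_ids": ["5"]}
print(vlan_matches(segment, 5))
print(vlan_matches(segment, 6))
```

If the function returns False for the VLAN configured on the external peer interface, the mismatch is a likely reason for the failed ping.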

    • If the ping is successful and the BFD session remains in the DOWN state:
      • Check for any firewall rules configured to block the BFD control packets.
      • Check the BFD config on the peer.