Troubleshooting NSX OSPF

Article ID: 375376

Products

VMware NSX

Issue/Introduction

When troubleshooting NSX OSPF failures, there are a few fundamental guidelines to follow in getting to a resolution. This article outlines the steps to follow in analyzing and resolving OSPF issues.

Environment

VMware NSX

Cause

Common OSPF failure reasons seen in NSX are:

  • Peer connectivity issue
  • Hello/Dead timer mismatch
  • Mismatched areas
  • Mismatched authentication passwords

Resolution

  • Verify basic connectivity
    • Verify the T0 interface is connected to the correct segment/VLAN and OSPF is enabled at the interface level.
    • Check that there is connectivity between the Edge T0 and the ToR.
  • Verify OSPF Configuration/settings
    • From the NSX UI, go to the T0 GW and verify the OSPF config. Click Edit if changes are needed.
    • Verify that route redistribution into OSPF is enabled.
    • Network types (must match on both sides):
      • Broadcast: On a broadcast network, a router forms an OSPF adjacency only with the DR/BDR.
      • Point to Point: No DR/BDR election. Simpler output and fewer adjacencies to track.
    • Topology type:
      • Active/Standby: Metric 65534 (hard coded).
      • Active/Active: OSPF ECMP up to 8 links. 2 interfaces can be enabled per Edge node. Metric 20.
    • Commands:
      •  get ospf database external x.x.x.x 
    • Parameters:
      • Parameters in the Hello packets (Dst IP: 224.0.0.5 - All SPF Routers) must match between OSPF neighbors:
        • Area
        • Subnet mask configured on that interface
        • MTU (MTU-Ignore is not supported by NSX-T)
        • OSPF timers (Hello / Dead)
        • Interface type (P2P / Broadcast ...)
        • Router Priority
        • Unique Router-ID (if no loopback is configured, the highest interface IP address is used)
        • Authentication (if configured)
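The matching checks above can be sketched as a small comparison helper. This is a minimal Python sketch and not an NSX tool: the `OspfInterface` class, its field names, and `find_mismatches` are hypothetical, and the values are assumed to be collected by hand from the T0 OSPF configuration and from the peer device. It covers only a subset of the parameters listed above.

```python
from dataclasses import dataclass
from ipaddress import ip_interface


@dataclass(frozen=True)
class OspfInterface:
    """Hand-collected OSPF settings for one side of the adjacency (hypothetical model)."""
    router_id: str
    address: str        # interface IP with prefix length, e.g. "192.0.2.1/24"
    area: str
    mtu: int
    hello: int          # hello interval in seconds
    dead: int           # dead interval in seconds
    network_type: str   # "broadcast" or "p2p"
    auth: str           # free-form label for the authentication settings


def find_mismatches(local: OspfInterface, peer: OspfInterface) -> list[str]:
    """Return the names of settings that would prevent or stall an adjacency."""
    problems = []
    if local.area != peer.area:
        problems.append("area")
    # Both sides must sit in the same subnet with the same mask.
    if ip_interface(local.address).network != ip_interface(peer.address).network:
        problems.append("subnet/mask")
    if local.mtu != peer.mtu:
        problems.append("mtu")
    if (local.hello, local.dead) != (peer.hello, peer.dead):
        problems.append("timers")
    if local.network_type != peer.network_type:
        problems.append("network type")
    if local.auth != peer.auth:
        problems.append("authentication")
    # Router-IDs are the one value that must differ.
    if local.router_id == peer.router_id:
        problems.append("duplicate router-id")
    return problems
```

An empty result only means the compared values agree; it does not prove the adjacency will form, since the inputs are whatever was typed in.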
  • Verify OSPF Neighbor states:
    • From the NSX UI, go to the T0 GW and click OSPF neighbors > View to check the states:
      • Down - No hello packets have been received
      • Attempt - Used for manually configured neighbors
      • Init - Hello packets have been received from the neighbor
      • 2-way - Identifies compatible neighbors
      • Exstart/Exchange - Master/Slave relationship is negotiated
      • Loading - Link State information is exchanged (LSR/LSA/LSU)
      • Full - Indicates everything is functioning normally
    • Note that the 2-way state is normal in a broadcast network. In this case, routers only achieve the Full state with their DR and BDR.
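The interpretation of the neighbor states above can be sketched as a lookup. This is a minimal Python sketch under stated assumptions: `assess_neighbor` and its parameters are hypothetical, and the state, network type, and DR/BDR roles are assumed to be read manually from the UI or CLI output.

```python
def assess_neighbor(state: str, network_type: str,
                    local_is_dr_or_bdr: bool, peer_is_dr_or_bdr: bool) -> str:
    """Classify an OSPF neighbor state as "ok" or "investigate"."""
    # Accept spellings such as "2 way", "2-Way", and "2way".
    normalized = state.strip().lower().replace("-", "").replace(" ", "")
    if normalized == "full":
        return "ok"
    # On a broadcast network, two routers that are neither DR nor BDR
    # are expected to stay in 2-way with each other.
    if (normalized == "2way" and network_type == "broadcast"
            and not local_is_dr_or_bdr and not peer_is_dr_or_bdr):
        return "ok"
    # Down, Attempt, Init, Exstart/Exchange, and Loading are treated here
    # as needing a closer look if the neighbor stays in them.
    return "investigate"
```

For example, `assess_neighbor("2-Way", "broadcast", False, False)` returns `"ok"`, while the same state on a point-to-point link returns `"investigate"`.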
  • Commands and logs:
    • get logical-router 
    • vrf <t0_sr_vrf_id>
    • get route || ospf 
    • get ospf | interface | neighbor | database | route
    • get bfd-config | bfd-sessions 
    • Logs:
      • <Edge bundle>/var/log/frr/: ospf_support_bundle.log and frr.log
      • <Edge bundle>/var/log/syslog
      • <Edge bundle>/edge/tier0_sr_get_ospf | route | neighbor | interface | database
    • Edge CLI Debugs
      • To enable OSPF debug logs, from inside the T0 VRF:
        • set debug
        • set routing debug ospf
        • get routing debug ospf
      • To disable debug logs:
        • Press Ctrl+C
        • clear routing debug ospf
        • clear debug
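Once an Edge log bundle has been extracted, the OSPF-related entries in the log files listed above can be pulled out for review. The following is a minimal Python sketch, assuming only that relevant lines contain the string "ospf" in some letter case; it does not assume any particular log format. The function names are hypothetical, and the path passed in would be wherever the bundle was unpacked.

```python
from pathlib import Path
from typing import Iterable


def ospf_lines(lines: Iterable[str]) -> list[str]:
    """Keep only lines that mention OSPF, case-insensitively."""
    return [line.rstrip("\n") for line in lines if "ospf" in line.lower()]


def ospf_lines_from_file(path: str) -> list[str]:
    """Read a log file from an extracted bundle and filter it."""
    # errors="replace" avoids failing on stray non-UTF-8 bytes in the log.
    with Path(path).open(errors="replace") as handle:
        return ospf_lines(handle)
```

Calling `ospf_lines_from_file` on the `frr.log` or `syslog` file from the bundle returns the matching lines in file order, which can then be compared against the time the neighbor state changed.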

 

Additional Information

 

If you are contacting Broadcom support about this issue, please provide the following:

  • State of the OSPF connection reported on the peer device
  • Are you able to ping the peer device from the T0 SR?
  • How long has the session been reported down? Has this ever worked?
  • OSPF configuration on the peer device
  • State of the physical network

Handling Log Bundles for offline review with Broadcom support