BGP sessions with uplink routers could go down after NSXT Edge VM resize

Article ID: 369842

Products

VMware NSX

Issue/Introduction

BGP not peering after NSXT Edge VM resize

  • All BGP sessions with uplink routers go down after NSX-T Edge VMs are resized (for example, from Large to Extra Large) using the redeploy API /api/v1/transport-nodes/<EVM-TN-UUID>?action=redeploy
  • A trace capture taken on the transport node (ESX) on which the Edge is registered may show the following:
    • pktcap-uw --switchport 1234567 --capture trace
    • {...}
    • 00:26:48.70449[179] Captured at PktFree point, Drop Reason 'MAC Forgery Drop'. Drop Function 'L2Sec_FilterSrcMACForgeries'. TSO not enabled, Checksum not offloaded and not verified, SourcePort #######, VLAN tag ###, length 60

      PATH
      +- [00:26:48.70424] |    VnicIx      |    ####### |
      +- [00:26:48.70427] |    PortInput   |    ####### |
      +- [00:26:48.70428] |    IOChain     |            |    [email protected]#1.0.7.0.20737187
      +- [00:26:48.70428] |    IOChain     |            |    [email protected]#1.0.7.0.20737187
      +- [00:26:48.70432] |    IOChain     |            |    [email protected]#1.1.7.0.20737187
      +- [00:26:48.70434] |    IOChain     |            |    [email protected]#1.1.7.0.20737187
      +- [00:26:48.70436] |    IOChain     |            |    [email protected]#1.0.7.0.20737187
      +- [00:26:48.70445] |    Drop        |            |
      +- [00:26:48.70447] |    PktFree     |            |

      Segment [0] ---- 60 bytes:
      0x0000: ffff ffff ffff 0050 1234 5678 0806 0001
      0x0010: 0800 0604 0001 0050 1234 5678 0a00 7d72
      0x0020: ffff ffff ffff 0a00 1234 5678 0000 0000
      0x0030: 0000 0000 0000 0000 0000 0000

    • The trace shows that the ARP request (EtherType 0x0806) is dropped in the vDS because of a policy violation, with the drop reason "MAC Forgery Drop".
  • The MAC address of the T0 interface for the LIF may not match the MAC address of eth1.
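
To confirm the EtherType and the source MAC address of the dropped frame, the hex dump from the trace can be decoded offline. The following is a minimal sketch, assuming the dump is an untagged Ethernet frame carrying an ARP payload; the function names are hypothetical, and the addresses in the dump appear to be sanitized placeholders.

```python
import struct

# Hex dump copied from the trace above, with the offset column removed.
FRAME_HEX = """
ffff ffff ffff 0050 1234 5678 0806 0001
0800 0604 0001 0050 1234 5678 0a00 7d72
ffff ffff ffff 0a00 1234 5678 0000 0000
0000 0000 0000 0000 0000 0000
"""


def fmt_mac(raw: bytes) -> str:
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def fmt_ip(raw: bytes) -> str:
    """Format 4 raw bytes as a dotted-quad IPv4 address."""
    return ".".join(str(b) for b in raw)


def decode_arp_frame(frame_hex: str) -> dict:
    """Decode the Ethernet header and ARP payload of a hex-dumped frame."""
    frame = bytes.fromhex("".join(frame_hex.split()))

    # Ethernet II header: destination MAC, source MAC, EtherType (14 bytes).
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0806:
        raise ValueError(f"not an ARP frame: EtherType 0x{ethertype:04x}")

    # ARP payload for IPv4 over Ethernet (28 bytes); the rest is padding.
    (_htype, _ptype, _hlen, _plen, oper,
     sha, spa, _tha, tpa) = struct.unpack("!HHBBH6s4s6s4s", frame[14:42])

    return {
        "frame_len": len(frame),
        "dst_mac": fmt_mac(dst),
        "src_mac": fmt_mac(src),
        "ethertype": ethertype,
        "arp_operation": "request" if oper == 1 else "reply" if oper == 2 else oper,
        "sender_mac": fmt_mac(sha),
        "sender_ip": fmt_ip(spa),
        "target_ip": fmt_ip(tpa),
    }


if __name__ == "__main__":
    for key, value in decode_arp_frame(FRAME_HEX).items():
        print(f"{key}: {value}")
```

The decoded source MAC is the address the Edge used when sending the ARP request, so it is the value to compare against the MAC address currently assigned to the Edge VM uplink vNIC.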

Environment

VMware NSX-T Data Center
VMware NSX

Cause

  • After the Edge VM is redeployed and exits maintenance mode, the workflow that refreshes the Edge VM uplink MAC address with the newly assigned MAC address is not triggered.
  • As a result, ARP resolution of the Edge VM MAC address fails, and BGP sessions with the upstream routers cannot be established.

Resolution

  • This issue is resolved in VMware NSX 3.2.3, 4.1.2 and above available at Broadcom downloads.

  • If you are having difficulty finding and downloading software, please review the article "Download Broadcom products and software".

    Workaround:

    1. Reboot the Edge nodes and check whether the BGP sessions recover.

If the issue persists, try the following step:

    1. Make a dummy update to the redeployed Edge transport node. For example, in NSX Manager, change the Edge node display name and then change it back.

This triggers an update on the Edge node, which internally invokes a function that applies the latest MAC address to all ports of the related Edge VM.
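
The display name change can also be scripted instead of being made in the UI. The following is a rough sketch, not a documented procedure: it assumes the transport node can be read with GET and updated with PUT on /api/v1/transport-nodes/<EVM-TN-UUID> (the same path the redeploy API uses), that the PUT body must carry the current _revision, and that basic authentication is enabled. The helper names, the host name, and the credentials are hypothetical; check the NSX API guide for your version before using anything like this.

```python
import base64
import copy
import json
import urllib.request


def with_display_name(node: dict, name: str) -> dict:
    """Return a copy of a transport node body with only display_name changed."""
    updated = copy.deepcopy(node)
    updated["display_name"] = name
    return updated


def build_put(manager: str, node_id: str, body: dict,
              user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that updates the transport node.

    Assumption: the update endpoint is /api/v1/transport-nodes/<node_id>.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{manager}/api/v1/transport-nodes/{node_id}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Intended flow (each request would be sent with urllib.request.urlopen):
#   1. GET the transport node and keep the returned JSON body as `node`.
#   2. PUT with_display_name(node, node["display_name"] + "-tmp").
#   3. Take the body returned by step 2 (it carries the new _revision) and
#      PUT it back with the original display name restored.
```

Sending the original body unchanged in step 3 would likely be rejected because its _revision is stale, which is why the restore uses the response from step 2.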