Troubleshooting NSX L2 Bridging

Article ID: 378697

Products

VMware NSX

Issue/Introduction

When troubleshooting NSX Layer 2 (L2) Bridging, a specific set of data must be gathered at the time of the event. This article details what documentation is required and how to gather it prior to opening a support request with Broadcom.

Environment

VMware NSX

Cause

Troubleshooting NSX Layer 2 (L2) Bridging spans several layers, from the bridge configuration to the Edge nodes and the ESXi hosts that run them. This article lists the checks for each layer.

Resolution

Here are some troubleshooting steps that can be performed:

  1. Collect initial information about the problem with the L2 Bridge:
    1. Is this an initial installation assistance request? If so, please refer the customer to VMware Professional Services.
    2. What is the purpose of the bridge? Is this a temporary solution for workload migration, or a permanent solution to connect physical and virtual?
    3. What networks are being connected across the bridge? Determine which VLAN ID (or VLAN ID range) to VNI mappings the bridge performs.
    4. Which NSX-T Edges are used in the bridge, and on which ESXi hosts do they reside? Are those hosts prepared for NSX-T or NSX-V, and are they managed by vSphere? Which bridge profile and edge cluster are used?
    5. Which of the three available options was used to build the bridge: MAC Learning, Sink Port, or Promiscuous mode? Refer to "Configure an Edge VM for Bridging" for details on each option.

  2. Determine the symptoms of the problem:
    1. Connectivity issues: 
      1. When did the problem start (date/time, include time zone) and what recent changes have been made?
      2. Which command, such as ping, can be used to validate connectivity? Make sure no gateway firewall or DFW rule is blocking this traffic.
      3. Examine the MAC address table of the Logical Switches in question and check whether any entries look stale. Stale entries can be flushed by disconnecting the logical switch from its T1 and reconnecting it.
    2. Performance issues across the bridge:
      1. What is the expected performance across the bridge? 
      2. If promiscuous mode is enabled, performance may suffer because traffic is replicated to all VMs attached to the port group on the host.

  3. In the UI, verify the configuration of the edge:
    1. Check the bridge profile: System > Fabric > Profiles. Determine which edge cluster is being used and which edge is active. Also take note of the failover type.
    2. Check the Segment that will use the bridge configuration. Make sure that the bridge profile, the bridge VLAN Transport Zone, and the VLAN ID or range of VLAN IDs are configured properly.
  4. In the vCenter UI, check the dvPortGroup connected to the uplink vNIC of the edge:
    1. Make sure that forged transmits and MAC Learning are allowed.
    2. Confirm that the dvPortGroup shows as a VLAN Trunk.
    3. If using promiscuous mode, make sure it is enabled.
    4. Do not have other port groups in promiscuous mode on the same host sharing the same set of VLANs.
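
The port group checks in step 4 can be written down as a small checklist function. The following is a minimal sketch in Python that mirrors the checks as listed above. The dictionary keys (forged_transmits, mac_learning, vlan_type, promiscuous) are hypothetical names chosen for this example, not fields of any vSphere API, and the values are assumed to have been read by hand from the vCenter UI.

```python
def check_uplink_port_group(settings, using_promiscuous_mode=False):
    """Return a list of problems found in the Edge uplink dvPortGroup settings.

    `settings` is a plain dict filled in by hand from the vCenter UI.
    All key names are hypothetical and exist only for this sketch.
    """
    problems = []
    if not settings.get("forged_transmits", False):
        problems.append("Forged transmits is not allowed")
    if not settings.get("mac_learning", False):
        problems.append("MAC Learning is not allowed")
    if settings.get("vlan_type") != "trunk":
        problems.append("Port group is not a VLAN trunk")
    # Promiscuous mode only matters when that bridging option is in use.
    if using_promiscuous_mode and not settings.get("promiscuous", False):
        problems.append("Promiscuous mode is not enabled")
    return problems


# Illustrative values only.
example = {
    "forged_transmits": True,
    "mac_learning": True,
    "vlan_type": "trunk",
    "promiscuous": False,
}
print(check_uplink_port_group(example, using_promiscuous_mode=True))
```

An empty list means none of the listed checks failed. The function does not cover the last check (other promiscuous port groups sharing the same VLANs on the host), which still has to be reviewed manually.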

  5. In the Edge CLI as admin user:
    1. Run "get bridge summary" to get the UUID of the bridge, the HA status on the local edge, and the VLAN and Segment being bridged. Make sure that the active edge is forwarding.
    2. Use "get bridge <UUID>", "get bridge name <segment name>", or "get bridge vlan <VLAN ID>" to display the details of a particular bridge. All three commands produce the same output, which details the bridge that extends the segment to the VLAN.
    3. The edge bridge syncs the MAC addresses it has learned to its peer. These are used to send RARP packets that update the MAC address tables during a switchover. To view the synced MAC addresses, run "get bridge mac-sync-table".
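
When the same checks have to be repeated on every Edge in the cluster, it can help to generate the command list once and paste it into each session. The following is a minimal Python sketch that only assembles the Edge CLI commands named in step 5. The function name and its arguments are hypothetical, and the sketch does not connect to an Edge or run anything.

```python
def edge_bridge_commands(bridge_uuid, segment_name, vlan_id):
    """Return the Edge CLI commands from step 5, in the order they are listed."""
    return [
        "get bridge summary",
        # The next three are equivalent lookups for the same bridge.
        f"get bridge {bridge_uuid}",
        f"get bridge name {segment_name}",
        f"get bridge vlan {vlan_id}",
        "get bridge mac-sync-table",
    ]


# Placeholders; substitute the values reported by "get bridge summary".
for command in edge_bridge_commands("<UUID>", "<segment name>", "<VLAN ID>"):
    print(command)
```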

  6. In the ESXi host where the edges reside, if using promiscuous mode:
    1. Run "esxcli system settings advanced list -o /Net/ReversePathFwdCheckPromisc" and confirm that "Int Value" is set to 1.
    2. Otherwise, run "esxcli system settings advanced set -o /Net/ReversePathFwdCheckPromisc -i 1", then disable and re-enable promiscuous mode on the DVPG to which the bridge vNIC is connected, as described in Option 1 of "Configure an Edge VM for Bridging" in the Administration Guide.
    3. On the ESXi host, use "vsish -e cat /net/portsets/{edge_vm_portset}/ports/{edge_vm_port}/outputstats" to make sure the Reverse Path filter is applied.
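
If the output of the "esxcli system settings advanced list" command is saved from several hosts, the "Int Value" check in step 6 can be automated. The following is a minimal Python sketch. The sample text is an abbreviated, illustrative approximation of the command output, and the function name is hypothetical. The pattern is anchored to the start of the line so that a "Default Int Value" line is not mistaken for the current value.

```python
import re


def reverse_path_check_enabled(esxcli_output):
    """Return True if 'Int Value' is 1, False if it is another number.

    Returns None when the text contains no 'Int Value' line.
    """
    match = re.search(
        r"^[ \t]*Int Value:[ \t]*(-?\d+)[ \t]*$",
        esxcli_output,
        re.MULTILINE,
    )
    if match is None:
        return None
    return int(match.group(1)) == 1


# Abbreviated, illustrative output; real output contains more fields.
sample = """\
   Path: /Net/ReversePathFwdCheckPromisc
   Type: integer
   Int Value: 0
   Default Int Value: 0
"""
print(reverse_path_check_enabled(sample))
```

A result of False means the setting still has to be changed as described in step 6.2.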
  7. Collect simultaneous packet captures on the source, the destination, the L2 Bridge Edge, and the ESXi hosts that contain them.
    1. On the edge: 
      1. "start capture interface <T1 SR uplink/backplane/downlink interface UUID> <expression>"
    2. ESXi hosts: 
      1. "pktcap-uw --switchport <port#> --capture VnicTx,VnicRx --snaplen 150 --ng -o /<directory>/<name>.pcap"
      2. "pktcap-uw --uplink <vmnic#> --capture UplinkSndKernel,UplinkRcvKernel --snaplen 150 --ng -o /<directory>/<name>.pcap2"
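
Because the captures must run at the same time on several hosts, it can be convenient to prepare the ESXi command lines in advance. The following is a minimal Python sketch that fills in the two pktcap-uw templates from step 7 using only the options shown there. The function names are hypothetical, and the port number, vmnic name, and output paths in the example are made-up placeholders.

```python
def pktcap_switchport_cmd(port, output_path, snaplen=150):
    """Build the switchport capture command from step 7."""
    return (
        f"pktcap-uw --switchport {port} --capture VnicTx,VnicRx "
        f"--snaplen {snaplen} --ng -o {output_path}"
    )


def pktcap_uplink_cmd(vmnic, output_path, snaplen=150):
    """Build the uplink capture command from step 7."""
    return (
        f"pktcap-uw --uplink {vmnic} --capture UplinkSndKernel,UplinkRcvKernel "
        f"--snaplen {snaplen} --ng -o {output_path}"
    )


# Placeholder values; use the real switchport ID, vmnic, and a writable path.
print(pktcap_switchport_cmd(12345678, "/tmp/edge-vnic.pcap"))
print(pktcap_uplink_cmd("vmnic0", "/tmp/edge-uplink.pcap"))
```

Giving each capture its own output file keeps the switchport and uplink traces separate for later comparison.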
  8. If multiple bridge profiles that use different edges bridge the same VLAN on the same segment, packet loss can occur.
    Ensure that bridge profiles attached to the same segment bridge different VLANs.
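
The overlap condition in step 8 can be checked mechanically once the bridge profiles on a segment are listed with their VLAN IDs or ranges. The following is a minimal Python sketch. The tuple layout (profile name, first VLAN, last VLAN) and the profile names are hypothetical, and the data is assumed to have been collected by hand from the Segment configuration.

```python
from itertools import combinations


def overlapping_bridge_vlans(bridges):
    """Find bridge profiles on one segment that bridge the same VLAN.

    `bridges` is a list of (profile_name, first_vlan, last_vlan) tuples;
    a single VLAN is written with first_vlan == last_vlan.
    Returns a list of (profile_a, profile_b, first_shared, last_shared).
    """
    conflicts = []
    for (name_a, lo_a, hi_a), (name_b, lo_b, hi_b) in combinations(bridges, 2):
        # Two inclusive ranges overlap when the larger start is not past
        # the smaller end.
        lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
        if lo <= hi:
            conflicts.append((name_a, name_b, lo, hi))
    return conflicts


# Illustrative data: profile-a and profile-b both bridge VLAN 100.
segment_bridges = [
    ("profile-a", 100, 100),
    ("profile-b", 100, 110),
    ("profile-c", 200, 200),
]
print(overlapping_bridge_vlans(segment_bridges))
```

Any entry in the result identifies two profiles whose VLAN configuration should be separated.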

Additional Information

Reference:

NSX-T Edge Bridge: Almost everything you wanted to know

Edge Bridging: Extending Overlay Segments to VLAN

 

If you are contacting Broadcom support about this issue, please provide the following:

  • NSX Edge log bundles for all Edges in the Edge Cluster containing the L2 Bridge
  • Ensure log date range covers the full date of the event(s) being investigated. When in doubt, retrieve logs for all time.
  • NSX Manager log bundles
  • ESXi host log bundles for all hosts where the affected Edge VMs are running
  • Text of any error messages seen in NSX GUI or command lines pertinent to the investigation


Known issues: