Troubleshooting NSX L2 Bridging

Article ID: 378697

Products

VMware NSX

Issue/Introduction

When troubleshooting NSX Layer 2 (L2) Bridging, a specific set of data must be gathered at the time of the event. This article details what documentation is required and how to gather it prior to opening a support request with Broadcom.

Environment

VMware NSX

Cause

Troubleshooting NSX Layer 2 (L2) Bridging involves several layers: the bridge configuration in NSX, the Edge nodes, the ESXi hosts where the Edges run, and the vSphere port groups they connect to. This article lists the checks for each layer.

Resolution

Related documentation:

 

Edge Bridge - everything you want to know Link to document
Bridge Product Docs Link to document

 

If you are contacting Broadcom support about this issue, please provide the following:

  • NSX Edge log bundles for all Edges in the Edge Cluster containing the L2 Bridge
  • Ensure the log date range covers the full date of the event(s) being investigated. When in doubt, retrieve logs for all time.
  • NSX Manager log bundles
  • ESXi host log bundles for all hosts where the affected Edge VMs are running
  • Text of any error messages seen in the NSX UI or on the command line that are pertinent to the investigation

 

 

Additional Information

Here are some troubleshooting steps that can be performed:

  1. Collect initial information about the problem with the L2 Bridge:
    1. Is this an initial installation assistance request? If so, please refer the customer to VMware Professional Services.
    2. What is the purpose of the bridge? Is this a temporary solution for workload migration, or a permanent solution to connect physical and virtual?
    3. What networks are being connected across the bridge? Determine which VLAN ID (or VLAN ID range) to VNI mappings the bridge performs.
    4. Which NSX-T Edges are used in the bridge, and on which ESXi hosts do they reside? Are the hosts prepared for NSX-T or NSX-V, and are they managed by vSphere? Which bridge profile and edge cluster are used?
    5. Which of the three available options was used to build the bridge: MAC Learning, Sink Port, or Promiscuous mode? Refer to Configure Edge-Based Bridging for details on each option.
  2. Determine the symptoms of the problem:
    1. Connectivity issues: 
      1. When did the problem start and what recent changes have been made?
      2. What ping test can we use to validate the connectivity? Make sure no gateway firewall or DFW is blocking this traffic. 
      3. Examine the MAC address table of the Logical Switches in question and check whether the entries seem stale. Stale entries can be flushed by disconnecting the logical switch from its T1 and reconnecting it.
    2. Performance issues across the bridge:
      1. What is the expected performance across the bridge? 
      2. If promiscuous mode is enabled, performance may suffer because traffic is replicated to all VMs attached to the port group on the host.
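The ping test from step 2 can be scripted so that the same set of targets is checked before and after each change. The following is a minimal sketch, not an official tool: the function name `check_reachability` and the example addresses are hypothetical, and it assumes a `ping` that accepts `-c <count>` on the machine running the test.

```shell
# Ping each target three times and report which ones answer.
# Returns non-zero if any target fails, so it can gate further checks.
check_reachability() {
    failed=0
    for target in "$@"; do
        if ping -c 3 "$target" >/dev/null 2>&1; then
            echo "OK   $target"
        else
            echo "FAIL $target"
            failed=1
        fi
    done
    return "$failed"
}

# Example (placeholder addresses; replace with endpoints on the far side of the bridge):
# check_reachability 192.0.2.10 192.0.2.20
```

Running it from both sides of the bridge shows whether connectivity fails in one direction or in both.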
  3. In the NSX UI, verify the configuration of the edge bridge:
    1. Check the bridge profile: System > Fabric > Profiles. Determine which edge cluster is being used and which edge is active. Also take note of the failover type.
    2. Check the Segment that will use the bridge configuration. Make sure that the bridge profile, the bridge VLAN Transport Zone, and the VLAN ID or range of VLAN IDs are configured properly.
  4. In the vCenter UI, check the dvPortGroup connected to the uplink vNIC of the edge:
    1. Make sure that forged transmits and MAC Learning are allowed.
    2. Make sure that the dvPortGroup shows as VLAN Trunk.
    3. If using promiscuous mode, make sure it is enabled.
    4. Make sure that no other port groups on the same host are in promiscuous mode while sharing the same set of VLANs.
  5. In the Edge CLI:
    1. Run "get bridge summary" to get the UUID of the bridge, the HA status on the local edge, and the VLAN and Segment being bridged. Make sure that the active edge is forwarding.
    2. Run "get bridge <UUID>", "get bridge name <segment name>", or "get bridge vlan <VLAN ID>" to display the information for a particular bridge.
  6. On the ESXi hosts where the edges reside, if using promiscuous mode:
    1. Run "esxcli system settings advanced list -o /Net/ReversePathFwdCheckPromisc" and confirm that "Int Value" is set to 1.
    2. Otherwise, use "esxcli system settings advanced set -o /Net/ReversePathFwdCheckPromisc -i 1", then disable and enable promiscuous mode on the DVPG to which the bridge vNIC is connected as described in Option 1.
    3. Use "vsish -e cat /net/portsets/{edge_vm_portset}/ports/{edge_vm_port}/outputstats" to make sure the Reverse Path filter is applied.
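The check in step 6 can be automated by reading the "Int Value" field from the esxcli output. This is a minimal sketch: the function name `rpf_check_enabled` is hypothetical, and it assumes the output contains an `Int Value: <n>` line as referenced in the step above, possibly alongside a similarly named `Default Int Value` line.

```shell
# Reads `esxcli system settings advanced list` output on stdin and succeeds
# only when the "Int Value" field is 1. The pattern is anchored to the start
# of the line so that a "Default Int Value" line is not matched by mistake.
rpf_check_enabled() {
    awk -F': *' '
        /^[ \t]*Int Value:/ { value = $2 }
        END { exit !(value + 0 == 1) }
    '
}

# Example, to be run on the ESXi host:
# esxcli system settings advanced list -o /Net/ReversePathFwdCheckPromisc | rpf_check_enabled \
#     && echo "ReversePathFwdCheckPromisc is enabled" \
#     || echo "ReversePathFwdCheckPromisc is NOT enabled"
```

The function only reads the setting; changing it still requires the "esxcli system settings advanced set" command from step 6.2.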
  7. Collect simultaneous packet captures on the source, the destination, the L2 Bridge Edge, and the ESXi hosts that contain them.
    1. On the edge: 
      1. start capture interface <T1 SR uplink/backplane/downlink interface UUID> <expression>
    2. ESXi hosts: 
      1. pktcap-uw --switchport <port#> --capture VnicTx,VnicRx --snaplen 150 --ng -o /<directory>/<name>.pcap
      2. pktcap-uw --uplink <vmnic#> --capture UplinkSndKernel,UplinkRcvKernel --snaplen 150 --ng -o /<directory>/<name>.pcap
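Because the captures in step 7 must run at the same time on several hosts, it can help to generate the command lines with consistent file names ahead of time. The sketch below only prints the two pktcap-uw commands from the list above with the placeholders filled in. The function names, the file-naming scheme, and the example values are hypothetical.

```shell
# Print the switch-port capture command for one port.
# $1 = switch port ID, $2 = output directory, $3 = label for the file name
switchport_capture_cmd() {
    printf 'pktcap-uw --switchport %s --capture VnicTx,VnicRx --snaplen 150 --ng -o %s/%s_port%s.pcap\n' \
        "$1" "$2" "$3" "$1"
}

# Print the uplink capture command for one physical NIC.
# $1 = vmnic name, $2 = output directory, $3 = label for the file name
uplink_capture_cmd() {
    printf 'pktcap-uw --uplink %s --capture UplinkSndKernel,UplinkRcvKernel --snaplen 150 --ng -o %s/%s_%s.pcap\n' \
        "$1" "$2" "$3" "$1"
}

# Example (placeholder port ID, vmnic, and directory):
# switchport_capture_cmd 67108885 /tmp edge01
# uplink_capture_cmd vmnic2 /tmp edge01
```

Using the same label on every host makes it easier to match the resulting files to each other when they are attached to the support request.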