NSX-V L2 Bridge troubleshooting



Article ID: 330232



Products

VMware NSX

Issue/Introduction

General recommendations to troubleshoot NSX-V L2 Bridge connectivity issues.

Environment

VMware NSX Data Center for vSphere

Resolution

1. Verify the L2 Bridge configuration. In this example, the bridge is created between the logical switch "L2_Bridge" and the distributed port group "Test_VLAN_10_L2_Bridge":

 



1.1 Besides the NSX-side configuration, the L2 Bridge configuration requires a VLAN-backed vDS:

 

2. Collect the MAC and IP addresses of the VMs you are using to troubleshoot on the VXLAN and VLAN sides of the bridge:

VLAN VM
MAC: 00:0c:29:##:##:##
IP: #.#.#.222

VXLAN VM
MAC: 00:50:56:##:##:##
IP: #.#.#.200


3. Bridge instance information:

3.1 On the host where the DLR Control VM is running, get the list of DLRs:

 

[root@esxi3:~] net-vdr --instance -l

 

VDR Instance Information :

-------------------------

Vdr Name:                   default+edge-2
Vdr Id:                     0x00002710
Number of Lifs:             3
Number of Routes:           1
Number of Hold Pkts:        0
Number of Neighbors:        1

State:                      Enabled

Controller IP:              10.1.1.250
Control Plane IP:           10.1.1.130
Control Plane Active:       Yes
Num unique nexthops:        0
Generation Number:          0
Edge Active:                Yes
Pmac:                       00:00:00:00:00:00

 

3.2 From the output of the previous command, take the VDR name of the affected DLR (default+edge-X) and run the following command to get the designated instance IP (DI IP):

 

[root@esxi3:~] net-vdr --lif -l default+edge-2


VDR default+edge-2 LIF Information :

Name:                27100000000c
Mode:                Bridging, Sedimented, Internal
Id:                  Vlan:10
Ip(Mask):            0.0.0.0(0.0.0.0)
Connected Dvs:       Management
Designated Instance: No
DI IP:               0.0.0.0
State:               Enabled
Flags:               0xd4
DHCP Relay:          Not enabled


Name:                27100000000b
Mode:                Bridging, Sedimented, Internal
Id:                  Vxlan:10012
Ip(Mask):            0.0.0.0(0.0.0.0)
Connected Dvs:       Management
VXLAN Control Plane: Enabled
VXLAN Multicast IP:  0.0.0.1
State:               Enabled
Flags:               0x22d4
DHCP Relay:          Not enabled

 

Name:                27100000000a
Mode:                Routing, Distributed, Internal
Id:                  Vxlan:10000
Ip(Mask):            #.#.#.150(255.255.255.0)
Connected Dvs:       Management
VXLAN Control Plane: Enabled
VXLAN Multicast IP:  0.0.0.1
State:               Enabled
Flags:               0x2288
DHCP Relay:          Not enabled


On the VLAN interface, you will see a "bridge designated instance IP" (DI IP). The designated instance is the one ESXi host in the cluster that is chosen to communicate with the physical network. If the "DI IP" is 0.0.0.0, the bridge designated instance is located on the ESXi host where the DLR Control VM is running.
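If you save the output of "net-vdr --lif -l" to a file, a short script can pull out the VLAN-side bridging LIF and its DI IP. The following is a minimal Python sketch, not a VMware tool: the function names are hypothetical, and it assumes the output keeps the "Key: value" layout shown above, with each LIF starting at a "Name:" line.

```python
import re

# Abbreviated copy of the LIF output shown above.
SAMPLE_LIF_OUTPUT = """\
VDR default+edge-2 LIF Information :

Name:                27100000000c
Mode:                Bridging, Sedimented, Internal
Id:                  Vlan:10
Designated Instance: No
DI IP:               0.0.0.0
State:               Enabled

Name:                27100000000b
Mode:                Bridging, Sedimented, Internal
Id:                  Vxlan:10012
State:               Enabled

Name:                27100000000a
Mode:                Routing, Distributed, Internal
Id:                  Vxlan:10000
State:               Enabled
"""


def parse_lifs(text):
    """Split `net-vdr --lif -l` output into one dict per LIF (hypothetical helper)."""
    lifs, current = [], None
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+):\s*(.*)$", line)
        if not match:
            continue
        key, value = match.group(1).strip(), match.group(2).strip()
        if key == "Name":  # assumption: every LIF block starts with a Name line
            current = {}
            lifs.append(current)
        if current is not None:
            current[key] = value
    return lifs


def bridge_designated_instance(lifs):
    """Return (designated_instance, di_ip) from the VLAN-side bridging LIF, or None."""
    for lif in lifs:
        if "Bridging" in lif.get("Mode", "") and lif.get("Id", "").startswith("Vlan:"):
            return lif.get("Designated Instance"), lif.get("DI IP")
    return None


if __name__ == "__main__":
    print(bridge_designated_instance(parse_lifs(SAMPLE_LIF_OUTPUT)))
```

The sketch only extracts the two fields. Interpreting a DI IP of 0.0.0.0 follows the rule described above.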

3.3 Bridge MAC table. The following command lists the MAC addresses that the bridge has learned on the VXLAN side and on the VLAN side. Make sure the bridge is learning the MACs of the VMs you are using for testing:


[root@esxi3:~] net-vdr -b --mac default+edge-2

VDR 'default+edge-2' bridge 'test' mac address tables :

total number of MAC addresses:    1

number of MAC addresses returned: 1

Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:50:56:##:##:##   Dynamic             0     10012          67108874  0

total number of MAC addresses:    2

number of MAC addresses returned: 2

Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:0c:29:##:##:##    Dynamic            10         0          67108868  0
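To confirm that both test VMs appear in the bridge MAC table, you can compare a saved copy of the output with the MACs collected in step 2. The sketch below is a minimal Python example under stated assumptions: the helper names are hypothetical, the MAC addresses in the sample are placeholders (the real ones are masked in this article), and the row layout is assumed to match the table above.

```python
import re

# Sample shaped like the table above; the MAC addresses are hypothetical placeholders.
SAMPLE_MAC_OUTPUT = """\
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:50:56:aa:bb:01    Dynamic             0     10012          67108874  0

Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:0c:29:aa:bb:02    Dynamic            10         0          67108868  0
"""

# MAC, address type, VLAN ID, VXLAN ID, destination port, age
MAC_ROW = re.compile(
    r"^\s*((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def learned_macs(text):
    """Map each learned MAC to its VLAN ID and VXLAN ID (hypothetical helper)."""
    table = {}
    for line in text.splitlines():
        row = MAC_ROW.match(line)
        if row:
            mac, _addr_type, vlan, vxlan, _port, _age = row.groups()
            table[mac.lower()] = {"vlan": int(vlan), "vxlan": int(vxlan)}
    return table


def missing_macs(table, expected):
    """Return the expected MACs that the bridge has not learned."""
    return [mac for mac in expected if mac.lower() not in table]


if __name__ == "__main__":
    table = learned_macs(SAMPLE_MAC_OUTPUT)
    print(missing_macs(table, ["00:50:56:aa:bb:01", "00:0c:29:aa:bb:02"]))
```

An empty list means both test MACs were learned. A MAC that is reported missing points to the side of the bridge (VLAN or VXLAN) that deserves a closer look.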

 

3.4 L2 Bridge stats. Look for the "pktsDroppedBridged", "droppedTx" and "droppedRx" statistics:

 

[root@esxi3:~] net-vdr -b --stats default+edge-2

VDR 'default+edge-2' bridge 'test' stats :
Bridge stats:
portNotExist:       0

        Network 'vxlan-10012-type-bridging' stats:

        fdbHit:           2025
        fdbLearn:         1
        fdbUpdate:        0
        fdbTableFull:     0
        fdbChain:         0
        fdbAged:          0
        fdbMacMoved:      0
        fdbMacHit:        2025
        FRPFilterLeafTx:  0
        FRPFilterBridged: 0
        fdbUplinkFilter:  0

 

                Network port ID '0x4000008' stats:

                pktsTx:                  2093
                pktsTxMulticast:         0
                pktsTxBroadcast:         3
                pktsRx:                  2092
                pktsRxMulticast:         0
                pktsRxBroadcast:         0
                droppedTx:               0
                droppedRx:               0
                mappedLenTooShort:       0
                pktsBridged:             2093
                pktsDroppedBridged:      0
                pktsDroppedUplink:       0
                droppedTxPortMismatch:   0
                droppedTxVxlanPktToVlan: 0

        Network 'vlan-10-type-bridging' stats:

        fdbHit:           2090
        fdbLearn:         3
        fdbUpdate:        0
        fdbTableFull:     0
        fdbChain:         0
        fdbAged:          1
        fdbMacMoved:      0
        fdbMacHit:        2090
        FRPFilterLeafTx:  0
        FRPFilterBridged: 0
        fdbUplinkFilter:  0

                Network port ID '0x4000008' stats:

                pktsTx:                  2092
                pktsTxMulticast:         0
                pktsTxBroadcast:         0
                pktsRx:                  2093
                pktsRxMulticast:         0
                pktsRxBroadcast:         3
                droppedTx:               0
                droppedRx:               0
                mappedLenTooShort:       0
                pktsBridged:             2092
                pktsDroppedBridged:      0
                pktsDroppedUplink:       0
                droppedTxPortMismatch:   0
                droppedTxVxlanPktToVlan: 0
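The stats output is long, so it can help to filter a saved copy for drop counters that are not zero. The following Python sketch is one way to do that. It is an illustration only: the function name is hypothetical, it treats any counter whose name contains "drop" as a drop counter, and the nonzero value in the sample is invented to show a finding.

```python
import re

# Abbreviated sample shaped like the output above.
# The value 5 for pktsDroppedBridged is hypothetical; the article's output shows 0.
SAMPLE_STATS_OUTPUT = """\
VDR 'default+edge-2' bridge 'test' stats :
Bridge stats:
portNotExist:       0

        Network 'vxlan-10012-type-bridging' stats:
        fdbHit:           2025
                Network port ID '0x4000008' stats:
                droppedTx:               0
                droppedRx:               0
                pktsDroppedBridged:      0

        Network 'vlan-10-type-bridging' stats:
        fdbHit:           2090
                Network port ID '0x4000008' stats:
                droppedTx:               0
                droppedRx:               0
                pktsDroppedBridged:      5
"""


def nonzero_drops(text):
    """Return (network, counter, value) for every nonzero drop counter."""
    findings, network = [], None
    for line in text.splitlines():
        header = re.match(r"\s*Network '([^']+)' stats:", line)
        if header:  # remember which bridged network the counters belong to
            network = header.group(1)
            continue
        counter = re.match(r"\s*(\w+):\s+(\d+)\s*$", line)
        if counter and "drop" in counter.group(1).lower() and int(counter.group(2)) > 0:
            findings.append((network, counter.group(1), int(counter.group(2))))
    return findings


if __name__ == "__main__":
    for network, name, value in nonzero_drops(SAMPLE_STATS_OUTPUT):
        print(f"{network}: {name} = {value}")
```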

 

4. Traffic capture examples:


4.1 Identify the uplink in use by the vDS port group associated with the L2 Bridge on the host where the L2 Bridge designated instance is located. To do this in vCenter, go to Networking, click the vDS in the inventory, then click Configure / Topology and select the vDS port group. You will see an orange line going from the port group to the Uplinks section. If you are using link aggregation, capture on one uplink at a time to identify which one is being used to forward the VLAN-tagged traffic to the physical network.


4.2 Traffic capture example of the frames leaving the host towards the physical network:
 

[root@esxi3:~] pktcap-uw --uplink vmnic2 --vlan 10 --dir 1 -o - | tcpdump-uw -enr -

reading from file -, link-type EN10MB (Ethernet)

15:13:29.172512 00:50:56:##:##:## > 00:0c:29:##:##:##, ethertype IPv4 (0x0800), length 98: #.#.#.200 > #.#.#.222: ICMP echo reply, id 9364, seq 14, length 64

15:13:30.173055 00:50:56:##:##:## > 00:0c:29:##:##:##, ethertype IPv4 (0x0800), length 98: #.#.#.200 > #.#.#.222: ICMP echo reply, id 9364, seq 15, length 64


4.3 You can narrow the capture with pktcap-uw filters such as "--ip #.#.#.#". If you use "--dir 0", you capture in the inbound direction instead of the outbound direction used above. To see the full list of pktcap-uw filters, run "pktcap-uw --help | less".
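When you repeat the capture on several uplinks or in both directions, generating the command line avoids typing mistakes. The Python sketch below only assembles the command string from the options already used in this step ("--uplink", "--vlan", "--dir", "--ip", "-o"). The function name and the example IP address are hypothetical.

```python
import shlex


def pktcap_command(uplink, vlan, direction, ip=None):
    """Build the capture pipeline from step 4.2 as a string (hypothetical helper).

    direction: 0 for inbound, 1 for outbound, as used in this article.
    """
    if direction not in (0, 1):
        raise ValueError("direction must be 0 (inbound) or 1 (outbound)")
    args = ["pktcap-uw", "--uplink", uplink, "--vlan", str(vlan), "--dir", str(direction)]
    if ip:
        args += ["--ip", ip]  # optional IP filter
    args += ["-o", "-"]       # write the capture to stdout for tcpdump-uw
    return " ".join(shlex.quote(arg) for arg in args) + " | tcpdump-uw -enr -"


if __name__ == "__main__":
    # 192.0.2.222 is a placeholder documentation address, not a value from this lab.
    for direction in (0, 1):
        print(pktcap_command("vmnic2", 10, direction, ip="192.0.2.222"))
```

Paste the printed line into the ESXi shell on the host where the designated instance runs.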

5. Make sure the ESXi host where the DI is running has connectivity to the controller on port 1234:

[root@esxi3:~] esxcli network ip connection list | grep 1234

tcp         0       0  #.#.#.130:38370                #.#.#.250:1234   ESTABLISHED     35109  newreno  netcpa-worker

* The controller is in charge of building the Bridge MAC table and distributing that information to the ESXi hosts.
* In this lab only one controller was deployed, but in production environments the recommendation is to have three NSX Controllers.
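A plain "grep 1234" also matches lines where 1234 appears in a local port or a process ID. If you save the output of "esxcli network ip connection list", the following Python sketch matches only on the foreign address column and reports the connection state. It is a minimal example: the function name is hypothetical, the column order is assumed to match the line shown in step 5, and the second sample line (an SSH session) is invented to show the filtering.

```python
# First line follows the output in step 5, using the controller and control plane
# IPs from step 3.1. The second line is a hypothetical, unrelated connection.
SAMPLE_CONNECTIONS = """\
tcp         0       0  10.1.1.130:38370                10.1.1.250:1234   ESTABLISHED     35109  newreno  netcpa-worker
tcp         0       0  10.1.1.130:22                   10.1.1.50:51234   ESTABLISHED     34001  newreno  sshd
"""


def controller_sessions(text, port=1234):
    """Return (foreign_address, state) for TCP connections to the given remote port."""
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        foreign = fields[4]  # assumed column order: proto, recv-q, send-q, local, foreign
        if foreign.rsplit(":", 1)[-1] == str(port):
            state = fields[5] if len(fields) > 5 else ""
            sessions.append((foreign, state))
    return sessions


if __name__ == "__main__":
    sessions = controller_sessions(SAMPLE_CONNECTIONS)
    if not any(state == "ESTABLISHED" for _, state in sessions):
        print("No established controller session on port 1234")
    for foreign, state in sessions:
        print(foreign, state)
```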


6. Controller MAC table. You can verify the bridge MAC table on the controller side. Note that the VDR ID is taken from the output of the command "net-vdr --instance -l":

In case you find controller issues, please refer to the following link: Troubleshooting NSX Controller
 

7. Rule out VXLAN issues over the physical network by migrating a VXLAN VM to the same ESXi host where the DI of the DLR is running.

8. Always make sure the physical NICs are running the recommended driver/firmware as per VMware's HCL.
 
9. If the issue persists, collect diagnostic info for further analysis:

 

9.1 Run the following command on the host where the DLR Control VM is running:

esxtop -b -d 2 -n 2 -a > esxtop.csv

This command generates a file called "esxtop.csv" in the directory where it is executed. The goal is to rule out performance issues on the ESXi host where the DLR Control VM is running.
 

9.2 Collect the logs of the ESXi host where the Control VM of the affected DLR is running by running the "vm-support" command from the CLI.
 

9.3 Find the DLR Control VM in the vCenter inventory, select it, and go to Monitor / Tasks & Events / Tasks. Copy the output and paste it into a text document, then do the same for the "Events" section. The goal is to rule out that the DLR Control VM is being backed up using snapshots, which could damage the VM's guest OS and lead to unexpected behavior.