General recommendations to troubleshoot NSX-V L2 Bridge connectivity issues.
VMware NSX Data Center for vSphere
1. Verify the L2 Bridge configuration. In this example, the bridge is created between the logical switch "L2_Bridge" and the distributed port group "Test_VLAN_10_L2_Bridge":
1.1 Besides the NSX-side configuration, the L2 Bridge requires a VLAN-backed vDS port group:
2. Collect the details of the VMs you are using to troubleshoot on the VXLAN and VLAN sides of the bridge:
VLAN VM
MAC: 00:0c:29:##:##:##
IP: #.#.#.222
VXLAN VM
MAC: 00:50:56:##:##:##
IP: #.#.#.200
3. Bridge instance information:
3.1 On the host where the DLR Control VM is running, get the list of DLRs:
[root@esxi3:~] net-vdr --instance -l
VDR Instance Information :
-------------------------
Vdr Name: default+edge-2
Vdr Id: 0x00002710
Number of Lifs: 3
Number of Routes: 1
Number of Hold Pkts: 0
Number of Neighbors: 1
State: Enabled
Controller IP: 10.1.1.250
Control Plane IP: 10.1.1.130
Control Plane Active: Yes
Num unique nexthops: 0
Generation Number: 0
Edge Active: Yes
Pmac: 00:00:00:00:00:00
3.2 From the output of the previous command, take the name of the affected edge (default+X) and run the following command to get the designated instance IP (DI IP):
[root@esxi3:~] net-vdr --lif -l default+edge-2
VDR default+edge-2 LIF Information :
Name: 27100000000c
Mode: Bridging, Sedimented, Internal
Id: Vlan:10
Ip(Mask): 0.0.0.0(0.0.0.0)
Connected Dvs: Management
Designated Instance: No
DI IP: 0.0.0.0
State: Enabled
Flags: 0xd4
DHCP Relay: Not enabled
Name: 27100000000b
Mode: Bridging, Sedimented, Internal
Id: Vxlan:10012
Ip(Mask): 0.0.0.0(0.0.0.0)
Connected Dvs: Management
VXLAN Control Plane: Enabled
VXLAN Multicast IP: 0.0.0.1
State: Enabled
Flags: 0x22d4
DHCP Relay: Not enabled
Name: 27100000000a
Mode: Routing, Distributed, Internal
Id: Vxlan:10000
Ip(Mask): #.#.#.150(255.255.255.0)
Connected Dvs: Management
VXLAN Control Plane: Enabled
VXLAN Multicast IP: 0.0.0.1
State: Enabled
Flags: 0x2288
DHCP Relay: Not enabled
On the VLAN LIF, the output shows the bridge designated instance IP (DI IP). The designated instance is the ESXi host in the cluster that is chosen to communicate with the physical network. If the DI IP is 0.0.0.0, the bridge designated instance is located on the ESXi host where the DLR Control VM is running.
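To read the DI IP without scanning the whole LIF listing, the output can be filtered. The following is a minimal sketch, assuming the field layout matches the sample output above; the function name bridge_di_ips is hypothetical and not part of any NSX tooling.

```shell
#!/bin/sh
# Hypothetical helper: print "<LIF Id> <DI IP>" for every bridging LIF
# that reports a DI IP line.
# Input (stdin): output of `net-vdr --lif -l <vdr-name>`.
bridge_di_ips() {
    awk '
        # Remember whether the current LIF is in bridging mode.
        /^[ \t]*Mode:/  { bridging = ($0 ~ /Bridging/) }
        # Remember the LIF Id (for example Vlan:10).
        /^[ \t]*Id:/    { id = $2 }
        # "DI IP: a.b.c.d" splits into DI / IP: / a.b.c.d
        /^[ \t]*DI IP:/ { if (bridging) print id, $3 }
    '
}
```

Example: net-vdr --lif -l default+edge-2 | bridge_di_ips
For the sample output above this prints "Vlan:10 0.0.0.0", which points to the host where the DLR Control VM is running.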
[root@esxi3:~] net-vdr -b --mac default+edge-2
VDR 'default+edge-2' bridge 'test' mac address tables :
total number of MAC addresses: 1
number of MAC addresses returned: 1
Destination Address Address Type VLAN ID VXLAN ID Destination Port Age
------------------- ------------ ------- -------- ---------------- ---
00:50:56:##:##:## Dynamic 0 10012 67108874 0
total number of MAC addresses: 2
number of MAC addresses returned: 2
Destination Address Address Type VLAN ID VXLAN ID Destination Port Age
------------------- ------------ ------- -------- ---------------- ---
00:0c:29:##:##:## Dynamic 10 0 67108868 0
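When the table is long, it can help to check a single VM MAC and see on which side of the bridge it was learned. The following is a minimal sketch, assuming the column order shown above (address, type, VLAN ID, VXLAN ID, port, age); the function name bridge_mac_side is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report on which side of the bridge a MAC was learned.
# $1: MAC address to look for (case-insensitive).
# Input (stdin): output of `net-vdr -b --mac <vdr-name>`.
# Prints "vlan <id>" or "vxlan <vni>"; exits non-zero if the MAC is absent.
bridge_mac_side() {
    awk -v mac="$1" '
        tolower($1) == tolower(mac) {
            if ($3 != 0)      print "vlan", $3    # learned on the VLAN side
            else if ($4 != 0) print "vxlan", $4   # learned on the VXLAN side
            found = 1
        }
        END { exit !found }
    '
}
```

Example: net-vdr -b --mac default+edge-2 | bridge_mac_side <MAC of the VLAN VM>
If the MAC of either test VM is missing from the table, the bridge has not learned it and traffic from that VM is probably not reaching the designated instance.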
[root@esxi3:~] net-vdr -b --stats default+edge-2
VDR 'default+edge-2' bridge 'test' stats :
Bridge stats:
portNotExist: 0
Network 'vxlan-10012-type-bridging' stats:
fdbHit: 2025
fdbLearn: 1
fdbUpdate: 0
fdbTableFull: 0
fdbChain: 0
fdbAged: 0
fdbMacMoved: 0
fdbMacHit: 2025
FRPFilterLeafTx: 0
FRPFilterBridged: 0
fdbUplinkFilter: 0
Network port ID '0x4000008' stats:
pktsTx: 2093
pktsTxMulticast: 0
pktsTxBroadcast: 3
pktsRx: 2092
pktsRxMulticast: 0
pktsRxBroadcast: 0
droppedTx: 0
droppedRx: 0
mappedLenTooShort: 0
pktsBridged: 2093
pktsDroppedBridged: 0
pktsDroppedUplink: 0
droppedTxPortMismatch: 0
droppedTxVxlanPktToVlan: 0
Network 'vlan-10-type-bridging' stats:
fdbHit: 2090
fdbLearn: 3
fdbUpdate: 0
fdbTableFull: 0
fdbChain: 0
fdbAged: 1
fdbMacMoved: 0
fdbMacHit: 2090
FRPFilterLeafTx: 0
FRPFilterBridged: 0
fdbUplinkFilter: 0
Network port ID '0x4000008' stats:
pktsTx: 2092
pktsTxMulticast: 0
pktsTxBroadcast: 0
pktsRx: 2093
pktsRxMulticast: 0
pktsRxBroadcast: 3
droppedTx: 0
droppedRx: 0
mappedLenTooShort: 0
pktsBridged: 2092
pktsDroppedBridged: 0
pktsDroppedUplink: 0
droppedTxPortMismatch: 0
droppedTxVxlanPktToVlan: 0
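The drop counters are the quickest thing to check in this output. The following is a minimal sketch that lists only the non-zero ones, assuming the "name: value" layout shown above; the function name nonzero_drops is hypothetical, and it only looks at counters whose name contains "dropped".

```shell
#!/bin/sh
# Hypothetical helper: list non-zero drop counters per bridged network.
# Input (stdin): output of `net-vdr -b --stats <vdr-name>`.
# Prints "<network> <counter>: <value>" for each counter above zero.
nonzero_drops() {
    awk '
        # "Network <name> stats:" starts a section; skip "Network port ID" lines.
        $1 == "Network" && $2 != "port" { net = $2 }
        # Any counter with "dropped"/"Dropped" in its name and a value > 0.
        /[Dd]ropped/ && ($NF + 0) > 0   { print net, $1, $NF }
    '
}
```

Example: net-vdr -b --stats default+edge-2 | nonzero_drops
For the sample output above nothing is printed, because every drop counter is 0.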
4. Traffic capture examples:
4.1 Identify the uplink in use by the vDS port group associated with the L2 Bridge on the host where the L2 Bridge designated instance is located. You can do this by going to vCenter / Networking / Click on the vDS in the inventory / Click on Configure / Click on Topology / Click on the vDS port group. You will see an orange line going from the port group to the Uplinks section. If you are using link aggregation, you need to capture on one uplink at a time to identify which one is being used to forward the VLAN-tagged traffic to the physical network.
4.2 Traffic capture example of the frames leaving the host towards the physical network:
[root@esxi3:~] pktcap-uw --uplink vmnic2 --vlan 10 --dir 1 -o - | tcpdump-uw -enr -
reading from file -, link-type EN10MB (Ethernet)
15:13:29.172512 00:50:56:##:##:## > 00:0c:29:##:##:##, ethertype IPv4 (0x0800), length 98: #.#.#.200 > #.#.#.222: ICMP echo reply, id 9364, seq 14, length 64
15:13:30.173055 00:50:56:##:##:## > 00:0c:29:##:##:##, ethertype IPv4 (0x0800), length 98: #.#.#.200 > #.#.#.222: ICMP echo reply, id 9364, seq 15, length 64
4.3 You can narrow the capture with pktcap-uw filters such as "--ip #.#.#.#". With "--dir 0" you capture in the inbound direction. To see the full list of pktcap-uw filters, use "pktcap-uw --help | less".
5. Make sure the ESXi host where the designated instance (DI) is running has connectivity to the controller on port 1234:
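As an illustration of combining these filters, the following sketch only prints the capture command so it can be reviewed before running it on the host. It reuses the options already shown in this article (--uplink, --vlan, --dir, --ip, -o -); the function name capture_cmd and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a pktcap-uw pipeline.
# $1: uplink (e.g. vmnic2)  $2: VLAN ID  $3: direction (0 inbound, 1 outbound)
# $4: IP address to filter on
capture_cmd() {
    printf 'pktcap-uw --uplink %s --vlan %s --dir %s --ip %s -o - | tcpdump-uw -enr -\n' \
        "$1" "$2" "$3" "$4"
}
```

Example: capture_cmd vmnic2 10 0 <IP of the VLAN VM>
This prints the inbound counterpart of the capture in 4.2, limited to frames that involve the VLAN VM.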
[root@esxi3:~] esxcli network ip connection list | grep 1234
tcp 0 0 #.#.#.130:38370 #.#.#.250:1234 ESTABLISHED 35109 newreno netcpa-worker
* The controller is in charge of building the Bridge MAC table and distributing that information to the ESXi hosts.
* In this lab only one controller was deployed, but in production environments the recommendation is to deploy three NSX Controllers.
In case you find controller issues, please refer to the following link: Troubleshooting NSX Controller
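A plain "grep 1234" also matches unrelated lines, for example an ephemeral port such as 51234. The following is a minimal sketch of a stricter check, assuming the column layout of "esxcli network ip connection list" shown above (foreign address in the fifth column, state in the sixth); the function name controller_established is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list ESTABLISHED connections whose foreign port
# matches the controller port exactly.
# $1: controller port (defaults to 1234).
# Input (stdin): output of `esxcli network ip connection list`.
# Exits non-zero if no such connection exists.
controller_established() {
    awk -v port="${1:-1234}" '
        # $5 is the foreign address (ip:port), $6 is the TCP state.
        $6 == "ESTABLISHED" && $5 ~ (":" port "$") { n++; print $5 }
        END { exit (n ? 0 : 1) }
    '
}
```

Example: esxcli network ip connection list | controller_established
A non-zero exit status means the host has no established session to any controller on port 1234.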
9.1 Run the following command on the host where the DLR Control VM is running:
esxtop -b -d 2 -n 2 -a > esxtop.csv
This command generates a file called "esxtop.csv" in the directory where it is executed. The goal is to rule out performance issues on the ESXi host where the DLR Control VM is running.
9.2 Collect the logs of the ESXi host where the Control VM of the affected DLR is running, using the "vm-support" command from the CLI.
9.3 Search for the DLR Control VM in the vCenter inventory and click on it, then go to Monitor / Tasks & Events / Tasks, copy the output, and paste it into a text document. Do the same for the "Events" section. The goal is to rule out that the DLR Control VM is being backed up using snapshots, which could damage the VM OS and lead to unexpected behavior.