General recommendations for troubleshooting VMware NSX Data Center for vSphere L2 Bridge connectivity issues.
VMware NSX Data Center for vSphere
1. Verify the L2 Bridge configuration. In this example, the bridge is created between the logical switch "L2_Bridge" and the distributed port group "Test_VLAN_10_L2_Bridge":
1.1 Besides the NSX-side configuration, the L2 Bridge requires a VLAN-backed port group on the vDS:
2. Collect the details of the VMs you are using to troubleshoot on the VXLAN and VLAN sides of the bridge:
VLAN VM
MAC: 00:0c:29:##:##:##
IP: #.#.#.222
VXLAN VM
MAC: 00:50:56:##:##:##
IP: #.#.#.200
3. Bridge instance information:
3.1 On the host where the DLR Control VM is running, get the list of DLRs:
[root@esxi3:~] net-vdr --instance -l
VDR Instance Information :
-------------------------
Vdr Name:              default+edge-2
Vdr Id:                0x00002710
Number of Lifs:        3
Number of Routes:      1
Number of Hold Pkts:   0
Number of Neighbors:   1
State:                 Enabled
Controller IP:         10.#.#.250
Control Plane IP:      10.#.#.130
Control Plane Active:  Yes
Num unique nexthops:   0
Generation Number:     0
Edge Active:           Yes
Pmac:                  00:00:00:00:00:00
3.2 From the previous output, take the name of the affected edge (default+X) and run the following command to get the designated instance IP (DI IP):
[root@esxi3:~] net-vdr --lif -l default+edge-2
VDR default+edge-2 LIF Information :
Name:                 27100000000c
Mode:                 Bridging, Sedimented, Internal
Id:                   Vlan:10
Ip(Mask):             0.0.0.0(0.0.0.0)
Connected Dvs:        Management
Designated Instance:  No
DI IP:                0.0.0.0
State:                Enabled
Flags:                0xd4
DHCP Relay:           Not enabled

Name:                 27100000000b
Mode:                 Bridging, Sedimented, Internal
Id:                   Vxlan:10012
Ip(Mask):             0.0.0.0(0.0.0.0)
Connected Dvs:        Management
VXLAN Control Plane:  Enabled
VXLAN Multicast IP:   0.0.0.1
State:                Enabled
Flags:                0x22d4
DHCP Relay:           Not enabled

Name:                 27100000000a
Mode:                 Routing, Distributed, Internal
Id:                   Vxlan:10000
Ip(Mask):             #.#.#.150(255.255.255.0)
Connected Dvs:        Management
VXLAN Control Plane:  Enabled
VXLAN Multicast IP:   0.0.0.1
State:                Enabled
Flags:                0x2288
DHCP Relay:           Not enabled
On the VLAN interface, you will see a "bridge designated instance IP" (DI IP). The designated instance is the one ESXi host in the cluster that is chosen to communicate with the physical network. If the "DI IP" is 0.0.0.0, the bridge designated instance is located on the ESXi host where the DLR control VM is running.
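The DI IP check described above can be scripted when the LIF listing is saved to a file. The following is a minimal Python sketch, not an official tool: the function names "parse_lifs" and "bridge_di" are hypothetical, and the sample text is a shortened copy of the output above written one field per line, which is assumed to be how the host prints it.

```python
# Sample modelled on the "net-vdr --lif -l default+edge-2" output,
# shortened and written one field per line (assumed layout).
SAMPLE = """\
Name: 27100000000c
Mode: Bridging, Sedimented, Internal
Id: Vlan:10
Designated Instance: No
DI IP: 0.0.0.0
State: Enabled

Name: 27100000000b
Mode: Bridging, Sedimented, Internal
Id: Vxlan:10012
State: Enabled
"""


def parse_lifs(text):
    """Split the listing into one dict per LIF; a 'Name:' line starts a new LIF."""
    lifs = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a field separator
        key, value = key.strip(), value.strip()
        if key == "Name":
            lifs.append({})
        if lifs:
            lifs[-1][key] = value
    return lifs


def bridge_di(lifs):
    """Return (lif_name, di_ip) for the VLAN-side bridging LIF, or None."""
    for lif in lifs:
        is_bridge = lif.get("Mode", "").startswith("Bridging")
        if is_bridge and lif.get("Id", "").startswith("Vlan:"):
            return lif["Name"], lif.get("DI IP")
    return None


name, di_ip = bridge_di(parse_lifs(SAMPLE))
if di_ip == "0.0.0.0":
    print(f"LIF {name}: DI is on the host running the DLR control VM")
else:
    print(f"LIF {name}: DI IP is {di_ip}")
```

With the sample above, the script reports that the designated instance is on the host running the DLR control VM, which matches the interpretation of a 0.0.0.0 DI IP given in this step.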
[root@esxi3:~] net-vdr -b --mac default+edge-2
VDR 'default+edge-2' bridge 'test' mac address tables :
total number of MAC addresses: 1
number of MAC addresses returned: 1
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:50:56:##:##:##    Dynamic             0     10012          67108874    0
total number of MAC addresses: 2
number of MAC addresses returned: 2
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:0c:29:##:##:##    Dynamic            10         0          67108868    0
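In the tables above, the VXLAN VM MAC is learned with VXLAN ID 10012 and the VLAN VM MAC with VLAN ID 10. The Python sketch below shows one way to verify this from saved output. The helper names "parse_mac_table" and "check_learning" are hypothetical, and the row layout (six whitespace-separated columns) is assumed from the example in this article.

```python
# Rows copied from the "net-vdr -b --mac" example, one table row per line.
SAMPLE = """\
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:50:56:##:##:##    Dynamic             0     10012          67108874    0
Destination Address  Address Type  VLAN ID  VXLAN ID  Destination Port  Age
-------------------  ------------  -------  --------  ----------------  ---
00:0c:29:##:##:##    Dynamic            10         0          67108868    0
"""


def parse_mac_table(text):
    """Return {mac: (vlan_id, vxlan_id)} for every data row."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six fields and start with a colon-separated MAC;
        # header and separator rows do not match and are skipped.
        if len(fields) == 6 and fields[0].count(":") == 5:
            entries[fields[0].lower()] = (int(fields[2]), int(fields[3]))
    return entries


def check_learning(entries, vlan_mac, vlan_id, vxlan_mac, vni):
    """List the VM MACs that are missing or learned on the wrong side."""
    problems = []
    if entries.get(vlan_mac.lower(), (None, None))[0] != vlan_id:
        problems.append(f"{vlan_mac} not learned on VLAN {vlan_id}")
    if entries.get(vxlan_mac.lower(), (None, None))[1] != vni:
        problems.append(f"{vxlan_mac} not learned on VXLAN {vni}")
    return problems


table = parse_mac_table(SAMPLE)
issues = check_learning(table, "00:0c:29:##:##:##", 10, "00:50:56:##:##:##", 10012)
print(issues or "Both VM MAC addresses are learned on the expected side")
```

A MAC address that is missing from the table, or that shows up on the wrong side, tells you which side of the bridge to investigate first.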
Check the bridge statistics, paying attention to the "pktsDroppedBridged", "droppedTx" and "droppedRx" counters:
[root@esxi3:~] net-vdr -b --stats default+edge-2
VDR 'default+edge-2' bridge 'test' stats :
Bridge stats:
    portNotExist: 0
Network 'vxlan-10012-type-bridging' stats:
    fdbHit: 2025
    fdbLearn: 1
    fdbUpdate: 0
    fdbTableFull: 0
    fdbChain: 0
    fdbAged: 0
    fdbMacMoved: 0
    fdbMacHit: 2025
    FRPFilterLeafTx: 0
    FRPFilterBridged: 0
    fdbUplinkFilter: 0
Network port ID '0x4000008' stats:
    pktsTx: 2093
    pktsTxMulticast: 0
    pktsTxBroadcast: 3
    pktsRx: 2092
    pktsRxMulticast: 0
    pktsRxBroadcast: 0
    droppedTx: 0
    droppedRx: 0
    mappedLenTooShort: 0
    pktsBridged: 2093
    pktsDroppedBridged: 0
    pktsDroppedUplink: 0
    droppedTxPortMismatch: 0
    droppedTxVxlanPktToVlan: 0
Network 'vlan-10-type-bridging' stats:
    fdbHit: 2090
    fdbLearn: 3
    fdbUpdate: 0
    fdbTableFull: 0
    fdbChain: 0
    fdbAged: 1
    fdbMacMoved: 0
    fdbMacHit: 2090
    FRPFilterLeafTx: 0
    FRPFilterBridged: 0
    fdbUplinkFilter: 0
Network port ID '0x4000008' stats:
    pktsTx: 2092
    pktsTxMulticast: 0
    pktsTxBroadcast: 0
    pktsRx: 2093
    pktsRxMulticast: 0
    pktsRxBroadcast: 3
    droppedTx: 0
    droppedRx: 0
    mappedLenTooShort: 0
    pktsBridged: 2092
    pktsDroppedBridged: 0
    pktsDroppedUplink: 0
    droppedTxPortMismatch: 0
    droppedTxVxlanPktToVlan: 0
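To avoid reading every counter by eye, the drop counters can be filtered out of saved "net-vdr -b --stats" output. The Python sketch below is illustrative only: the function name "nonzero_drops" is hypothetical, the sample is a shortened copy of the output above, and the parser assumes each counter is printed as "name: value".

```python
import re

# Shortened copy of the "net-vdr -b --stats" example output.
SAMPLE = """\
Network 'vxlan-10012-type-bridging' stats:
Network port ID '0x4000008' stats:
pktsTx: 2093 droppedTx: 0 droppedRx: 0 pktsBridged: 2093 pktsDroppedBridged: 0
Network 'vlan-10-type-bridging' stats:
Network port ID '0x4000008' stats:
pktsTx: 2092 droppedTx: 0 droppedRx: 0 pktsBridged: 2092 pktsDroppedBridged: 0
"""

WATCHED = ("pktsDroppedBridged", "droppedTx", "droppedRx")


def nonzero_drops(text, watched=WATCHED):
    """Return (network, counter, value) for every watched counter above zero."""
    network = None
    found = []
    for line in text.splitlines():
        # Lines such as "Network 'vlan-10-type-bridging' stats:" open a new section.
        header = re.match(r"\s*Network '([^']+)' stats", line)
        if header:
            network = header.group(1)
            continue
        # Works whether counters are printed one per line or several per line.
        for name, value in re.findall(r"(\w+):\s*(\d+)", line):
            if name in watched and int(value) > 0:
                found.append((network, name, int(value)))
    return found


drops = nonzero_drops(SAMPLE)
print(drops or "No drops recorded on either side of the bridge")
```

Running the command twice while reproducing the problem and comparing the two results shows whether a non-zero counter is still increasing.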
4. Traffic capture examples:
4.1 Identify the uplink in use by the vDS port group associated with the L2 Bridge on the host where the L2 Bridge designated instance is located. You can do this by going to vCenter / Networking / Click on the vDS in the inventory / Click on Configure / Click on Topology / Click on the vDS port group. You will see an orange line going from the port group to the Uplinks section. If you are using link aggregation, you need to capture on one uplink at a time to identify which one is being used to forward the VLAN-tagged traffic to the physical network.
4.2 Traffic capture example of the frames leaving the host towards the physical network:
[root@esxi3:~] pktcap-uw --uplink vmnic2 --vlan 10 --dir 1 -o - | tcpdump-uw -enr -
reading from file -, link-type EN10MB (Ethernet)
15:13:29.172512 00:50:56:##:##:## > 00:0c:29:##:##:##, ethertype IPv4 (0x0800), length 98: #.#.#.200 > #.#.#.222: ICMP echo reply, id 9364, seq 14, length 64
15:13:30.173055 00:50:56:##:##:## > 00:0c:29:##:##:##, ethertype IPv4 (0x0800), length 98: #.#.#.200 > #.#.#.222: ICMP echo reply, id 9364, seq 15, length 64
4.3 You can narrow the capture with pktcap-uw filters such as "--ip #.#.#.#". With "--dir 0" you capture in the inbound direction; "--dir 1", as used in 4.2, captures in the outbound direction. To see the full list of pktcap-uw options, run "pktcap-uw --help | less".
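When captures have to be repeated on several uplinks or in both directions, it can help to generate the command lines rather than retype them. The Python sketch below only builds the command string and does not run anything. The function name "pktcap_pipeline" is hypothetical, the IP address is a documentation placeholder, and only the options already shown in this article ("--uplink", "--vlan", "--dir", "--ip", "-o") are used.

```python
import shlex


def pktcap_pipeline(uplink, vlan=None, direction=1, ip=None):
    """Build a pktcap-uw | tcpdump-uw pipeline as a string; nothing is executed."""
    if direction not in (0, 1):
        raise ValueError("direction must be 0 (inbound) or 1 (outbound)")
    capture = ["pktcap-uw", "--uplink", uplink]
    if vlan is not None:
        capture += ["--vlan", str(vlan)]
    capture += ["--dir", str(direction)]
    if ip is not None:
        capture += ["--ip", ip]
    capture += ["-o", "-"]  # write the capture to stdout
    reader = ["tcpdump-uw", "-enr", "-"]  # decode the capture from stdin
    return f"{shlex.join(capture)} | {shlex.join(reader)}"


# Inbound frames on vmnic2, VLAN 10, filtered on a placeholder IP address.
print(pktcap_pipeline("vmnic2", vlan=10, direction=0, ip="192.0.2.222"))
```

The printed line can be pasted into the ESXi shell of the host where the designated instance is located, replacing the placeholder address with the IP of the VM under test.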
5. Make sure the ESXi host where the DI is running has connectivity to the controller on port 1234:
[root@esxi3:~] esxcli network ip connection list | grep 1234
tcp 0 0 #.#.#.130:38370 #.#.#.250:1234 ESTABLISHED 35109 newreno netcpa-worker
* The controller is in charge of building the Bridge MAC table and distributing that information to the ESXi hosts.
* In this lab only one controller was deployed, but in production environments the recommendation is to have three NSX Controllers.
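The controller connectivity check in step 5 can also be run against saved "esxcli network ip connection list" output. The Python sketch below is an illustration under assumptions: the function name "controller_sessions" is hypothetical, the column order is taken from the example line above, and the second sample line (including its process name) is made up to show why matching on the foreign port is stricter than "grep 1234".

```python
# First line copied from the article; second line is a made-up connection
# whose foreign port merely contains "1234".
SAMPLE = """\
tcp 0 0 #.#.#.130:38370 #.#.#.250:1234 ESTABLISHED 35109 newreno netcpa-worker
tcp 0 0 #.#.#.130:443 #.#.#.10:51234 ESTABLISHED 34000 newreno example-proc
"""


def controller_sessions(text, port=1234):
    """Return (foreign_address, state) for TCP connections to the given port."""
    sessions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # header, blank line, or non-TCP entry
        foreign, state = fields[4], fields[5]
        # Compare the port exactly so that 51234 does not count as 1234.
        if foreign.rsplit(":", 1)[-1] == str(port):
            sessions.append((foreign, state))
    return sessions


sessions = controller_sessions(SAMPLE)
established = [s for s in sessions if s[1] == "ESTABLISHED"]
print(established or "No ESTABLISHED session to the controller on port 1234")
```

An empty result on the host where the DI is running suggests checking the control plane connection before looking at the data path.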
9.1 Run the following command on the host where the control VM of the DLR is running:
[root@esxi:~] esxtop -b -d 2 -n 2 -a > esxtop.csv
This command generates a file called "esxtop.csv" in the directory where it is executed. The goal is to rule out a performance issue on the ESXi host where the DLR control VM is running.
9.2 Collect the logs of the ESXi host where the control VM of the affected DLR is running, using the "vm-support" command from the CLI.
9.3 Find the DLR Control VM in the vCenter inventory, click on it, and go to Monitor / Tasks & Events / Tasks. Copy the output and paste it into a text document. Do the same for the "Events" section. The goal is to rule out that the DLR control VM is being backed up using snapshots, which could damage the VM OS and lead to unexpected behavior.