Troubleshooting VMware NSX-T using Packet Captures

Article ID: 345925

Products

VMware NSX

Issue/Introduction

This article provides instructions for troubleshooting VMware NSX using packet captures. Use this information to identify where traffic is dropped on virtual components, for both East-West and North-South traffic.

Environment

VMware NSX-T Data Center 3.x
VMware NSX 4.x

Resolution

For East-West Traffic

Perform the following steps on both the source and destination VMware ESXi hosts.

  1. Identify the switchport and uplink that carry the VM's traffic.
    1. Log in to the ESXi host where the affected VM is running.
    2. Run the following commands to get the switchport information:
      esxcli network vm list
      (Copy the World ID of the VM and use it in place of <world ID> below.)
      esxcli network vm port list -w <world ID>
      In the output, "Port ID:" and "Team Uplink:" give the switchport and uplink respectively.
  2. After the switchport and vmnic are identified, use the commands below to capture traffic:

To see live traffic

pktcap-uw --switchport <switchport-id> --capture VnicTx,VnicRx -o- | tcpdump-uw -r - -nne
pktcap-uw --uplink <vmnic_number> --capture UplinkSndKernel,UplinkRcvKernel   -o- | tcpdump-uw -r - -nne 
 
Press Ctrl+C to stop the capture.
 
Note:
  • UplinkRcvKernel -- The function that receives packets from uplink dev at kernel side
  • UplinkSndKernel -- Function to Tx packets on uplink at kernel side
  • VnicTx -- Function in vnic backend to Tx packets from guest
  • VnicRx -- Function in vnic backend to Rx packets to guest
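As a rough sketch of how the lookup and the live capture fit together, the helper below reads `esxcli network vm port list` output on standard input and prints the two live-capture commands for that VM. The function name `print_live_capture_cmds` is made up for this illustration, and it only prints the commands; it does not run them.

```shell
# Hypothetical helper: read "esxcli network vm port list -w <world ID>" output
# on stdin and print the matching live-capture commands (nothing is executed).
print_live_capture_cmds() {
  awk -F': *' '
    $1 ~ /^ *Port ID$/     { port = $2 }     # not "DVPort ID" or "Uplink Port ID"
    $1 ~ /^ *Team Uplink$/ { uplink = $2 }
    END {
      if (port == "" || uplink == "") {
        print "Port ID or Team Uplink not found" > "/dev/stderr"
        exit 1
      }
      printf "pktcap-uw --switchport %s --capture VnicTx,VnicRx -o- | tcpdump-uw -r - -nne\n", port
      printf "pktcap-uw --uplink %s --capture UplinkSndKernel,UplinkRcvKernel -o- | tcpdump-uw -r - -nne\n", uplink
    }'
}

# Example (on the ESXi host):
#   esxcli network vm port list -w 287042 | print_live_capture_cmds
```

Printing rather than executing keeps the sketch safe to try: the commands can be reviewed and then pasted into the ESXi shell.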
     

To capture and write to a pcap file

pktcap-uw --switchport <switchport-id> --dir 2 -o /vmfs/volumes/<datastore_name>/switchport_capture.pcap
pktcap-uw --uplink <vmnic_no> --capture UplinkSndKernel -o /vmfs/volumes/<datastore_name>/UplinkSndKernel-vmnic-esxi.pcap & pktcap-uw --uplink <vmnic_no> --capture UplinkRcvKernel -o /vmfs/volumes/<datastore_name>/UplinkRcvKernel-vmnic-esxi.pcap


To Kill the Capture Process

kill $(lsof |grep pktcap-uw |awk '{print $1}'| sort -u)
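The one-liner above passes whatever IDs it finds straight to `kill`, which prints an error when no capture is running. A slightly more defensive variant might look like the sketch below. It assumes, as the original command implies, that the first column of `lsof` output is the ID to kill; the function names `pktcap_ids` and `stop_pktcap` are invented for this example.

```shell
# Hypothetical helper: read lsof output on stdin and print the unique IDs
# (first column) of lines that mention pktcap-uw, mirroring the one-liner.
pktcap_ids() {
  grep pktcap-uw | awk '{print $1}' | sort -u
}

# Hypothetical wrapper: only call kill when at least one ID was found.
stop_pktcap() {
  ids=$(lsof | pktcap_ids)
  if [ -n "$ids" ]; then
    kill $ids    # intentionally unquoted so each ID becomes its own argument
  else
    echo "no pktcap-uw capture is running"
  fi
}
```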

Note: Change the vmnic and IP address as applicable to your environment and VM.

Sample:

Source IP: 10.0.##.##
Destination IP: 10.1.##.##
Source ESXi: <Src Esxi Name>
Destination ESXi: <Dest Esxi Name>

[root:~] esxcli network vm list | grep -i VM-1
  287042  VM-1          1

[root:~] esxcli network vm list
World ID  Name  Num Ports  Networks
--------  ----  ---------  --------
  287042  VM-1          1

[root@<Src Esxi Name>:~] esxcli network vm port list -w 287042
   Port ID: <port ID>
   vSwitch: <vSwitch Name>
   Portgroup:
   DVPort ID: 5ea2####-####-####-####-########5383
   MAC Address: MAC_ADDR_1
   IP Address: 0.0.0.0
   Team Uplink: vmnic0
   Uplink Port ID: <uplink port ID>
   Active Filters: vmware-sfw

[root@<Src Esxi Name>:~] pktcap-uw --switchport <port ID> --capture VnicTx,VnicRx -o- | tcpdump-uw -r - -nne
The switch port id is 0x04####10.
The session capture point is VnicTx,VnicRx.
pktcap: The output file is -.
pktcap: No server port specifed, select 27887 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 27887.
pktcap: Main thread: 287048629056.
pktcap: Dump Thread: 287049164544.
pktcap: Recv Thread: 287049692928.
pktcap: Accept...
pktcap: Vsock connection from port 1028 cid 2.
reading from file -, link-type EN10MB (Ethernet)
05:28:17.947352 [MAC_ADDR_1] > [MAC_ADDR_2], ethertype IPv4 (0x0800), length 98: 10.0.##.## > 10.1.##.##: ICMP echo request, id 32771, seq 1026, length 64
05:28:17.947733 [MAC_ADDR_2] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 10.1.##.## > 10.0.##.##: ICMP echo reply, id 32771, seq 1026, length 64
05:28:18.947514 [MAC_ADDR_1] > [MAC_ADDR_2], ethertype IPv4 (0x0800), length 98: 10.0.##.## > 10.1.##.##: ICMP echo request, id 32771, seq 1027, length 64
05:28:18.948003 [MAC_ADDR_2] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 10.1.##.## > 10.0.##.##: ICMP echo reply, id 32771, seq 1027, length 64


[root@<Src Esxi Name>:~] pktcap-uw --uplink vmnic0 --capture UplinkSndKernel,UplinkRcvKernel -o- | tcpdump-uw -r - -nne | grep -i 10.1.##.##
The name of the uplink is vmnic0.
The session capture point is UplinkSndKernel,UplinkRcvKernel.
pktcap: The output file is -.
pktcap: No server port specifed, select 27896 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 27896.
pktcap: Main thread: 419002739520.
pktcap: Dump Thread: 419003275008.
reading from file -, link-type EN10MB (Ethernet)
pktcap: Recv Thread: 419003803392.
pktcap: Accept...
pktcap: Vsock connection from port 1029 cid 2.
05:28:17.947409 [SRC_TEP_MAC_ADDR] > [DST_TEP_MAC_ADDR], ethertype 802.1Q (0x8100), length 160: vlan 141, p 0, ethertype IPv4, <Src_TEP_IP>.61837 > <Dest_TEP_IP>.6081: Geneve, Flags [C], vni 0x11800, proto TEB (0x6558), options [8 bytes]: [MAC_ADDR_1] > [MAC_ADDR_2], ethertype IPv4 (0x0800), length 98: 10.0.##.## > 10.1.##.##: ICMP echo request, id 32771, seq 1026, length 64
05:28:17.947716 [SRC_TEP_MAC_ADDR] > [DST_TEP_MAC_ADDR], ethertype 802.1Q (0x8100), length 160: vlan 141, p 0, ethertype IPv4, <Dest_TEP_IP>.50998 > <Src_TEP_IP>.6081: Geneve, Flags [C], vni 0x10802, proto TEB (0x6558), options [8 bytes]: [MAC_ADDR_2] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 10.1.##.## > 10.0.##.##: ICMP echo reply, id 32771, seq 1026, length 64
05:28:18.947600 [SRC_TEP_MAC_ADDR] > [DST_TEP_MAC_ADDR], ethertype 802.1Q (0x8100), length 160: vlan 141, p 0, ethertype IPv4, <Src_TEP_IP>.61837 > <Dest_TEP_IP>.6081: Geneve, Flags [C], vni 0x11800, proto TEB (0x6558), options [8 bytes]: [MAC_ADDR_1] > [MAC_ADDR_2], ethertype IPv4 (0x0800), length 98: 10.0.##.## > 10.1.##.##: ICMP echo request, id 32771, seq 1027, length 64
05:28:18.947968 [SRC_TEP_MAC_ADDR] > [DST_TEP_MAC_ADDR], ethertype 802.1Q (0x8100), length 160: vlan 141, p 0, ethertype IPv4, <Dest_TEP_IP>.50998 > <Src_TEP_IP>.6081: Geneve, Flags [C], vni 0x10802, proto TEB (0x6558), options [8 bytes]: [MAC_ADDR_2] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 10.1.##.## > 10.0.##.##: ICMP echo reply, id 32771, seq 1027, length 64

[root@<Dest Esxi name>:~] esxcli network vm list | grep -i VM-2
  286924  VM-2          1

[root@<Dest Esxi name>:~] esxcli network vm port list -w 286924
   Port ID: <port ID>
   vSwitch: <vSwitch_Name>
   Portgroup:
   DVPort ID: <port ID>
   MAC Address: MAC_ADDR_2
   IP Address: 0.0.0.0
   Team Uplink: vmnic0
   Uplink Port ID: 2214592517
   Active Filters: vmware-sfw


[root@<Dest Esxi name>:~] pktcap-uw --uplink vmnic0 --capture UplinkSndKernel,UplinkRcvKernel -o- | tcpdump-uw -r - -nne | grep -i 10.1.##.##
The name of the uplink is vmnic0.
The session capture point is UplinkSndKernel,UplinkRcvKernel.
pktcap: The output file is -.
pktcap: No server port specifed, select 28035 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 28035.
pktcap: Main thread: 603350494016.
pktcap: Dump Thread: 603351029504.
reading from file -, link-type EN10MB (Ethernet)
pktcap: Recv Thread: 603351557888.
pktcap: Accept...
pktcap: Vsock connection from port <port ID> cid 2.
05:28:17.948445 [SRC_TEP_MAC_ADDR] > [DST_TEP_MAC_ADDR], ethertype 802.1Q (0x8100), length 160: vlan 141, p 0, ethertype IPv4, <Src_TEP_IP>.61837 > <Dest_TEP_IP>.6081: Geneve, Flags [C], vni 0x11800, proto TEB (0x6558), options [8 bytes]: [MAC_ADDR_1] > [MAC_ADDR_2], ethertype IPv4 (0x0800), length 98: 10.0.##.## > 10.1.##.##: ICMP echo request, id 32771, seq 1026, length 64
05:28:17.948565 [DST_TEP_MAC_ADDR] > [SRC_TEP_MAC_ADDR], ethertype 802.1Q (0x8100), length 160: vlan 141, p 0, ethertype IPv4,  <Dest_TEP_IP>.50998 > <Src_TEP_IP>.6081: Geneve, Flags [C], vni 0x10802, proto TEB (0x6558), options [8 bytes]: [MAC_ADDR_2] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 10.1.##.## > 10.0.##.##: ICMP echo reply, id 32771, seq 1026, length 64
05:28:18.948662 [SRC_TEP_MAC_ADDR] > [DST_TEP_MAC_ADDR], ethertype 802.1Q (0x8100), length 160: vlan 141, p 0, ethertype IPv4, <Src_TEP_IP>.61837 >  <Dest_TEP_IP>.6081: Geneve, Flags [C], vni 0x11800, proto TEB (0x6558), options [8 bytes]: [MAC_ADDR_1] > [MAC_ADDR_2], ethertype IPv4 (0x0800), length 98: 10.0.##.## > 10.1.##.##: ICMP echo request, id 32771, seq 1027, length 64
05:28:18.948803 [DST_TEP_MAC_ADDR] > [SRC_TEP_MAC_ADDR], ethertype 802.1Q (0x8100), length 160: vlan 141, p 0, ethertype IPv4,  <Dest_TEP_IP>.50998 > <Src_TEP_IP>.6081: Geneve, Flags [C], vni 0x10802, proto TEB (0x6558), options [8 bytes]: [MAC_ADDR_2] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 10.1.##.## > 10.0.##.##: ICMP echo reply, id 32771, seq 1027, length 64
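The uplink captures above show each frame Geneve-encapsulated, with the VNI printed in hexadecimal (for example `vni 0x11800`). If the tool you compare against lists segment VNIs in decimal, a small conversion helps. The sketch below assumes the tcpdump-uw text output was saved to a file; the function name `geneve_vnis` is made up for this illustration.

```shell
# Hypothetical helper: print each distinct Geneve VNI found in saved
# tcpdump-uw text output as "<hex> <decimal>".
geneve_vnis() {
  sed -n 's/.*Geneve, .*vni \(0x[0-9a-fA-F]*\),.*/\1/p' "$1" | sort -u |
  while read -r vni; do
    # Shell arithmetic understands the 0x prefix, so this converts hex to decimal.
    printf '%s %d\n' "$vni" "$(($vni))"
  done
}

# Example:
#   geneve_vnis uplink_capture.txt
```

For the two VNIs in the sample, this prints `0x10802 67586` and `0x11800 71680`.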

[root@<Dest Esxi name>:~] pktcap-uw --switchport <Port ID> --capture VnicTx,VnicRx -o- | tcpdump-uw -r - -nne
The switch port id is 0x04####10.
The session capture point is VnicTx,VnicRx.
pktcap: The output file is -.
pktcap: No server port specifed, select 28025 as the port.
pktcap: Local CID 2.
pktcap: Listen on port 28025.
pktcap: Main thread: 748447206208.
pktcap: Recv Thread: 748448270080.
pktcap: Accept...
pktcap: Vsock connection from port 1030 cid 2.
pktcap: Dump Thread: 748447741696.
reading from file -, link-type EN10MB (Ethernet)
05:28:17.948464 [MAC_ADDR_1] > [MAC_ADDR_2], ethertype IPv4 (0x0800), length 98: 10.0.##.## > 10.1.##.##: ICMP echo request, id 32771, seq 1026, length 64
05:28:17.948531 [MAC_ADDR_2] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 10.1.##.## > 10.0.##.##: ICMP echo reply, id 32771, seq 1026, length 64

05:28:18.948685 [MAC_ADDR_1] > [MAC_ADDR_2], ethertype IPv4 (0x0800), length 98: 10.0.##.## > 10.1.##.##: ICMP echo request, id 32771, seq 1027, length 64
05:28:18.948769 [MAC_ADDR_2] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 10.1.##.## > 10.0.##.##: ICMP echo reply, id 32771, seq 1027, length 64
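Comparing the capture points in the sample above is what localizes a drop: an echo request seen at one point but absent at the next was lost between the two. A minimal sketch of that comparison follows, assuming the tcpdump-uw text output from each capture point was saved to a file. The function names `icmp_request_seqs` and `missing_seqs` are invented for this example.

```shell
# Hypothetical helper: print the ICMP echo-request sequence numbers found in
# saved tcpdump-uw text output, one per line, sorted for use with comm.
icmp_request_seqs() {
  sed -n 's/.*ICMP echo request, id [0-9]*, seq \([0-9]*\),.*/\1/p' "$1" | sort -u
}

# Hypothetical helper: print sequence numbers present in the first capture
# but missing from the second.
missing_seqs() {
  first=$(mktemp)
  second=$(mktemp)
  icmp_request_seqs "$1" > "$first"
  icmp_request_seqs "$2" > "$second"
  comm -23 "$first" "$second"    # lines unique to the first file
  rm -f "$first" "$second"
}

# Example:
#   missing_seqs src_switchport.txt dst_switchport.txt
```

An empty result means every request seen at the first capture point also reached the second; running it pairwise along the path (source switchport, source uplink, destination uplink, destination switchport) narrows down the hop where packets disappear.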


For North-South Traffic

  • To capture traffic on the ESXi side, follow the same steps described in the "For East-West Traffic" section.
  • To capture on the Edge uplink:
    1. Log in to the active Edge node.
    2. Enter the VRF of the SERVICE_ROUTER_TIER0 and find its uplink interface using the commands in the steps below.
    3. get logical-router    <<<<< lists the logical routers
    4. vrf <vrf_number>     <<<<< enters the respective VRF
    5. get interface      <<<<< lists all interfaces; look for "Port-type: uplink", "Interface", "IP" and "MAC" to get the Edge uplink interface details
    6. exit
Sample:

<Edge_Name>> get logical-router

Fri Mar 17 2023 UTC 06:21:39.455

Logical Router

UUID                                   VRF  LR-ID  Name              Type                      Ports  Neighbors
736a####-####-####-####-########86666  0    0                        TUNNEL                    4      6/5000
7d50####-####-####-####-########990e   4    16     DR-T1-Gateway-01  DISTRIBUTED_ROUTER_TIER1  8      2/50000
b9ca####-####-####-####-########1cfb   5    1      DR-T0-Gateway-01  DISTRIBUTED_ROUTER_TIER0  5      2/50000
c4dc####-####-####-####-########3dee   6    17     SR-T0-Gateway-01  SERVICE_ROUTER_TIER0      6      2/50000 <<<
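On an Edge with many logical routers, picking the right row by eye is error-prone. As an illustration, the sketch below pulls the VRF number of every SERVICE_ROUTER_TIER0 row out of `get logical-router` output. It assumes the output was saved to a text file (for example, copied from the SSH session); the function name `tier0_sr_vrf` is made up.

```shell
# Hypothetical helper: print the VRF number (second column) of each row whose
# type column is SERVICE_ROUTER_TIER0 in saved "get logical-router" output.
tier0_sr_vrf() {
  awk '{
    # Scan the fields instead of using a fixed column, since the Name column
    # can be empty (as in the TUNNEL row).
    for (i = 3; i <= NF; i++)
      if ($i == "SERVICE_ROUTER_TIER0") { print $2; next }
  }' "$1"
}

# Example:
#   tier0_sr_vrf logical_router.txt
```

The printed number is the value to pass to the `vrf <vrf_number>` command.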


edge01> vrf 6

edge01(tier0_sr[6])>

edge01(tier0_sr[6])> get interface

Fri Mar 17 2023 UTC 06:24:10.473

Logical Router

UUID                  VRF  LR-ID Name               Type

c4dc####-####-####-####-########3dee  6   17   SR-T0-Gateway-01         SERVICE_ROUTER_TIER0

Interfaces (IPv6 DAD Status A-DAD_Success, F-DAD_Duplicate, T-DAD_Tentative, U-DAD_Unavailable)

  Interface   : f659####-####-####-####-########2a40 <<<<<<

  Ifuid     : 325

  Name     : EdgeUplinkB-TN1

  Fwd-mode   : IPV4_AND_IPV6

  Internal name : uplink-325

  Mode     : lif

  Port-type   : uplink <<<<<<<<

  IP/Mask    : 192.##.##.1/24 <<<<<<<<

  MAC      : ##:##:##:##:##:## <<<<<<<

  VLAN     : 133

  Access-VLAN  : untagged

  LS port    : e87a####-####-####-####-########9cd6

  Urpf-mode   : STRICT_MODE

  DAD-mode   : LOOSE

  RA-mode    : SLAAC_DNS_TRHOUGH_RA(M=0, O=0)

  Admin     : up

  Op_state   : up

  Enable-mcast : False

  MTU      : 8800

  arp_proxy   :

edge01(tier0_sr[6])> exit
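The `get interface` output can list many interfaces, of which only those with "Port-type : uplink" matter here. The sketch below is one way to filter them, assuming the output was saved to a text file; the function name `uplink_interfaces` is invented for this example.

```shell
# Hypothetical helper: print "<interface UUID> <name>" for every interface
# whose Port-type is uplink in saved "get interface" output.
uplink_interfaces() {
  awk '
    $1 == "Interface" && $2 == ":" { id = $3; name = "" }   # start of a new interface block
    $1 == "Name"      && $2 == ":" { name = $3 }
    $1 == "Port-type" && $2 == ":" && $3 == "uplink" { print id, name }
  ' "$1"
}

# Example:
#   uplink_interfaces get_interface.txt
```

The UUID in the first column is the value used as `<port-uuid-name>` in the capture commands.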

To see Live Traffic

  • start capture interface <port-uuid-name> direction dual expression <expression>
  • Press Ctrl+C to stop the capture.

To combine two conditions in the expression:

    • start capture interface <port-uuid-name> direction dual expression (condition 1) or (condition 2)
    • Example: start capture interface fp-eth0 direction dual expression (host 172.16.120.11 and 8.8.8.8) or (host 172.16.120.12 and 8.8.8.88)  
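When several host pairs need to be watched, typing the parenthesized expression by hand invites mistakes. As a small illustration, the helper below assembles the command text in the same form as the example above so it can be pasted into the Edge CLI. The function name `capture_cmd` is made up, and the helper only prints text.

```shell
# Hypothetical helper: build a "start capture" command line from an interface
# name followed by host pairs (two addresses per pair).
capture_cmd() {
  iface=$1
  shift
  filter=""
  while [ "$#" -ge 2 ]; do
    if [ -n "$filter" ]; then
      filter="$filter or "
    fi
    filter="$filter(host $1 and $2)"
    shift 2
  done
  printf 'start capture interface %s direction dual expression %s\n' "$iface" "$filter"
}

# Example:
#   capture_cmd fp-eth0 172.16.120.11 8.8.8.8 172.16.120.12 8.8.8.8
```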

 

 
Sample: pinging from 10.0.0.10 to 8.8.8.8

<Edge_Name>> start capture interface f659####-####-####-####-########2a40 direction dual expression host 10.0.0.10

06:30:37.592896 [MAC_ADDR_1] > [DEST_MAC], ethertype IPv4 (0x0800), length 98: 10.0.0.10 > 8.8.8.8: ICMP echo request, id 32779, seq 1030, length 64

<base64>AFBWAQb4AFBWnZzFCABFwAA0AAAAAP8RLqjAqIUBwKiF/tv0DsgAIK3iIMADGCF4DxuO43LQAA9CQAAPQkAAAAAA</base64>

06:30:38.433363 [DEST_MAC] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 8.8.8.8 > 10.0.0.10: ICMP echo reply, id 32779, seq 1030, length 64

<base64>AFBWAQb4AFBWnZzFCABFwAA0AAAAAP8RLqjAqIUBwKiF/tv0DsgAIK3iIMADGCF4DxuO43LQAA9CQAAPQkAAAAAA</base64>

06:30:38.743828 [MAC_ADDR_1] > [DEST_MAC], ethertype IPv4 (0x0800), length 98: 10.0.0.10 > 8.8.8.8: ICMP echo request, id 32779, seq 1031, length 64

<base64>AFBWAQb4AFBWnZzFCABFwAA0AAAAAP8RLqjAqIUBwKiF/tv0DsgAIK3iIMADGCF4DxuO43LQAA9CQAAPQkAAAAAA</base64>

<base64>AFBWnZzFAFBWAQb4CABFwAA0JBFAAP8RypbAqIX+wKiFAcAcDsgAIAZqIMADGI7jctAheA8bAA9CQAAPQkAAAMNQ</base64>

06:30:39.382513 [DEST_MAC] > [MAC_ADDR_1], ethertype IPv4 (0x0800), length 98: 8.8.8.8 > 10.0.0.10: ICMP echo reply, id 32779, seq 1031, length 64

<base64>AFBWAQb4AFBWnZzFCABFwAA0AAAAAP8RLqjAqIUBwKiF/tv0DsgAIK3iIMADGCF4DxuO43LQAA9CQAAPQkAAAAAA</base64>

^C

5 packets captured

5 packets received by filter

0 packets dropped by kernel

To Capture and Write to a pcap File

  • Define capture session

set capture session <session-number> interface <port-uuid-name> direction dual

  • View capture session

get capture session

  • Start capture session

set capture session <session-number> file <filename> expression <expression>

  • View capture files

get files

  • Copy capture files

copy file <filename> url scp://username@<Target_location_IP>/filepath/filename

Alternatively, you can use any SCP client (for example, WinSCP) to connect to the NSX-T Edge node and retrieve the files from the directory where they are stored.

Note: Generated pcap files are stored in the "/image/vmware/nsx/file-store/" directory on the NSX-T Edge node.


Sample

<Edge_Name>> set capture session 0 interface f659####-####-####-####-########2a40 direction dual

<Edge_Name>> get capture session

Fri Mar 17 2023 UTC 06:40:03.244

Packet Capture Session

ID          : 0

PORTS        : ['<ports>']

Packet Capture Session

ID          : 1

PORTS        : []

Packet Capture Session

ID          : 2

PORTS        : []

Packet Capture Session

ID          : 3

PORTS        : []

Packet Capture Session

ID          : 4

PORTS        : []

Packet Capture Session

ID          : 5

PORTS        : []

<Edge_Name>> set capture session 0 file Test_Capture.pcap expression host <IP.To.be.Filtered>

Capture to file initiated, enter Ctrl-C to terminate

^C11 packets captured

12 packets received by filter

0 packets dropped by kernel

<Edge_Name>> get files

Fri Mar 17 2023 UTC 06:43:04.413

Directory of filestore:/

    -rw-    5398   Nov 03 2022 23:52:31 UTC nsx_backup_cleaner.py

    -rw-    9967   Nov 03 2022 23:52:31 UTC backup_restore_helper.py

    -rw-     972   Mar 17 2023 06:42:42 UTC Test_Capture.pcap <<<<<<<<<<

    -rw-    5748   Nov 03 2022 23:52:31 UTC get_backup_timestamps.sh

<Edge_Name>> copy file Test_Capture.pcap url scp://root@<Target_location_IP>/tmp/

Are you sure you want to continue connecting (yes/no)? yes

Password:

Test_Capture.pcap               100% 972   1.2MB/s  00:00
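If several captures have accumulated in the file store, the copy step can be prepared in one pass. The sketch below reads `get files` output and prints one `copy file` command per `.pcap` entry, in the same form as the sample above. It assumes the output was saved to a text file; the function name `pcap_copy_cmds` and the target in the example are placeholders invented for this illustration.

```shell
# Hypothetical helper: print a "copy file" command for every .pcap entry in
# saved "get files" output. $1 is the scp target (user@host/path/), $2 the file.
pcap_copy_cmds() {
  awk -v target="$1" '{
    # The file name is normally the last field, but scanning all fields also
    # tolerates trailing annotations on the line.
    for (i = 1; i <= NF; i++)
      if ($i ~ /\.pcap$/) printf "copy file %s url scp://%s\n", $i, target
  }' "$2"
}

# Example:
#   pcap_copy_cmds 'root@<Target_location_IP>/tmp/' get_files.txt
```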