This article provides a structured approach to collecting the essential data, logs, and packet captures when virtual machines in an NSX-T environment experience datapath issues such as ping loss, intermittent connectivity, latency, or network disconnects.
Purpose:
To effectively diagnose and isolate network connectivity problems, particularly those affecting a subset of virtual machines.
These captures are essential for Broadcom to diagnose the datapath connectivity issue.
VMware NSX-T Data Center
VMware NSX
Stage 1: Initial Network Trace (NSX UI)
Perform a Traceflow from the NSX Manager UI. This initial step helps visualize the logical path and identify potential drops or misconfigurations within the NSX overlay.
Stage 2: Logical Connectivity Diagnostic Tests
Perform the following tests from the impacted virtual machines to help isolate the problem area.
Ping Tests (from Affected VM):
Gateway and Internet Reachability (from Affected VM):
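As a minimal sketch of how the ping and reachability tests can be recorded with timestamps, assuming a Linux guest whose ping prints the usual "% packet loss" summary line: loss_pct and ping_report are hypothetical helper names, and the count of 20 probes is an arbitrary choice, not a value from this article.

```shell
#!/bin/sh
# loss_pct: read ping output on stdin and print only the packet-loss
# percentage taken from the summary line (prints nothing if no summary).
loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# ping_report (hypothetical helper): ping each target given as an argument
# and print one line per target: "<UTC time> <target> loss=<pct>%".
ping_report() {
    for target in "$@"; do
        pct="$(ping -c 20 "$target" 2>/dev/null | loss_pct)"
        printf '%s %s loss=%s%%\n' \
            "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$target" "${pct:-unknown}"
    done
}
```

For example, `ping_report <peer_VM_IP> <gateway_IP> <external_IP>` prints one timestamped line per target, which makes it easier to line the results up with captures taken later.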
Traceroute & Host-Level Data (from Affected VM & ESXi Host):
#esxcli network ip neighbor list
Stage 3: In-Depth Host Diagnostics
Before migrating all VMs off the host, perform host-level diagnostics if the issue is suspected to be host-specific.
Prepare the Host:
VM Switch Port Mapping:
#net-stats -l
Live Kernel Dump:
#localcli --plugin-dir /usr/lib/vmware/esxcli/int debug livedump perform
Esxtop Data Collection:
#/usr/sbin/esxtop -b -d 2 -n 60 > /vmfs/volumes/<Volume_ID>/$(hostname)_$(date +"%Y_%m_%d_%I_%M_%p").csv
Replace <Volume_ID> with your actual datastore ID.
Stage 4: VM and Network Analysis
If most VMs have been migrated, continue analysis with a few non-critical VMs remaining on the host.
Console-Level Tests (from Affected VM's Console via vSphere UI):
#net-dvs -l | grep -E "port |port.block|volatile.vlan|volatile.status"
Identify Switchport and Uplink Info (from ESXi host SSH):
#vswitch instance list
esxcli (Alternate method):
#esxcli network vm list # Get World ID
#esxcli network vm port list -w <world_ID> # Use the World ID from above
Stage 5: Packet Capture Workflow
Packet captures are crucial for deep-dive network analysis. Ensure timestamp correlation for all captures.
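One way to make timestamp correlation easier is to keep a small timeline of when each capture and test started and stopped. The following is a minimal sketch under stated assumptions: mark_event and the CAPTURE_LOG variable are hypothetical names, and the log location is only a placeholder.

```shell
#!/bin/sh
# Hypothetical log location; point it at a path that survives the session.
CAPTURE_LOG="${CAPTURE_LOG:-./capture_timeline.log}"

# mark_event (hypothetical helper): append one line holding the current UTC
# time, the local hostname, and a free-text label to the timeline log.
mark_event() {
    printf '%s %s %s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(uname -n)" "$*" >> "$CAPTURE_LOG"
}
```

Calling `mark_event "guest tcpdump started"` just before each capture, and again when it is stopped, gives a UTC reference on every system involved that can be compared with the packet timestamps afterwards.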
Pre-Capture Setup:
Guest VM Packet Captures:
#tcpdump -i <eth_interface_name> -nn host <target_IP> -w /<filesystem>/Source_Guest_OS.pcap
#ping <target_VM_IP> & # Run in background
#tcpdump -i <eth_interface_name> -nn -w /<filesystem>/Guest_OS.pcap
ESXi Host Packet Captures:
#pktcap-uw --switchport <SwitchportID> --capture VnicTx,VnicRx -o /vmfs/volumes/<datastore>/<VM>_VnicTxRx.pcap
#pktcap-uw --switchport <SwitchportID> --ng -o /vmfs/volumes/<datastore>/<VM>-VswitchTxExit.pcap
#pktcap-uw --switchport <SwitchportID> --dir 1 --ng -o /vmfs/volumes/<datastore>/<VM>-VswitchRx.pcap
#pktcap-uw --switchport <SwitchportID> --srcip <src_IP> --capture VnicRx --ng -o /vmfs/volumes/<datastore>/<SRC_IP>_<DST_IP>-VnicRxEntry.pcap
#pktcap-uw --uplink <vmnic#> --capture PortOutput,PortInput -o /vmfs/volumes/<datastore>/<VM>_<vmnic#>_PortIO.pcap
#pktcap-uw --uplink <vmnic#> --capture UplinkRcvKernel,UplinkSndKernel -o /vmfs/volumes/<datastore>/<VM>_<vmnic#>_VmnicRxTx.pcap
#pktcap-uw --uplink <vmnic#> --dir 0 --stage 1 -o /vmfs/volumes/<datastore>/<VM>_<vmnic#>_EtherswitchDispatch.pcap
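Because several capture points usually need to run at the same time, it can help to generate the command lines first and review them before starting anything. The sketch below only prints commands and does not execute them. capture_plan is a hypothetical helper name, it covers three of the capture points listed above as an illustration, and the trailing `&` (to background each capture) is an assumption about how the captures will be run.

```shell
#!/bin/sh
# capture_plan (hypothetical helper): print, without running, pktcap-uw
# command lines for one VM switchport and one uplink.
# Arguments: <SwitchportID> <vmnic#> <VM> <datastore>
capture_plan() {
    port="$1"; nic="$2"; vm="$3"; ds="$4"
    base="/vmfs/volumes/$ds"
    # Flags and file names mirror the commands listed above.
    cat <<EOF
pktcap-uw --switchport $port --capture VnicTx,VnicRx -o $base/${vm}_VnicTxRx.pcap &
pktcap-uw --uplink $nic --capture PortOutput,PortInput -o $base/${vm}_${nic}_PortIO.pcap &
pktcap-uw --uplink $nic --capture UplinkRcvKernel,UplinkSndKernel -o $base/${vm}_${nic}_VmnicRxTx.pcap &
EOF
}
```

Running `capture_plan <SwitchportID> <vmnic#> <VM> <datastore>` prints three lines that can be checked and then pasted into the ESXi shell. Each backgrounded capture keeps running until it is stopped, so confirm that the datastore has enough free space first.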