Troubleshooting NSX Network Connectivity Issues

Article ID: 381235

Updated On: 03-04-2025

Products

VMware NSX

Issue/Introduction

This article examines areas to check when troubleshooting network connectivity issues.

Environment

VMware NSX 

Cause

Network connectivity issues can occur for many different reasons. This article covers some of the areas to review when troubleshooting them.

Resolution

  • Determine the scope of the issue.
    • Is this affecting all VMs in the environment?
    • Is all traffic being affected, or just a specific type of traffic?
    • Does this affect communication between different networks only or does it also happen between machines on the same network? 
    • Does the issue happen with East-West (E-W) traffic or does it affect North-South (N-S) traffic?
  • Run a quick packet capture on the source VM and confirm whether it is actually sending the traffic. If it is not:
    • Confirm that the VM's network adapter is in the "Connected" state.
    • Check that the VM's ports are not in a blocked state on the DVS:
      • net-dvs -l | less
    • Confirm that the traffic is not being dropped by the Distributed Firewall (DFW):
      • As a test, add the VM to the DFW exclusion list.
      • Run a TraceFlow from the NSX UI -> Plan & Troubleshoot. It shows whether the DFW is dropping the traffic and which rules are involved.
    • Check that the traffic is not being dropped by Network Introspection.
    • Check the health of the ESXi host running the VM from the host preparation tab.
    • Issues with Management Plane (MP) or Control Plane (CP) connectivity between the host and NSX Manager can cause NSX port attachment issues.
    • Check the output statistics of a specific vNIC:
      • Run net-stats -l and note the SwitchName (DvsPortset-x) and PortNum of the VM's vNIC.
      • vsish -e cat /net/portsets/DvsPortset-x/ports/xxx/outputStats
    • To check the vNIC receive statistics of all VMs on a host, run:
      • net-stats -l | grep -ivE "vmnic|vmk|PortNum" | awk '{print $6","$1","$4}' | while IFS=, read -r vmname portID portset; do echo "$vmname"; vsish -e cat "/net/portsets/$portset/ports/$portID/vmxnet3/rxSummary"; done | less
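The port number and DvsPortset name needed for the vsish checks above both come from net-stats -l. The following is a minimal POSIX shell sketch that filters that output for a single VM and prints the matching vsish commands. It assumes the column order PortNum, Type, SubType, SwitchName, MACAddress, ClientName (the same columns the one-liner reads as $1, $4 and $6) and a ClientName without spaces. The function name vnic_vsish_paths is hypothetical, and the outputStats path is the one quoted in this article.

```shell
#!/bin/sh
# Hypothetical helper: find a VM's vNIC ports in `net-stats -l` output and
# print the vsish commands for them.
# Assumed column order: PortNum Type SubType SwitchName MACAddress ClientName
# Usage: net-stats -l | vnic_vsish_paths <vm-name-substring>
vnic_vsish_paths() {
    vm="$1"
    awk -v vm="$vm" '
        # Skip the header line and physical uplink / VMkernel ports.
        $1 == "PortNum" || $6 ~ /^vmnic/ || $6 ~ /^vmk/ { next }
        # One block per vNIC whose ClientName contains the requested VM name.
        index($6, vm) > 0 {
            base = "/net/portsets/" $4 "/ports/" $1
            print $6
            print "  vsish -e cat " base "/outputStats"
            # rxSummary is only present for vmxnet3 adapters.
            print "  vsish -e cat " base "/vmxnet3/rxSummary"
        }
    '
}
```

For example, net-stats -l | vnic_vsish_paths web01 prints one block per vNIC of a VM named web01 (a hypothetical name); the printed commands can then be run on the same host.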
            
  • If the issue is with E-W traffic:
    • Check TEP-to-TEP connectivity and MTU: vmkping ++netstack=vxlan -I vmkX <remote TEP IP> -d -s <packet size> (where vmkX is the VMkernel interface of the source TEP).
    • Confirm whether the issue happens only with VMs on different segments or also with VMs on the same segment.
    • Check whether the issue is resolved by moving the VMs to the same ESXi host (this rules out the physical network).
    • If the issue still occurs with the VMs on the same host and the same segment, refer to the source VM checks in step 2.
    • If the issue does not occur when the VMs share a host, capture at the uplink level to confirm that the traffic is actually leaving the source host.
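The <packet size> in the TEP MTU test is the ICMP payload, so it must be smaller than the MTU being tested. With -d (do not fragment), an IPv4 header of 20 bytes and an ICMP echo header of 8 bytes leave MTU - 28 bytes of payload: 1572 for an MTU of 1600, 8972 for 9000. The following minimal sketch prints the vmkping command for a given MTU; the function name tep_vmkping_cmd and the interface and addresses in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a vmkping command that fills the given MTU
# exactly, with fragmentation disabled (-d).
# payload = MTU - 20 (IPv4 header, no options) - 8 (ICMP echo header)
# Usage: tep_vmkping_cmd <vmk-interface> <remote-TEP-IP> <MTU>
tep_vmkping_cmd() {
    vmk="$1"
    remote_tep="$2"
    mtu="$3"
    payload=$((mtu - 28))
    echo "vmkping ++netstack=vxlan -I $vmk $remote_tep -d -s $payload"
}
```

For example, tep_vmkping_cmd vmk10 192.0.2.12 1600 prints vmkping ++netstack=vxlan -I vmk10 192.0.2.12 -d -s 1572. If that ping fails while a smaller size succeeds, some device on the path between the two TEPs has a lower MTU than expected.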

 

  • If the issue is with N-S traffic:
    • Confirm connectivity and MTU configuration between the host TEPs and the Edge TEPs.
      • If the host TEPs and Edge TEPs are on different networks, remember that the interfaces of the L3 device routing between these two networks must also have the proper MTU.
    • Run a TraceFlow from the NSX Manager UI -> Plan & Troubleshoot -> TraceFlow, using the source VM and the IP address of the destination machine (even if it is outside NSX).
      • Even though TraceFlow does not follow the packet once it exits the NSX environment, it shows which Edge, and which uplink on that Edge, is handling this traffic, or whether the packet is dropped before it gets there.
      • If the traffic shows as "delivered" to the T0 Edge uplink but connectivity issues still persist, perform a packet capture on the Edge uplink.
      • Also, run a TraceFlow from the Edge CLI using the base64 packet data shown in the Edge CLI capture.
      • Perform packet captures on the ESXi host running the Edge VM. Confirm that packets are leaving the uplink interface towards the physical network (capture at both the vNIC and vmnic level).
  • If none of the above applies and the traffic is actually leaving the host uplink towards the physical network, the issue might be in the underlay network.
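The vNIC and vmnic captures on the host running the Edge VM (and the uplink capture in the E-W checks) can be taken with pktcap-uw. The following minimal sketch only prints the two commands so they can be reviewed before running them on that host. It assumes ESXi 6.7 or later (where --dir 2 captures both directions), a switchport ID taken from net-stats -l, and /tmp as the output location. The function name edge_capture_cmds and the port ID and vmnic in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print pktcap-uw commands that capture the same traffic
# at the Edge VM's vNIC port and at the host's physical uplink.
# Assumes ESXi 6.7+ for --dir 2 (both directions).
# Usage: edge_capture_cmds <edge-vnic-port-id> <vmnic> [packet-count]
edge_capture_cmds() {
    port_id="$1"
    vmnic="$2"
    count="${3:-1000}"   # default to 1000 frames per capture
    # Capture at the Edge VM's vNIC (switchport) level.
    echo "pktcap-uw --switchport $port_id --dir 2 -c $count -o /tmp/edge_port_${port_id}.pcap"
    # Capture at the physical uplink the traffic should leave through.
    echo "pktcap-uw --uplink $vmnic --dir 2 -c $count -o /tmp/${vmnic}.pcap"
}
```

For example, edge_capture_cmds 67108881 vmnic2 500 prints one command for the Edge vNIC port and one for vmnic2. Comparing the two .pcap files shows whether packets seen at the vNIC also leave through the uplink.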

Additional Information

If you are contacting Broadcom support about this issue, please provide the following:

  • Whether there are any issues on the physical network.
  • Whether there were any recent changes on ESXi.
  • Whether DRS may have moved VMs around prior to the issue.
  • The state of the Edges and hosts in the NSX UI and in vCenter.

Handling Log Bundles for offline review with Broadcom support

Known issues