Traffic Loss Following Physical Network Changes

Article ID: 438554

Products

VMware NSX

Issue/Introduction

Virtual machines (VMs) located behind NSX Edges lose network connectivity after changes to the physical network environment. NSX correctly processes and forwards the traffic, but return packets are dropped or lost within the physical infrastructure. The issue commonly follows core switch migrations or changes to the BGP peering configuration on the physical routers. Typical symptoms:

  • Virtual machines on overlay segments lose connectivity to external networks.
  • Pings to BGP peers configured on the NSX Edge fail from specific physical subnets.
  • Traceflow in the NSX UI shows that packets leave the virtual environment and that inbound packets are processed normally when they reach the Edge uplinks.
  • Northbound packets are confirmed leaving the Edge host via physical uplinks (e.g., vmnic5) on the designated VLAN.
  • Southbound packets arrive at the virtual components (such as the Tier-0 BGP peer), and replies are sent. This return traffic exits the Edge and the ESXi host but does not arrive at the physical point of origin.

Environment

VMware NSX

Cause

The issue is caused by components external to the ESXi hosts and NSX environment. Physical network changes, such as switch migrations, can result in asymmetric routing or incorrect VLAN tagging, causing return traffic to be dropped after it exits the VMware components.
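Incorrect VLAN tagging is easiest to confirm from the frames themselves. The following is a minimal Python sketch, not an NSX or VMware tool, that decodes the IEEE 802.1Q header of a raw Ethernet frame (TPID 0x8100 at byte 12, VLAN ID in the low 12 bits of the tag control field). The function name `vlan_id_of` and the sample frame bytes are hypothetical; VLAN 572 is the example peering VLAN used elsewhere in this article.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that signals an IEEE 802.1Q tag


def vlan_id_of(frame: bytes) -> Optional[int]:
    """Return the 802.1Q VLAN ID of a raw Ethernet frame, or None if it is untagged."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Bytes 12-13 carry the EtherType, or the TPID when a VLAN tag is present.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None
    if len(frame) < 18:
        raise ValueError("frame ends inside the 802.1Q tag")
    # The Tag Control Information follows the TPID; the low 12 bits are the VLAN ID
    # (the top 4 bits hold the priority code point and drop-eligible indicator).
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


# Hypothetical frame header: broadcast destination, example source MAC,
# an 802.1Q tag for VLAN 572, then the IPv4 EtherType.
tagged = (
    bytes.fromhex("ffffffffffff")
    + bytes.fromhex("005056aabb01")
    + struct.pack("!HH", TPID_8021Q, 572)
    + struct.pack("!H", 0x0800)
)
print(vlan_id_of(tagged))  # 572
```

Comparing the VLAN ID seen on the Edge uplink with the VLAN the physical switch port expects for the peering network helps show whether the tag is being lost or rewritten between the host and the physical fabric.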

Resolution

If NSX Traceflow and packet captures confirm that traffic is leaving the virtual environment, the investigation must focus on the physical data path.

  1. Perform Traceflow: Use the NSX Manager UI to run a Traceflow from the affected VM to the external destination. Confirm the packet reaches the "Physical" observation point.
  2. Verify Edge Uplinks: Perform a packet capture on the Edge transport node uplink to verify that ICMP requests are being sent out and to check if any return traffic is arriving.
    Example CLI to capture BGP/ICMP traffic on the Edge:

    # start capture interface <interface-id> direction dual expression host [IP_ADDRESS]


  3. Check Physical MAC Table: Verify that the physical switch ports connected to the ESXi hosts are learning the MAC addresses of the NSX Edge uplinks on the correct VLAN.
  4. Validate VLAN Tagging: Ensure the VLAN used for NSX Edge peering (e.g., VLAN 572) is consistently tagged across all trunk ports in the new physical switch fabric.
  5. BGP Route Verification: Check the BGP routing table on the physical core switches to ensure the NSX overlay prefixes are being learned and that their next hop is correct and reachable.
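The request/reply balance from the Edge uplink capture in step 2 can also be counted offline. The sketch below is a hypothetical helper, not part of NSX, and assumes the capture has been saved as a classic libpcap file (pcapng is not handled) with Ethernet link type and IPv4 traffic. It counts ICMP echo requests sent to the peer address and echo replies coming back from it, skipping a single 802.1Q tag if one is present. The names `count_icmp_echo`, the file name, and the peer address in the usage line are assumptions for illustration.

```python
import ipaddress
import struct
import sys
from typing import Tuple

# Classic pcap magic numbers as they appear on disk, mapped to struct byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def count_icmp_echo(data: bytes, peer: str) -> Tuple[int, int]:
    """Return (echo requests to peer, echo replies from peer) found in pcap bytes."""
    endian = PCAP_MAGICS.get(data[:4])
    if endian is None or len(data) < 24:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)  # last field of global header
    if linktype != 1:
        raise ValueError(f"expected Ethernet link type 1, got {linktype}")
    peer_ip = ipaddress.IPv4Address(peer).packed
    requests = replies = 0
    offset = 24  # records start after the 24-byte global header
    while offset + 16 <= len(data):
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", data, offset)
        frame = data[offset + 16 : offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14:
            continue
        (ethertype,) = struct.unpack_from("!H", frame, 12)
        l3 = 14
        if ethertype == 0x8100 and len(frame) >= 18:  # skip one 802.1Q tag
            (ethertype,) = struct.unpack_from("!H", frame, 16)
            l3 = 18
        if ethertype != 0x0800 or len(frame) < l3 + 20:  # IPv4 only
            continue
        ihl = (frame[l3] & 0x0F) * 4  # IPv4 header length in bytes
        if frame[l3 + 9] != 1 or len(frame) < l3 + ihl + 1:  # protocol 1 = ICMP
            continue
        src, dst = frame[l3 + 12 : l3 + 16], frame[l3 + 16 : l3 + 20]
        icmp_type = frame[l3 + ihl]
        if icmp_type == 8 and dst == peer_ip:  # echo request toward the peer
            requests += 1
        elif icmp_type == 0 and src == peer_ip:  # echo reply from the peer
            replies += 1
    return requests, replies


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python count_echo.py edge-uplink.pcap 192.0.2.1
    with open(sys.argv[1], "rb") as f:
        sent, received = count_icmp_echo(f.read(), sys.argv[2])
    print(f"echo requests to peer: {sent}, echo replies from peer: {received}")
```

A non-zero request count with no replies on the uplink matches the pattern in this article: the Edge is transmitting, and the loss is somewhere on the physical return path.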
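Steps 3 to 5 are normally read off the physical switch CLI, whose commands and output differ by vendor. Once that output has been transcribed into plain data, the comparisons are straightforward. The sketch below is not tied to any vendor API: the structures (`mac_table`, `trunk_allowed_vlans`, `bgp_routes`), the MAC addresses, port names, and prefixes are all hypothetical, using documentation address ranges. It also assumes directly connected eBGP peering, so "correct next hop" is approximated as "next hop inside the peering subnet", which will not hold for every topology.

```python
import ipaddress
from typing import Dict, Iterable, List, Set, Tuple

IPv4Net = ipaddress.IPv4Network


def check_physical_path(
    edge_macs: Iterable[str],
    mac_table: Dict[Tuple[str, int], str],  # (MAC, VLAN) -> switch port
    trunk_allowed_vlans: Dict[str, Set[int]],  # trunk port -> allowed VLANs
    nsx_prefixes: Iterable[IPv4Net],
    bgp_routes: Dict[IPv4Net, str],  # prefix -> next hop
    peering_subnet: IPv4Net,
    peering_vlan: int,
) -> List[str]:
    problems = []
    # Step 3: each Edge uplink MAC should be learned on the peering VLAN.
    for mac in sorted(edge_macs):
        if (mac, peering_vlan) not in mac_table:
            seen = sorted(v for (m, v) in mac_table if m == mac)
            problems.append(f"{mac} not learned on VLAN {peering_vlan} (seen on {seen})")
    # Step 4: every trunk port in the fabric should carry the peering VLAN.
    for port, allowed in sorted(trunk_allowed_vlans.items()):
        if peering_vlan not in allowed:
            problems.append(f"{port} does not allow VLAN {peering_vlan}")
    # Step 5: NSX prefixes should be present with a next hop in the peering subnet.
    for prefix in nsx_prefixes:
        next_hop = bgp_routes.get(prefix)
        if next_hop is None:
            problems.append(f"{prefix} missing from BGP table")
        elif ipaddress.ip_address(next_hop) not in peering_subnet:
            problems.append(f"{prefix} next hop {next_hop} outside {peering_subnet}")
    return problems


# Hypothetical data transcribed from the physical switches after a migration.
net = ipaddress.ip_network
problems = check_physical_path(
    edge_macs={"00:50:56:aa:bb:01", "00:50:56:aa:bb:02"},
    mac_table={
        ("00:50:56:aa:bb:01", 572): "Ethernet1/10",
        ("00:50:56:aa:bb:02", 571): "Ethernet1/11",  # learned on the wrong VLAN
    },
    trunk_allowed_vlans={
        "Ethernet1/10": {570, 571, 572},
        "Ethernet1/11": {570, 571},  # peering VLAN missing from this trunk
    },
    nsx_prefixes=[net("198.51.100.0/24"), net("203.0.113.0/24")],
    bgp_routes={net("198.51.100.0/24"): "192.0.2.2"},
    peering_subnet=net("192.0.2.0/28"),
    peering_vlan=572,
)
for problem in problems:
    print(problem)
```

With this sample data the check reports the MAC learned on VLAN 571, the trunk port missing VLAN 572, and the NSX prefix absent from the BGP table, which are the three kinds of physical-side faults the steps above look for.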

Additional Information

NSX network outage due to BGP route withdrawal from upstream BGP neighbors