VMs on Bridge Segment Cannot Reach Overlay Networks After Edge Node Replacement

Article ID: 412607

Products

VMware NSX

Issue/Introduction

Following NSX edge node replacement, virtual machines on a bridged overlay segment lose connectivity to physical infrastructure. VMs on the affected bridge segment can communicate with each other on the same segment (both same host and across hosts), but cannot reach:

  • Physical servers on the same VLAN
  • NSX overlay networks outside the bridge segment
  • External networks requiring bridge traversal

Symptoms:

  • Intra-segment VM-to-VM communication works normally
  • Bridge segment VMs cannot reach physical servers on the bridged VLAN
  • Packet captures show traffic reaching the edge TEP but failing to traverse between bridge and overlay/physical networks
  • Other segments (non-bridged) maintain normal connectivity to physical infrastructure
  • Issue began immediately after edge node replacement activity

Business Impact: Workloads that depend on bridge segments to reach physical infrastructure or other NSX overlay segments lose connectivity. This most often affects backup servers, management infrastructure, and workloads undergoing phased migration to NSX.

Bridge Architecture Overview: NSX Edge Bridging extends Layer 2 connectivity between NSX overlay segments (Geneve encapsulation) and VLAN-backed networks. The Edge Node acts as a Layer 2 gateway, with the following interfaces:

  • fp-eth0: Tunnel Endpoint (TEP) for overlay traffic
  • fp-eth1: Primary VLAN interface for bridging
  • fp-eth2: Optional second VLAN interface (dual VDS architectures)

Environment

Affected Versions:

  • VMware NSX 4.x and later
  • VMware vSphere 8.x and later
  • Occurs after edge node replacement, edge cluster changes, or edge configuration modifications

Topology Requirements:

  • NSX Edge Bridge configured to extend overlay segment to VLAN
  • Edge nodes deployed in Edge Cluster
  • Bridge profile mapping overlay segment to VLAN transport zone
  • vSphere VDS port groups configured for edge node connectivity

Common Scenarios:

  • Edge node hardware replacement
  • Edge VM redeployment or migration
  • Edge cluster reconfiguration
  • Configuration restore after failure
  • NSX version upgrades with edge replacement

Cause

Primary Cause: Missing or incomplete bridge configuration on the replacement edge nodes. The bridge configuration consists of five interdependent components that must be properly configured together:

  1. Edge Bridge Profile - Defines which edge nodes participate in bridging (Primary/Backup)
  2. Segment Bridge Configuration - Maps the overlay segment to a VLAN ID in a VLAN transport zone
  3. Edge Node Transport Configuration - Edge interfaces must be properly attached to transport zones
  4. VDS Security Settings - Forged Transmits must be set to Accept on the VLAN-side port groups (this setting persists across the replacement unless a different VDS is used)
  5. Frame Delivery Method - Either Promiscuous Mode or MAC Learning must be configured (this setting persists across the replacement unless a different VDS is used)

Why These Components Are Required:

Physical switches and vSphere VDS operate differently regarding MAC address handling:

  • Physical switches learn MAC addresses by observing frames and build dynamic MAC tables
  • vSphere VDS is programmed with VM MAC addresses when VMs attach, but does NOT dynamically learn like physical switches

When a physical server sends traffic to an overlay VM, the physical switch forwards the frame to the ESXi host that runs the edge node. However, the VDS does not recognize the overlay VM's MAC address (the VM is not directly attached to that port group), so it drops the frame. This is why a frame delivery method (Promiscuous Mode or MAC Learning) is required.

Additionally, the edge node sends frames with overlay VM source MAC addresses (not the edge vNIC MAC). The VDS security setting "Forged Transmits" must be set to Accept, or the VDS will drop these frames when it detects the source MAC doesn't match the edge vNIC.
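The two VDS behaviors described above can be illustrated with a small toy model. This is a simplified sketch, not the actual VDS implementation: the class names PortGroupPolicy and EdgePort and the MAC addresses in the usage note are hypothetical, and MAC Learning is modeled only as "remember source MACs that were allowed out".

```python
from dataclasses import dataclass, field


@dataclass
class PortGroupPolicy:
    """Hypothetical stand-in for the port group settings discussed above."""
    forged_transmits: bool = False  # False models "Reject"
    promiscuous: bool = False       # False models "Reject"
    mac_learning: bool = False


@dataclass
class EdgePort:
    """Toy model of the VDS port that the edge VLAN-side vNIC attaches to."""
    vnic_mac: str
    policy: PortGroupPolicy
    learned: set = field(default_factory=set)

    def transmit(self, src_mac: str) -> bool:
        """Edge sends a frame toward the VLAN; True means it is forwarded."""
        if src_mac != self.vnic_mac and not self.policy.forged_transmits:
            return False  # source MAC differs from the vNIC MAC: dropped
        if self.policy.mac_learning:
            self.learned.add(src_mac)
        return True

    def receive(self, dst_mac: str) -> bool:
        """Frame arrives from the VLAN; True means it is delivered to the edge."""
        if dst_mac == self.vnic_mac or self.policy.promiscuous:
            return True
        return self.policy.mac_learning and dst_mac in self.learned
```

With a default policy, a frame carrying an overlay VM source MAC is dropped on transmit and a reply addressed to that MAC is dropped on receive. Setting forged_transmits=True together with either promiscuous=True or mac_learning=True lets both directions pass in this model.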

Why Configuration Is Lost During Replacement:

Edge node replacement often results in missing bridge components:

  • Edge Bridge Profile references old/removed edge nodes
  • Bridge-to-overlay segment mappings not recreated
  • Transport zone assignments on new edge nodes incomplete
  • VDS configuration may differ (single vs dual VDS architecture)
  • If using different VDS or port groups: Security settings and frame delivery method not configured

Traffic Flow Breakdown:

Packet captures confirm traffic successfully reaches the edge TEP from the source ESXi TEP, indicating overlay networking functions properly. The failure occurs at the bridge point where traffic should traverse between:

  • Overlay segment → VLAN (outbound direction)
  • VLAN → Overlay segment (inbound direction)

This indicates the edge node receives traffic but cannot bridge it due to missing configuration.

Resolution

Primary Resolution: Restore Bridge Configuration Within NSX

This is the recommended approach and should be attempted first. Only move the bridge outside NSX if configuration cannot be restored due to missing documentation or inability to recreate proper configuration.

Step 1: Verify Edge Node Configuration

  1. Navigate to System → Fabric → Nodes → Edge Transport Nodes
  2. Select each edge node in the cluster
  3. Verify:
    • Edge node is in "Success" state
    • VDS configuration matches intended architecture
    • Transport zones are properly attached:
      • Overlay transport zone (for TEP traffic on fp-eth0)
      • VLAN transport zone (for bridge VLAN side on fp-eth1)
    • Edge interfaces are properly mapped to transport zones

Understanding Single vs Dual VDS Architecture:

Single VDS Configuration:

  • All traffic (overlay + VLAN) uses one VDS
  • Simpler configuration
  • Limitation: Cannot bridge to same VLAN as Tier-0 uplinks (VLAN ID conflict)

Dual VDS Configuration:

  • VDS 1: Overlay TZ + VLAN TZ for Tier-0 uplinks (fp-eth0, fp-eth1)
  • VDS 2: Separate VLAN TZ for bridging only (fp-eth2)
  • Eliminates VLAN ID conflicts
  • Provides traffic isolation
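The single-VDS limitation can be checked before choosing an architecture. A minimal sketch, assuming the bridge VLAN IDs and Tier-0 uplink VLAN IDs are already known; the function name bridge_vlan_conflicts and the VLAN numbers in the usage note are hypothetical.

```python
def bridge_vlan_conflicts(bridge_vlans, uplink_vlans, dual_vds):
    """Return the bridge VLAN IDs that collide with Tier-0 uplink VLAN IDs.

    With a dedicated bridging VDS (dual_vds=True) the bridge VLANs live in a
    separate VLAN transport zone, so no conflict is reported.
    """
    if dual_vds:
        return set()
    return set(bridge_vlans) & set(uplink_vlans)
```

For example, bridge_vlan_conflicts([100, 200], [200, 300], dual_vds=False) returns {200}, which suggests either choosing a different bridge VLAN or moving bridging to a second VDS.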

Step 2: Verify or Recreate Edge Bridge Profile

  1. Navigate to Networking → Segments → Edge Bridge Profiles
  2. Check if bridge profile exists and references correct edge nodes
  3. If profile is missing or references old edge nodes, create new profile:
    • Click Add Edge Bridge Profile
    • Name: <descriptive-name>-bridge-profile
    • Edge Cluster: Select cluster containing replacement edge nodes
    • Primary Edge Node: Select from cluster membership
    • Backup Edge Node: Select the edge node that takes over if the primary fails
    • Failover Mode:
      • Preemptive: Primary resumes active role after recovery (causes brief disruption)
      • Non-Preemptive: Backup continues active after primary recovery (no disruption)
    • Click Save
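The difference between the two failover modes can be seen by replaying a primary failure and recovery. This is an illustrative sketch only: the function name active_node and the event names are hypothetical, and the backup node is assumed to stay healthy throughout.

```python
def active_node(events, preemptive):
    """Replay primary-node events and return the active node after each one.

    events: iterable of "primary_down" or "primary_up".
    preemptive: True models Preemptive mode, False models Non-Preemptive.
    """
    active = "primary"
    history = []
    for event in events:
        if event == "primary_down":
            active = "backup"
        elif event == "primary_up":
            if preemptive:
                active = "primary"  # second switchover, brief disruption
        else:
            raise ValueError(f"unknown event: {event}")
        history.append(active)
    return history
```

Replaying ["primary_down", "primary_up"] gives ["backup", "primary"] in Preemptive mode and ["backup", "backup"] in Non-Preemptive mode, which is why Non-Preemptive avoids the second disruption.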

Step 3: Attach Edge Bridge to Segment

  1. Navigate to Networking → Segments
  2. Select the affected overlay segment
  3. Click Edit → Scroll to Edge Bridges section
  4. Click Set → Add Edge Bridge
  5. Configure:
    • Edge Bridge Profile: Select profile from Step 2
    • Transport Zone: Select VLAN transport zone (CRITICAL: Do not select overlay TZ)
    • VLAN ID: Enter VLAN ID to bridge to
    • Teaming Policy: (Optional) For multi-uplink deterministic control
  6. Click Add → Apply → Save
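For environments that script this step, the same values can be assembled into a segment bridge entry. The sketch below only builds and validates a dictionary and makes no API call. The helper build_bridge_entry is hypothetical, and the field names (bridge_profiles, bridge_profile_path, vlan_transport_zone_path, vlan_ids, uplink_teaming_policy_name) are assumptions about the NSX Policy API segment schema that should be verified against the API reference for the installed version.

```python
def build_bridge_entry(bridge_profile_path, vlan_tz_path, tz_type, vlan_id,
                       teaming_policy=None):
    """Build a segment body fragment that attaches one edge bridge.

    Field names are assumed, not confirmed by this article.
    tz_type is the type of the selected transport zone ("VLAN" or "OVERLAY").
    """
    if tz_type.upper() != "VLAN":
        # Mirrors the UI warning: the bridge must use a VLAN transport zone.
        raise ValueError("bridge requires a VLAN transport zone")
    if not 0 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 0 and 4094")
    entry = {
        "bridge_profile_path": bridge_profile_path,
        "vlan_transport_zone_path": vlan_tz_path,
        "vlan_ids": [str(vlan_id)],
    }
    if teaming_policy:
        entry["uplink_teaming_policy_name"] = teaming_policy
    return {"bridge_profiles": [entry]}
```

A call such as build_bridge_entry("<bridge-profile-policy-path>", "<vlan-tz-policy-path>", "VLAN", 100) returns a fragment with one bridge entry for VLAN 100. The angle-bracket values are placeholders for the real policy paths in the environment.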

Step 4: Verify Bridge Realization

Connect to the edge node over SSH and run:

get logical-switches

Expected output:

  • VLAN logical switches section shows the bridged VLAN ID with device assignment (fp-eth1 or fp-eth2)
  • Overlay logical switches section shows the overlay segment VNI
  • Both entries together confirm that the bridge was created and realized on the edge node

Step 5: Verify Connectivity

From a VM on the bridge segment, test:

  1. Connectivity to physical servers on the VLAN
  2. Connectivity to VMs on other overlay segments
  3. External routing (if applicable)
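These checks can be repeated consistently with a small script run from a VM on the bridge segment. A minimal sketch using TCP connection attempts from the Python standard library, chosen here because ICMP sockets usually need elevated privileges. The function names and the target addresses in the usage note are hypothetical, and each target must have a listening TCP service for the result to be meaningful.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def run_checks(targets, timeout=2.0):
    """targets maps a label to (host, port); returns label -> True/False."""
    return {label: tcp_reachable(host, port, timeout)
            for label, (host, port) in targets.items()}
```

For example, run_checks({"physical server on VLAN": ("192.0.2.10", 22), "VM on other overlay segment": ("198.51.100.20", 443)}) returns one True or False per label. The addresses shown are documentation-range placeholders.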

Step 6: Verify VDS Security Settings (Only if Using Different VDS or Still Having Issues)

Note: If the replacement edge node connects to the same VDS port groups as the original edge, these settings should already be in place and do not need reconfiguration. Only verify/configure these if:

  • Using a different VDS than the original configuration
  • Using different port groups than the original configuration
  • Connectivity still fails after completing Steps 1-5

On the vSphere VDS port group used by edge node fp-eth1 (VLAN side):

  1. Navigate to vCenter → VDS → Port Group → Edit Settings → Security
  2. Verify/Configure:
    • Promiscuous Mode: Accept (if using Promiscuous Mode method) OR MAC Learning enabled
    • MAC Address Changes: Reject (keep default - no change needed)
    • Forged Transmits: Accept (CRITICAL - must be Accept)

Why Forged Transmits Must Be Accept:

The edge node sends frames with overlay VM source MAC addresses (not the edge vNIC MAC). When "Forged Transmits" is set to Reject (default), the VDS compares the source MAC in the frame to the originating vNIC MAC and drops mismatches. Setting to Accept allows the edge to forward frames with different source MACs, which is required for bridging to function.

Frame Delivery Method (Should Already Be Configured):

If the bridge is still not functioning and the VDS security settings are correct, verify that a frame delivery method is configured:

Method 1: Promiscuous Mode (Simplest)

  1. Navigate to vCenter → VDS → Port Group (used by edge fp-eth1) → Edit Settings → Security
  2. Verify Promiscuous Mode is set to Accept

Method 2: MAC Learning (Recommended)

  1. Navigate to vCenter → VDS → Port Group → Edit Settings → Security
  2. Verify MAC address learning is enabled on the port group

Fallback Resolution: Move Bridge Outside NSX (Last Resort Only)

Important: Only use this option if bridge configuration cannot be restored within NSX due to missing documentation or inability to recreate proper configuration.

  1. Remove the Edge Bridge configuration from the NSX segment
  2. Configure external routing to the segment via physical network infrastructure
  3. This restores connectivity but loses NSX bridge benefits (centralized management, visibility, policy controls)

Change Management Best Practices

To prevent this issue during future edge replacements:

Pre-Replacement Documentation:

  1. Screenshot all Edge Bridge Profile configurations
  2. Document segment-to-VLAN mappings for all bridged segments
  3. Export edge node configuration
  4. Document VDS architecture (single vs dual VDS)
  5. Document which VDS port groups each edge node uses
  6. Screenshot VDS port group security settings (if planning to use different VDS)
  7. Record frame delivery method in use
  8. Document transport zone assignments
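The documented values are easier to compare after the replacement if they are recorded in a structured form. A minimal sketch, assuming one record per bridged segment; the BridgeSnapshot fields and the diff_snapshots helper are hypothetical names chosen to mirror the checklist above, and the values must still be collected manually or by other tooling.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BridgeSnapshot:
    """One bridged segment as documented before or after the replacement."""
    bridge_profile: str
    primary_edge: str
    backup_edge: str
    segment: str
    vlan_id: int
    vlan_transport_zone: str
    vds_port_group: str
    forged_transmits: str
    frame_delivery: str


def diff_snapshots(before, after, ignore=("primary_edge", "backup_edge")):
    """Return {field: (before, after)} for fields that changed.

    Edge node names are ignored by default because they are expected to
    change when nodes are replaced.
    """
    old, new = asdict(before), asdict(after)
    return {name: (old[name], new[name])
            for name in old
            if name not in ignore and old[name] != new[name]}
```

An empty result from diff_snapshots(before, after) suggests the bridge-related settings match the pre-replacement state. Any entry in the result points to a setting to review, such as a different port group or a Forged Transmits value that reverted to Reject.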

Replacement Procedure:

  1. Deploy new edge node(s) with identical VDS configuration
  2. Connect to same VDS port groups as original edge (preserves security settings)
  3. Verify transport zone assignments match
  4. Update Edge Bridge Profiles to reference new edge nodes
  5. Verify bridge realization with get logical-switches CLI command
  6. Test connectivity before removing old edge nodes

Post-Replacement Validation:

  1. Verify that all bridge segments are in "Success" state
  2. Test connectivity from bridge segments to physical infrastructure
  3. Check for alarm conditions
  4. Review NSX Manager logs for bridge-related errors

Additional Information