Severe Performance Degradation for Physical-to-Physical Traffic over NSX L2 Bridge (Veeam NBD Transport).

Article ID: 437388

Updated On:

Products

VMware NSX

Issue/Introduction

  • Backup operations (e.g., using Veeam in NBD mode) between physical ESXi hosts and a physical backup server exhibit significantly lower throughput than expected.
  • The physical backup server resides on a bridged NSX Overlay Segment.
  • Traceroute confirms traffic traverses the NSX overlay datapath and the virtual default gateway.

Environment

VMware NSX

Cause

The performance bottleneck is caused by network hairpinning and software packet processing constraints within the NSX Edge L2 bridge.

  1. Encapsulation Overhead: Large, long-lived "elephant flows" sent through the software-defined datapath are forced to the Edge node for Geneve encapsulation instead of being switched directly in hardware.
  2. CPU Saturation: The Edge processes this traffic in software on its CPU via DPDK. Because a backup job often consists of a single traffic flow (one five-tuple), all of its packets land on one CPU core, and throughput is capped at that core's processing limit.
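
The single-core effect described above can be illustrated with a minimal sketch. It assumes flows are pinned to cores by hashing the five-tuple; the actual hash used by the Edge datapath is not specified in this article, so CRC32 is used purely as a stand-in. The core count, IP addresses, and port numbers are hypothetical values chosen for illustration.

```python
import ipaddress
import struct
import zlib
from collections import Counter


def core_for_flow(src_ip, dst_ip, proto, src_port, dst_port, n_cores):
    """Map a five-tuple to a core index.

    CRC32 is a stand-in for whatever flow hash the datapath really uses
    (an assumption made only for this illustration).
    """
    key = struct.pack(
        "!4s4sBHH",
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        proto,
        dst_port,
        src_port,
    )
    return zlib.crc32(key) % n_cores


def cores_used(flows, n_cores):
    """Count how many flows land on each core."""
    return Counter(core_for_flow(*flow, n_cores) for flow in flows)


N_CORES = 4                  # hypothetical number of datapath cores
BACKUP = "198.51.100.20"     # documentation-range address (hypothetical)
ESXI = "192.0.2.10"          # documentation-range address (hypothetical)
TCP = 6

# One backup stream: a single five-tuple, so a single core does all the work.
single = [(BACKUP, ESXI, TCP, 50000, 902)]

# Eight parallel streams: only the source port differs, but that is enough
# to produce eight distinct five-tuples that can hash to different cores.
multi = [(BACKUP, ESXI, TCP, 50000 + i, 902) for i in range(8)]

print("single stream :", dict(cores_used(single, N_CORES)))
print("eight streams :", dict(cores_used(multi, N_CORES)))
```

Whatever hash is actually in use, the structural point is the same: one five-tuple always maps to one core, so adding cores to the Edge does not raise the ceiling for a single-stream backup.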

Resolution

The architecturally optimal solution is to re-IP and relocate the physical backup server to a native physical VLAN.

  • This completely bypasses the NSX routing datapath and Geneve encapsulation.
  • High-throughput traffic will flow directly over hardware switches, eliminating software-defined processing overhead.

Additional Information

If immediate relocation is not possible due to policy restrictions:

  1. Multi-Stream Configuration: Configure the backup application (e.g., Veeam) to use multiple streams/threads. This generates multiple five-tuples, allowing NSX to distribute the load across multiple Edge CPU cores. For background, see the related article "High Single-Core CPU Utilization During Large Data Transfers Due to Single Traffic Flow."
  2. Throughput Validation: Run an iperf test between the ESXi host management vmkernel interface and the physical server (using port 5201) to establish a data-driven baseline for the maximum achievable throughput across the current bridge.
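
As a rough sketch of how the baseline in step 2 could be recorded, the helper below wraps an iperf3 client run and reads the receiver-side throughput from its JSON report. It assumes iperf3 (rather than the older iperf2) is available on the machine running the script, that the other endpoint is already listening with `iperf3 -s`, and that the test is TCP. The function names and the server address are hypothetical.

```python
import json
import subprocess


def build_iperf3_cmd(server, port=5201, streams=1, seconds=30):
    """Build an iperf3 client command line that emits a JSON report."""
    return [
        "iperf3",
        "-c", server,          # client mode, connect to this server
        "-p", str(port),       # server port (5201 is the iperf3 default)
        "-P", str(streams),    # number of parallel streams
        "-t", str(seconds),    # test duration in seconds
        "-J",                  # JSON output
    ]


def received_gbps(report):
    """Extract receiver-side throughput in Gbit/s from a TCP test report."""
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


def measure(server, **kwargs):
    """Run one test and return the measured throughput in Gbit/s."""
    result = subprocess.run(
        build_iperf3_cmd(server, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return received_gbps(json.loads(result.stdout))


if __name__ == "__main__":
    target = "192.0.2.10"  # hypothetical address of the iperf3 server
    for n in (1, 4, 8):
        print(f"{n} stream(s): {measure(target, streams=n):.2f} Gbit/s")
```

Comparing the one-stream result with the multi-stream results gives a quick indication of whether the bridge is limited per flow, as described in the Cause section, or limited in aggregate.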