VMware NSX
NSX preserves the Layer 2 broadcast behavior of traditional Ethernet switching within its virtual networks. Depending on the destination MAC address, NSX handles traffic in two distinct ways:
Known Unicast: If the destination MAC address is known, NSX forwards the frame directly to the appropriate endpoint via unicast.
BUM Traffic: If the frame is broadcast, unknown unicast, or multicast (BUM), NSX floods the frame across the virtual network.
Flooding in NSX means replicating the frame to all transport nodes (e.g., ESXi hosts, edge nodes) that have ports connected to the segment. Within each node, the frame is further delivered to all relevant local virtual ports.
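The forwarding decision described above can be sketched in a few lines of Python. This is only an illustrative model, not NSX's data path: the Segment class, its mac_table and nodes fields, and the forward function are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_group_address(mac: str) -> bool:
    # The least significant bit of the first octet marks a group
    # (multicast or broadcast) Ethernet address.
    return int(mac.split(":")[0], 16) & 1 == 1


@dataclass
class Segment:
    # Hypothetical table: learned MAC address -> transport node hosting it.
    mac_table: dict[str, str] = field(default_factory=dict)
    # Transport nodes with at least one port connected to this segment.
    nodes: set[str] = field(default_factory=set)


def forward(segment: Segment, source_node: str, dst_mac: str) -> list[str]:
    """Return the transport nodes that should receive the frame."""
    dst_mac = dst_mac.lower()
    if not is_group_address(dst_mac) and dst_mac in segment.mac_table:
        # Known unicast: deliver to the single node that owns the MAC.
        return [segment.mac_table[dst_mac]]
    # BUM (broadcast, unknown unicast, multicast): flood to every other
    # transport node that participates in the segment.
    return sorted(segment.nodes - {source_node})


seg = Segment(
    mac_table={"00:50:56:aa:bb:01": "esxi-2"},
    nodes={"esxi-1", "esxi-2", "esxi-3"},
)
print(forward(seg, "esxi-1", "00:50:56:aa:bb:01"))  # known unicast
print(forward(seg, "esxi-1", BROADCAST))            # broadcast
print(forward(seg, "esxi-1", "00:50:56:aa:bb:99"))  # unknown unicast
```

Local delivery to virtual ports on the source node is left out of the sketch; it only models which remote nodes receive a copy.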
While NSX supports multicast optimization and filtering in some scenarios, this article focuses specifically on BUM replication across transport nodes without multicast routing.
Replication Between Transport Nodes
NSX uses tunnels between Tunnel End Points (TEPs) to forward traffic between transport nodes. Each node typically has multiple TEPs for performance and redundancy. Flooding a frame results in NSX replicating it in software to every remote TEP in the segment. For example, flooding a frame to three remote nodes, each with two TEPs, results in six replicas, all of which consume physical NIC bandwidth and host CPU on the sending node.
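The arithmetic in that example can be written out as a small Python sketch. It follows the replication model described in this article (one replica per remote TEP); the function names are hypothetical, and the byte estimate ignores tunnel encapsulation overhead.

```python
def replica_count(teps_per_remote_node: list[int]) -> int:
    """Replicas sent for one flooded frame: one per remote TEP."""
    return sum(teps_per_remote_node)


def payload_bytes_sent(frame_bytes: int, teps_per_remote_node: list[int]) -> int:
    """Bytes the source node transmits for one flooded frame.

    Encapsulation overhead is deliberately ignored in this estimate.
    """
    return frame_bytes * replica_count(teps_per_remote_node)


# Three remote nodes, each with two TEPs, as in the example above.
remote = [2, 2, 2]
print(replica_count(remote))             # 6
print(payload_bytes_sent(1500, remote))  # 9000
```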
Flow Cache Limitation
NSX relies on a flow cache for efficient forwarding of known unicast traffic. However, this cache is not used for BUM traffic. Even when BUM frames are sent to a single destination, they do not benefit from the performance optimizations of cached unicast flows.
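A toy model can make the difference concrete. The Python sketch below is not how NSX implements its flow cache; ToyDatapath and its fields are hypothetical, and the only behavior taken from the text is that known unicast flows are cached while BUM frames are not.

```python
class ToyDatapath:
    """Toy model: cache forwarding results for known unicast only."""

    def __init__(self, mac_table: dict[str, str]):
        self.mac_table = mac_table  # learned MAC -> transport node
        self.flow_cache: dict[tuple[str, str], str] = {}
        self.slow_path_lookups = 0  # counts full (uncached) lookups

    def _slow_path(self, dst_mac: str):
        # Stand-in for the full forwarding pipeline.
        self.slow_path_lookups += 1
        return self.mac_table.get(dst_mac)

    def send(self, src_mac: str, dst_mac: str) -> str:
        key = (src_mac, dst_mac)
        if key in self.flow_cache:
            # Fast path: reuse the cached decision.
            return self.flow_cache[key]
        node = self._slow_path(dst_mac)
        if node is None:
            # BUM frame: flooded, and the result is never cached,
            # so every such frame takes the slow path again.
            return "flood"
        self.flow_cache[key] = node
        return node


dp = ToyDatapath({"00:50:56:aa:bb:02": "esxi-2"})
for _ in range(3):
    dp.send("00:50:56:aa:bb:01", "00:50:56:aa:bb:02")  # known unicast
print(dp.slow_path_lookups)  # 1: only the first frame missed the cache
for _ in range(3):
    dp.send("00:50:56:aa:bb:01", "ff:ff:ff:ff:ff:ff")  # broadcast
print(dp.slow_path_lookups)  # 4: each broadcast frame took the slow path
```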
Optimizing for BUM Traffic
To mitigate the performance overhead of BUM replication, NSX already performs ARP suppression, which caches ARP responses and so reduces the most common type of broadcast in IP networks. A few other options are available, depending on the design:
Minimize Broadcast Domains: Keep virtual network segments as small as possible.
Optimize Workload Placement: Group workloads on the same hosts where possible to avoid cross-node replication.
Limit TEP Count: Fewer TEPs per host reduce the number of replications required.
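The effect of the placement and TEP-count options can be compared with a rough Python sketch. It reuses the article's model of one replica per remote TEP; the function name, host names, and VM names are hypothetical, and the numbers are illustrative rather than measured.

```python
def replicas_for_flood(
    placement: dict[str, str],
    teps_per_host: dict[str, int],
    source_host: str,
) -> int:
    """Replicas needed when source_host floods one frame on a segment.

    placement maps each VM on the segment to the host it runs on.
    """
    # Only hosts other than the source that run a VM on the segment
    # need to receive a copy over the tunnels.
    remote_hosts = set(placement.values()) - {source_host}
    return sum(teps_per_host[host] for host in remote_hosts)


two_teps = {"host-a": 2, "host-b": 2, "host-c": 2, "host-d": 2}
one_tep = {host: 1 for host in two_teps}

# Four VMs spread over four hosts.
spread = {"vm1": "host-a", "vm2": "host-b", "vm3": "host-c", "vm4": "host-d"}
# The same four VMs grouped onto two hosts.
grouped = {"vm1": "host-a", "vm2": "host-a", "vm3": "host-b", "vm4": "host-b"}

print(replicas_for_flood(spread, two_teps, "host-a"))   # 6
print(replicas_for_flood(grouped, two_teps, "host-a"))  # 2
print(replicas_for_flood(spread, one_tep, "host-a"))    # 3
```

Under this model, grouping workloads reduces the number of remote hosts, while limiting TEPs reduces the replicas sent per remote host; shrinking the segment does the former by removing VMs from the placement altogether.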