Dynamic routing protocol selection for multi-site NSX deployments with single or multiple ISP connections

Article ID: 412422

Products

VMware NSX

Issue/Introduction

Multi-site NSX deployments typically require BGP for external connectivity due to the complexity and scale of enterprise networks using VMware NSX. BGP is the standard routing protocol for NSX implementations, providing the inter-domain routing intelligence and flexibility needed for enterprise network architectures.

Even in deployments where multiple ISPs are not immediately required, implementing BGP from the start prepares the infrastructure for future growth and changing connectivity needs. Configuring both eBGP and iBGP establishes the foundation for multi-ISP architecture, ensuring the network can scale without requiring complete routing redesign.

A small minority of NSX deployments use OSPF with static routing for external connectivity. This occurs in limited scenarios where all sites connect through a single ISP with unified routing, and typically represents infrastructures that have not yet encountered the need for multi-ISP connectivity.

This article provides architectural guidance for routing protocol selection in multi-site NSX environments, with primary focus on BGP implementations that serve the majority of deployments.

Typical NSX Deployment Patterns

BGP Architecture (Standard Deployment):

BGP is standard for NSX deployments because it provides:

  • Inter-domain routing intelligence for diverse upstream connectivity
  • Preparation for future multi-ISP requirements even if not immediately needed
  • Ability to adapt automatically to internet routing changes
  • Support for complex multi-cloud and hybrid cloud architectures
  • Granular control over path selection to diverse internet destinations
  • Integration with existing BGP infrastructure in enterprise networks

NSX customers typically operate sophisticated network environments requiring:

  • Autonomous routing decisions independent of individual ISP routing policies
  • Ability to scale across multiple geographic regions with diverse ISP options
  • Reliable connectivity to content delivery networks and cloud services
  • Traffic engineering capabilities for inbound and outbound path control
  • Infrastructure prepared for growth without architectural redesign

OSPF with Static Routing (Small Minority of Deployments):

A small minority of NSX deployments use OSPF for inter-site routing with static routes for external connectivity. This occurs only when:

  • All sites connect to the same ISP with guaranteed consistent routing
  • The deployment is in an early phase before expanding
  • The organization operates in a constrained environment with limited upstream options

This architecture exists in infrastructures that have not yet encountered the need for multi-ISP connectivity, but most NSX implementations benefit from BGP from initial deployment.

Why BGP Is Standard

Preparation for Growth:

Implementing BGP with both eBGP and iBGP from initial deployment prepares the network for future requirements. Even if multiple ISPs are not immediately needed, the architecture supports adding additional upstream providers without redesign. As organizations expand to new sites or regions, ISP options vary based on availability, cost, and performance. Starting with BGP avoids costly rearchitecture later.

Content Delivery and Cloud Integration:

Modern enterprise applications rely heavily on content delivery networks, SaaS platforms, and public cloud services. These providers announce their prefixes through BGP with specific peering requirements and path preferences. Optimal access requires BGP intelligence to understand how different paths reach these destinations.

NSX Customer Profile:

Organizations deploying NSX typically have:

  • Dedicated network engineering teams with routing expertise
  • Existing BGP infrastructures in their physical network
  • Requirements for granular routing control
  • Multi-cloud strategies requiring intelligent path selection
  • Global operations spanning multiple regions

The deployment of NSX itself indicates network complexity that benefits from BGP capabilities.

Environment

  • VMware NSX multi-site architecture
  • NSX Edge cluster hosting Tier-0 Gateways in Active-Active or Active-Standby HA mode
  • Multi-site deployment with physical network infrastructure
  • Integration with physical ToR switches or edge routers

Cause

The architectural requirements for NSX deployments typically necessitate BGP due to the scale, complexity, and growth trajectory of enterprise networks.

Why BGP Is Standard for NSX Deployments

Future-Proofing Architecture: Even when multiple ISPs are not immediately required, BGP provides the foundation for scaling without rearchitecture. Configuring eBGP and iBGP establishes the routing framework that accommodates growth. As organizations expand geographically or add sites, different ISP options become available or necessary in each region. BGP architecture handles this evolution seamlessly.

Inter-Domain Routing Requirements: BGP is specifically designed as an Exterior Gateway Protocol for routing between autonomous systems. It provides destination-specific routing intelligence that adapts to diverse upstream connectivity. Each ISP has distinct peering arrangements, transit relationships, and AS paths to reach internet destinations. BGP understands and leverages these differences.

Content Delivery Network Integration: Content delivery networks including major providers announce their prefixes with specific BGP policies. Different upstream providers have negotiated different peering agreements with these content networks, resulting in varying path quality and performance. BGP provides the intelligence to select optimal paths based on destination requirements.

Multi-Site Architectural Considerations: In multi-site deployments with separate external routers at each datacenter, the choice of NSX Tier-0 Gateway HA mode depends on the routing architecture. Each NSX Tier-0 Gateway has a single AS number. When the gateway advertises the same prefixes under that AS number from multiple sites to different upstream routers, the upstream network sees equivalent paths through each site. This can create asymmetric routing, where stateful traffic exits through one site and returns through another.

For deployments requiring Active-Standby mode across datacenters, BGP features including AS path prepending enable control over which site is preferred for specific routes. The Failover Domain feature allows designation of primary and secondary datacenters, ensuring that if the active Edge in the primary datacenter fails, another Edge in the same datacenter takes over before failing to the secondary site.

OSPF Limitations: OSPF is an Interior Gateway Protocol designed for routing within a single administrative domain. While effective for campus networks and data center fabrics, OSPF cannot:

  • Make routing decisions based on destination-specific upstream peering arrangements
  • Adapt to internet-scale routing changes across autonomous systems
  • Provide traffic engineering control over external path selection through features like AS path prepending
  • Prepare infrastructure for future multi-ISP requirements
  • Scale to very large networks as readily as BGP, which is one reason large data center operators such as Microsoft and Facebook run BGP inside their own networks

OSPF and BGP often work together in network architectures. OSPF typically handles internal routing within datacenters between switches and routers, while BGP handles external connectivity when traffic needs to travel to other organizations, MPLS networks, or the internet. This complementary use allows each protocol to operate in its optimal domain.

However, for external connectivity and multi-site architectures with diverse upstream providers, BGP is required. OSPF's convergence speed advantage within a datacenter does not offset its fundamental inability to handle inter-domain routing requirements.

Static routes with fixed next-hop information lack any awareness of destination reachability characteristics or upstream path diversity.

Why Single ISP with OSPF Is Limited

Growth Constraints: OSPF with static routing cannot accommodate future multi-ISP requirements without complete routing rearchitecture. Migration from OSPF/static to BGP requires significant planning, implementation effort, and potential service disruption.

Upstream Dependency: The organization relies completely on a single provider's routing decisions and has no autonomy over external path selection. It cannot influence or optimize paths to specific destinations based on performance or cost requirements.

Geographic Expansion: As organizations add sites in different regions, ISP availability and cost-effectiveness vary by location. Natural expansion drives diverse upstream connectivity requirements that OSPF cannot support.

Resolution

Select the routing architecture based on deployment requirements, with BGP being the standard for NSX implementations.


Option 1: BGP Architecture (Standard for NSX)

Architecture Overview

Implement BGP at physical edge routers for upstream connectivity. Use iBGP between sites to share routing information. Integrate NSX Tier-0 Gateways with BGP for intelligent external path selection.

This represents the standard architecture for NSX deployments and should be the default choice for new implementations.

When This Architecture Is Appropriate

  • Standard choice for NSX multi-site deployments
  • Preparation for future multi-ISP requirements even if not immediately needed
  • Requirements for reliable access to internet services from all sites
  • Content delivery network optimization needed
  • Traffic engineering capabilities required for cost or performance optimization
  • The organization operates or plans to operate in multiple geographic regions
  • Integration with cloud providers requires intelligent path selection
  • Future scalability to additional sites with diverse upstream options is anticipated
  • Multi-site deployments with separated external routers per datacenter requiring traffic engineering control

Multi-Site Tier-0 Gateway Considerations

Active-Active vs Active-Standby Mode:

When deploying Tier-0 Gateways across multiple sites with separate external routers at each datacenter, the HA mode selection impacts routing architecture.

In Active-Active mode, all Edge nodes in the cluster actively forward traffic. However, each NSX Tier-0 Gateway has only one AS number. If the gateway advertises the same prefixes under that AS number from multiple sites to different ToR/Leaf switches, those switches learn equivalent paths to the same prefixes from different points in the network simultaneously. This can create asymmetric routing, where stateful traffic exits through one site and returns through another, causing connection failures.
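
The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not NSX behavior: the StatefulEdge class, the Edge names, and the addresses are hypothetical, and the only assumption is that a stateful service accepts return traffic only on the node that saw the outbound packet.

```python
class StatefulEdge:
    """Toy model of an Edge node that tracks connections it has forwarded."""

    def __init__(self, name):
        self.name = name
        self.flows = set()  # (src, dst) pairs seen in the outbound direction

    def send_outbound(self, src, dst):
        # An outbound packet creates connection state on this Edge only.
        self.flows.add((src, dst))

    def accept_return(self, src, dst):
        # Return traffic is accepted only if the reverse flow is known here.
        return (dst, src) in self.flows


# Hypothetical Edges at two sites; state is not shared between them.
site_a = StatefulEdge("edge-site-a")
site_b = StatefulEdge("edge-site-b")

# A workload opens a connection that leaves through site A.
site_a.send_outbound("10.0.1.10", "203.0.113.5")

# Symmetric return path: the reply arrives at site A.
symmetric_ok = site_a.accept_return("203.0.113.5", "10.0.1.10")
# Asymmetric return path: the reply arrives at site B, which has no state.
asymmetric_ok = site_b.accept_return("203.0.113.5", "10.0.1.10")

print(symmetric_ok, asymmetric_ok)
```

In this model the reply that returns through the second site is rejected because that site never recorded the outbound flow, which is the connection failure the paragraph refers to.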

One solution is hosting all Edges for a Tier-0 Gateway on a single site connected to the same ToR switches, but this reduces datacenter redundancy.

In Active-Standby mode across sites, only one Edge has the active Tier-0 Service Router at a time. This prevents asymmetric routing issues in multi-site deployments with separated external routers. BGP features enable control over which site is active for specific Tier-0 Gateways.

Failover Domain Configuration:

For Edge VM deployments in Active-Standby mode, the Failover Domain feature allows designation of primary and secondary datacenters. Multiple Edge VMs can exist in the same Edge cluster at each datacenter. When the active Edge VM in the primary datacenter fails, another Edge in the same datacenter takes over before failing to the secondary site.
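
The intended failover order can be sketched as a simple selection rule. This is a simplified model of the behavior described above, not the algorithm NSX uses internally; the function name, the dictionary keys, and the Edge and site names are all hypothetical.

```python
def next_active_edge(edges, failed_edge):
    """Pick the next active Edge, preferring the failed Edge's own site.

    edges: list of dicts with 'name', 'site' and 'healthy' keys
           (a hypothetical model used only for this illustration).
    Returns the chosen Edge name, or None if no healthy Edge remains.
    """
    failed_site = next(e["site"] for e in edges if e["name"] == failed_edge)
    candidates = [e for e in edges if e["healthy"] and e["name"] != failed_edge]
    # Stable sort: Edges in the same site as the failed Edge come first.
    candidates.sort(key=lambda e: e["site"] != failed_site)
    return candidates[0]["name"] if candidates else None


edges = [
    {"name": "edge-a1", "site": "dc-primary", "healthy": False},
    {"name": "edge-a2", "site": "dc-primary", "healthy": True},
    {"name": "edge-b1", "site": "dc-secondary", "healthy": True},
]

# The local peer takes over first.
local_choice = next_active_edge(edges, "edge-a1")

# If the local peer is also down, the secondary site takes over.
edges[1]["healthy"] = False
remote_choice = next_active_edge(edges, "edge-a1")

print(local_choice, remote_choice)
```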

Failover Domain configuration requires API calls (the NSX API refers to these objects as failure domains) to:

  • Create failure domains for each site with preferred_active_edge_services set to true for primary site and false for secondary site
  • Associate each Edge node with its respective failure domain using transport node API
  • Configure the Edge cluster to allocate nodes based on failure domain using allocation rules

This ensures proper failover behavior where failure within a site promotes another Edge in the same site before failing over to the remote site.
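
The three API steps above might translate into request bodies along the following lines. This is a sketch under stated assumptions: apart from preferred_active_edge_services, which the article names, the endpoint paths and field names (failure_domain_id, allocation_rules, AllocationBasedOnFailureDomain) are recalled from the NSX Manager API and should be confirmed against the API reference for the deployed version. The display names and the placeholder ID are hypothetical, and the code only builds JSON; it sends nothing.

```python
import json


def failure_domain_body(name, preferred):
    # Assumed endpoint: POST /api/v1/failure-domains
    return {"display_name": name, "preferred_active_edge_services": preferred}


def transport_node_fragment(failure_domain_id):
    # Assumed field, merged into the existing Edge transport node body
    # before a PUT to /api/v1/transport-nodes/<node-id>.
    return {"failure_domain_id": failure_domain_id}


def edge_cluster_fragment():
    # Assumed field, merged into the existing Edge cluster body
    # before a PUT to /api/v1/edge-clusters/<cluster-id>.
    return {
        "allocation_rules": [
            {"action": {"enabled": True,
                        "action_type": "AllocationBasedOnFailureDomain"}}
        ]
    }


# Hypothetical site names; the primary site prefers active services.
primary = failure_domain_body("FD-DC1-Primary", True)
secondary = failure_domain_body("FD-DC2-Secondary", False)

for body in (primary, secondary,
             transport_node_fragment("<failure-domain-uuid>"),
             edge_cluster_fragment()):
    print(json.dumps(body, indent=2))
```

The transport node and Edge cluster fragments are meant to be merged into the full object returned by a prior GET, since a PUT normally replaces the whole object and requires its current revision.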

AS Path Prepending for Traffic Engineering:

AS path prepending controls routing preferences in multi-site deployments. When the same prefix is announced from both datacenters to Border Leaf switches, AS path prepending makes one path less preferred by artificially lengthening the AS path.

The configuration workflow includes:

  • Create IP prefix lists defining which prefixes require traffic engineering treatment
  • Create route maps that apply AS path prepending values to specific prefixes
  • Associate route maps with BGP neighbors on the Tier-0 Gateway
  • Configure prepending values where higher prepend counts make paths less preferred

When higher-priority attributes such as weight and local preference are equal, BGP prefers the route with the shortest AS path. Prepending additional AS numbers to a route makes that path less desirable, directing traffic to the preferred datacenter.

This enables active-standby behavior across sites while maintaining BGP routing control. Different Tier-0 Gateways can have different primary sites, distributing uplink bandwidth across both datacenters effectively.
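
The effect of prepending on path selection can be shown with a small model. This sketch assumes all other BGP attributes are equal, so AS path length alone decides; the AS number 65001 and the site names are hypothetical.

```python
def prepend(as_path, local_as, count):
    """Return a new AS path with local_as prepended `count` extra times."""
    return [local_as] * count + list(as_path)


def preferred_site(advertisements):
    """Pick the site whose advertisement has the shortest AS path.

    Assumes every other BGP attribute is equal between the candidates.
    """
    return min(advertisements, key=lambda site: len(advertisements[site]))


base_path = [65001]  # the Tier-0 Gateway's own AS (hypothetical value)

advertisements = {
    "dc-a": prepend(base_path, 65001, 0),  # preferred site, no prepending
    "dc-b": prepend(base_path, 65001, 3),  # standby site, three prepends
}

print(advertisements["dc-b"])
print(preferred_site(advertisements))
```

Swapping the prepend counts for a second Tier-0 Gateway would make the other datacenter its preferred site, which is how uplink load can be spread across both locations.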

Implementation Guidance

Physical Edge BGP Configuration:

Deploy Top-of-Rack switches or dedicated edge routers capable of handling BGP routing. For full internet routing tables, ensure routers have adequate memory and processing capacity. For partial tables or default route acceptance, resource requirements are lower.

Configure eBGP peering sessions with upstream providers. Establish BGP sessions using appropriate neighbor relationships, AS numbers, and authentication. Configure the routers to receive routing information based on organizational requirements - either full internet routing tables for maximum visibility or filtered prefixes for specific needs.

In multi-site deployments, Border Leaf switches at each datacenter serve as the central point for BGP routing tables. These switches receive routes from NSX Tier-0 Gateways, datacenter fabric switches, and external routers. Using paired Border Leaf switches with the same AS number at each site simplifies design and troubleshooting.

Route Acceptance Strategy:

Organizations choose between full routing tables or filtered prefix acceptance:

Full Internet Routing Tables:

  • Receive all prefixes from each upstream provider
  • Maximum visibility into internet routing topology
  • Optimal path selection for all destinations
  • Higher resource requirements

Partial Tables or Default Plus Specifics:

  • Accept default route plus specific prefixes for critical destinations
  • Lower resource requirements
  • Adequate routing intelligence for many scenarios
  • Configure prefix filtering to accept only required routes

Inter-Site iBGP Configuration:

Establish iBGP sessions between routing infrastructure at each site. Configure each iBGP peering with the same AS number on both ends of the session, along with appropriate session parameters. Ensure next-hop reachability between sites through the inter-site connectivity links.

For deployments with multiple BGP-speaking routers, implement route reflectors to reduce iBGP mesh complexity. Route reflectors simplify configuration and improve scalability in larger topologies.
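
The scaling benefit can be made concrete by counting sessions. A full iBGP mesh of n routers needs n(n-1)/2 sessions, while a route reflector design needs one session per client per reflector plus a mesh among the reflectors. The function names below are hypothetical, and the sketch assumes every client peers with every reflector.

```python
def full_mesh_sessions(n):
    """Number of iBGP sessions in a full mesh of n routers."""
    return n * (n - 1) // 2


def route_reflector_sessions(n, reflectors=1):
    """Sessions when `reflectors` of the n routers act as route reflectors.

    Assumes each client peers with every reflector and the reflectors
    are fully meshed with each other.
    """
    clients = n - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


for routers in (4, 10, 20):
    print(routers,
          full_mesh_sessions(routers),
          route_reflector_sessions(routers, reflectors=2))
```

With ten routers, for example, a full mesh requires 45 sessions, while two redundant route reflectors bring that down to 17.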

NSX Tier-0 BGP Integration:

Choose integration strategy based on infrastructure requirements:

Strategy A - BGP on NSX Tier-0 Gateway:

Configure BGP on NSX Tier-0 Gateway to peer with physical edge routers. NSX Edges participate directly in BGP and make routing decisions at the virtual infrastructure layer.

Enable BGP on Tier-0 Gateway through NSX Manager. Configure local AS number, BGP neighbors pointing to ToR router addresses, and route redistribution between BGP and connected routes. Set appropriate route filters and policies based on requirements.

For multi-site deployments, configure:

  • IP prefix lists defining which NSX segments or prefixes require specific BGP treatment
  • Route maps applying BGP attributes including AS path prepending values
  • Association of route maps to BGP neighbors for traffic engineering implementation
  • Appropriate prepending values to control traffic flow between sites
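
The items above could be expressed as NSX Policy API request bodies roughly as follows. This is a sketch, not a verified procedure: the object paths and field names (prefixes, prefix_list_matches, as_path_prepend, route_filtering, out_route_filters) are recalled from the Policy API and must be checked against the API reference for the deployed NSX version. Every identifier, address, and AS number is hypothetical, and the code only builds JSON.

```python
import json

# Hypothetical identifiers used only for illustration.
T0_PATH = "/infra/tier-0s/t0-site-b"
PREFIX_LIST_PATH = T0_PATH + "/prefix-lists/pl-engineered"
ROUTE_MAP_PATH = T0_PATH + "/route-maps/rm-prepend"


def prefix_list_body(networks):
    # Prefixes that should receive traffic engineering treatment.
    return {"prefixes": [{"network": n, "action": "PERMIT"} for n in networks]}


def route_map_body(prefix_list_path, local_as, prepend_count):
    # Prepend the local AS `prepend_count` times for matching prefixes.
    prepend = " ".join([str(local_as)] * prepend_count)
    return {"entries": [{
        "action": "PERMIT",
        "prefix_list_matches": [prefix_list_path],
        "set": {"as_path_prepend": prepend},
    }]}


def neighbor_body(address, remote_as, route_map_path):
    # Attach the route map to outbound advertisements toward one neighbor.
    return {
        "neighbor_address": address,
        "remote_as_num": str(remote_as),
        "route_filtering": [{
            "address_family": "IPV4",
            "out_route_filters": [route_map_path],
        }],
    }


bodies = {
    PREFIX_LIST_PATH: prefix_list_body(["172.16.10.0/24"]),
    ROUTE_MAP_PATH: route_map_body(PREFIX_LIST_PATH, 65001, 3),
    "neighbor": neighbor_body("192.0.2.1", 65100, ROUTE_MAP_PATH),
}
print(json.dumps(bodies, indent=2))
```

Keeping the prepend count as a parameter makes it straightforward to give the standby site a higher value than the preferred site for the same prefix list.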

Organizations must determine whether full BGP routing tables on NSX Edges are needed based on infrastructure requirements, control granularity needs, and available resources.

Considerations for full BGP tables on NSX Edges:

  • Provides maximum control over path selection at virtual infrastructure layer
  • Enables direct visibility into BGP path attributes
  • Allows granular traffic engineering per NSX segment
  • Requires appropriate Edge VM sizing based on routing table size
  • Bare Metal Edge deployment may be appropriate based on performance requirements

Strategy B - Default Routes to Physical Edge:

Configure NSX Tier-0 Gateway to receive default routes from physical edge routers. Physical infrastructure handles BGP intelligence and path selection while NSX Edges forward internet-bound traffic without maintaining full routing tables.

Configure route redistribution on physical routers to provide reachability information to NSX environment. NSX Edges use default routing to reach physical infrastructure.

Considerations for this approach:

  • Reduced resource requirements on NSX Edges
  • Centralized BGP management at physical layer
  • Simpler NSX configuration
  • Less granular control at virtual infrastructure layer
  • Traffic engineering handled entirely at physical infrastructure

Route Filtering and Security:

When running BGP with multiple upstream providers (multihoming), proper route filtering is critical to prevent the autonomous system from becoming a transit AS. Without appropriate filters, internet traffic could pass through the AS between different ISPs, consuming bandwidth and router resources. This transit risk is a fundamental concern in multi-ISP BGP deployments.

Implement strict prefix filtering on all eBGP sessions. Configure inbound filters to prevent acceptance of invalid or unauthorized prefixes. Configure outbound filters to advertise only authorized organizational prefixes.

Preventing Transit AS:

Configure AS-path access lists and route maps to advertise only locally originated routes to upstream providers. This prevents routes learned from one ISP from being advertised to another ISP, which would make the AS a transit path for internet traffic.

Use AS-path filtering that permits only routes with empty AS paths (locally originated routes). Apply route maps to all eBGP neighbors that filter outbound advertisements to include only routes originating within the local AS. This ensures that routes learned from one upstream provider are never advertised to another provider.
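
The empty AS path match is commonly written as the regular expression ^$. The sketch below models that outbound filter in Python rather than in any router's configuration syntax; the route table, prefixes, and AS numbers are hypothetical.

```python
import re

# An empty AS path means the route was originated inside the local AS.
LOCALLY_ORIGINATED = re.compile(r"^$")


def routes_to_advertise(routes):
    """Return only prefixes whose AS path is empty.

    routes: dict mapping prefix -> AS path string as held in the local
            BGP table (hypothetical data for illustration).
    """
    return [prefix for prefix, as_path in routes.items()
            if LOCALLY_ORIGINATED.match(as_path)]


bgp_table = {
    "198.51.100.0/24": "",             # locally originated
    "203.0.113.0/24": "64500 64510",   # learned from ISP A
    "192.0.2.0/24": "64600",           # learned from ISP B
}

print(routes_to_advertise(bgp_table))
```

Only the locally originated prefix passes the filter, so routes learned from one ISP are never re-advertised to the other.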

Route Acceptance Strategies:

Organizations can implement different strategies for accepting routes from upstream providers:

Full Internet Routing Table: Accept all routes from each ISP for maximum routing intelligence and path selection capabilities. This approach provides complete visibility but requires adequate router resources.

Directly-Connected Routes: Accept only routes for networks directly connected to each ISP, combined with default routes for general internet connectivity. This reduces routing table size while maintaining some visibility into ISP-specific paths.

Default Routes Only: Accept only default routes from ISPs and advertise organizational prefixes. This minimizes routing table requirements but provides limited path selection intelligence. This strategy is particularly useful when routers risk being overwhelmed by large amounts of routing information from BGP peers. Filtering to accept only default routes controls the size of the local routing table without losing IP connectivity to remote networks. The default route learned from the BGP neighbor can be conditionally advertised based on the existence of other routes in the routing table.

Prefix lists configured to permit only the default route (0.0.0.0/0) enable this filtering. Filtering can be applied to both incoming advertisements (routes learned from neighbors) and outgoing advertisements (routes sent to neighbors), providing bidirectional control over routing information exchange.
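
A default-only prefix list is an exact match on 0.0.0.0/0, not a match on everything contained in it. The sketch below models that distinction with Python's standard ipaddress module; the function name and sample prefixes are hypothetical.

```python
import ipaddress

DEFAULT_ROUTE = ipaddress.ip_network("0.0.0.0/0")


def accept_default_only(received_prefixes):
    """Keep only the IPv4 default route from a list of received prefixes.

    The comparison is an exact match: 10.0.0.0/8 lies inside 0.0.0.0/0
    but is still rejected.
    """
    return [p for p in received_prefixes
            if ipaddress.ip_network(p) == DEFAULT_ROUTE]


received = ["0.0.0.0/0", "10.0.0.0/8", "203.0.113.0/24"]
print(accept_default_only(received))
```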

The choice depends on organizational requirements for traffic engineering, available resources, and operational complexity tolerance.

Prevent route leaks between upstream providers through appropriate filtering. Configure maximum prefix limits to prevent routing table overflow from misconfigured peers. Implement prefix validation using available mechanisms, such as RPKI route origin validation where the routing platform supports it.

Verification Steps

Verify BGP session status and confirm all peering sessions establish successfully. Check that prefix counts align with expectations based on configured acceptance policies. Monitor for route flapping or stability issues.

Validate inter-site iBGP connectivity and confirm routing information is shared properly between all sites. Verify that all sites have reachability information for critical destinations.

Examine path selection for important destinations including content delivery networks and cloud services. Verify that best path selection follows configured policies and business requirements. Confirm AS path prepending is working as intended by examining BGP attributes for announced routes at Border Leaf switches.

If BGP is configured on NSX Tier-0 Gateway, verify BGP neighbor adjacencies through NSX Manager, review learned and advertised routes, and confirm route redistribution from connected segments functions correctly. Validate that route maps are applied correctly to BGP neighbors and AS path prepending values appear in route advertisements as expected.

For Active-Standby deployments with Failover Domain, verify that Edges in the primary datacenter are active and Edges in secondary datacenter are standby. Test failover scenarios by simulating Edge failures in the primary site and confirming that another Edge in the same site takes over before failing to the remote site.

Verify AS path prepending configuration by checking advertised routes at Border Leaf switches. Confirm that routes from the non-preferred datacenter have longer AS paths due to prepending, making them less desirable for inbound traffic.

Test connectivity from workloads at all sites to diverse internet destinations. Verify that critical services remain accessible and perform as expected.

Operational Considerations

Team Skills: BGP requires specialized knowledge. Network operations teams need expertise in BGP fundamentals, path selection algorithms, attribute manipulation for traffic engineering including AS path prepending, Failover Domain configuration via API, troubleshooting, and security best practices.

Documentation: Maintain comprehensive documentation including:

  • AS number assignments for each Tier-0 Gateway and site
  • BGP peering relationships between Tier-0 Gateways and Border Leaf switches
  • Route filtering policies and prefix lists
  • Traffic engineering policies including AS path prepending configurations for each site
  • Failover Domain assignments for each Edge node
  • Topology diagrams showing BGP session relationships across sites

Change Management: Establish change control procedures for BGP modifications. Test routing policy changes including AS path prepending adjustments before implementation. Schedule changes during maintenance windows when appropriate. Document rollback procedures for BGP policy changes.

Monitoring: Implement comprehensive monitoring for:

  • BGP session state changes at both physical and NSX layers
  • Prefix count deviations from baseline
  • Route flapping patterns
  • AS path length changes indicating prepending issues or path changes
  • Failover events between sites or within sites
  • Traffic distribution across datacenters to verify load distribution
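
As one example from the list above, a prefix count check against a baseline could look like the following. This is a minimal sketch: the 20 percent tolerance, the peer names, and the counts are hypothetical, and collecting the counts from routers or NSX Manager is outside its scope.

```python
def prefix_count_alerts(baseline, current, tolerance=0.2):
    """Return peers whose prefix count deviates from baseline by more
    than `tolerance` (expressed as a fraction of the baseline)."""
    alerts = []
    for peer, expected in baseline.items():
        observed = current.get(peer, 0)  # a missing peer counts as zero
        if expected == 0:
            deviates = observed != 0
        else:
            deviates = abs(observed - expected) / expected > tolerance
        if deviates:
            alerts.append(peer)
    return sorted(alerts)


baseline = {"isp-a": 1000, "isp-b": 1000, "t0-site-a": 40}
current = {"isp-a": 1010, "isp-b": 400, "t0-site-a": 40}

print(prefix_count_alerts(baseline, current))
```

A sudden drop, as with the second provider in this sample, often points to a filtering change or a partial session problem and is worth alerting on before the session itself goes down.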

Upstream Provider Relationships: Maintain active relationships with upstream providers. Establish escalation contacts for routing issues. Request notification of maintenance affecting BGP peering. Understand provider routing policies and how they interact with AS path prepending strategies.


Option 2: OSPF with Static Routing (Small Minority of Deployments)

Architecture Overview

Use OSPF for dynamic routing within and between NSX sites. Configure static routes for external connectivity through a single upstream provider.

This architecture is appropriate only for the small minority of NSX deployments that operate with a single upstream provider providing consistent routing across all locations.

When This Architecture May Be Appropriate

  • All sites definitively connect to the same upstream provider with long-term commitment
  • The provider offers guaranteed consistent peering and routing at all locations
  • The deployment is in early phases with planned evolution to BGP
  • The organization operates in a region with limited upstream options

Organizations should evaluate whether implementing BGP from the start provides better long-term value even in single upstream provider scenarios. Migration from OSPF/static to BGP later involves significant rearchitecture.

Implementation Guidance

OSPF Configuration:

Configure OSPF for inter-site routing between NSX sites. Establish OSPF adjacencies between NSX Edges and physical infrastructure. Configure appropriate interface costs to influence internal path selection. Implement OSPF authentication for security.

Static Route Configuration:

Configure static default routes pointing to upstream provider gateways. For redundant connections to the same provider, configure multiple static routes with different administrative distances for automatic failover. Consider implementing active path monitoring if dynamic failover is required.
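
The primary/backup behavior of static routes with different administrative distances can be modeled as follows. This is an illustrative sketch only: the next-hop addresses and distance values are hypothetical, and the reachable set stands in for whatever path monitoring the platform provides.

```python
def active_default_route(routes, reachable_next_hops):
    """Return the next hop of the usable route with the lowest
    administrative distance, or None if no next hop is reachable."""
    usable = [r for r in routes if r["next_hop"] in reachable_next_hops]
    if not usable:
        return None
    return min(usable, key=lambda r: r["distance"])["next_hop"]


default_routes = [
    {"next_hop": "192.0.2.1", "distance": 1},   # primary uplink
    {"next_hop": "192.0.2.5", "distance": 10},  # backup uplink
]

# Both uplinks healthy: the lower distance wins.
normal = active_default_route(default_routes, {"192.0.2.1", "192.0.2.5"})
# Primary next hop lost: the backup route becomes active.
failover = active_default_route(default_routes, {"192.0.2.5"})

print(normal, failover)
```

Without some form of reachability detection the primary route stays installed even when its next hop is no longer forwarding, which is why the paragraph above suggests active path monitoring.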

NSX Tier-0 Configuration:

Configure route redistribution on Tier-0 Gateway to advertise NSX segments into OSPF for internal reachability. Set filters to control advertisement scope. Configure default route pointing to physical infrastructure for external connectivity.

Limitations of This Architecture

No Inter-Domain Intelligence: Static routes provide only next-hop information without destination-specific awareness. They cannot adapt to upstream provider routing changes or to varying reachability characteristics for different destinations.

No Traffic Engineering Capabilities: This architecture cannot implement AS path prepending or other BGP-based traffic engineering features for multi-site deployments. There is no way to control which datacenter is preferred for specific traffic flows.

No Failover Domain Support: Advanced features like Failover Domain for Edge placement control are typically used in conjunction with BGP architectures for proper traffic engineering across sites.

Future Migration Complexity: When additional upstream providers become necessary or multi-site traffic engineering is required, the routing architecture must be completely redesigned. Introducing BGP at that stage requires a planned, phased migration and carries a risk of service disruption.

Limited Path Control: This architecture cannot influence path selection for external traffic beyond basic primary/backup static route configuration. It offers no granular control over paths to specific destinations and no way to implement active-standby behavior across sites with routing intelligence.

Upstream Dependency: The organization relies completely on a single provider's routing decisions and has no autonomy over external path selection. It cannot optimize paths based on performance or cost requirements.

Growth Constraints: As the deployment grows and sites are added in diverse regions, the probability of requiring different upstream providers increases significantly. The architecture cannot accommodate this growth without redesign.


Migration Path: OSPF/Static to BGP

For the small minority of deployments that initially implement OSPF with static routing, migration to BGP requires planned rearchitecture:

Phase 1: BGP Planning and Preparation

Obtain an AS number from a regional internet registry or use a private AS number. Plan the AS number strategy for multiple Tier-0 Gateways across sites. Establish BGP peering agreements with upstream providers. Size and deploy physical edge routers with adequate resources. Develop BGP policies for path selection, filtering, and traffic engineering, including AS path prepending strategies. Ensure the operations team has BGP expertise, including API-based configuration for features like Failover Domain.

Phase 2: Physical Edge BGP Implementation

Configure eBGP sessions with upstream providers on physical routers. Receive and validate routing information. Test path selection without putting BGP into active forwarding path. Maintain existing OSPF/static routing during testing phase. Configure Border Leaf switches to serve as central BGP routing points for each datacenter.

Phase 3: Inter-Site iBGP Deployment

Establish iBGP sessions between sites. Verify routing table synchronization across all locations. Test failover scenarios in controlled environments. Ensure next-hop reachability across sites.

Phase 4: NSX Integration

Deploy BGP on NSX Tier-0 Gateways or configure default routes to intelligent physical edge based on chosen strategy. For multi-site deployments, configure IP prefix lists and route maps for AS path prepending. Configure Failover Domain using API if implementing active-standby mode across sites. Migrate traffic incrementally from static routes to BGP paths. Validate connectivity to all critical destinations throughout migration.

Phase 5: Cutover and Optimization

Complete transition to BGP for external routing. Remove static default routes once BGP is fully operational. Verify AS path prepending is controlling traffic flow as intended across sites. Test Edge failover within sites and across sites to validate Failover Domain behavior. Optimize BGP policies based on actual traffic patterns and requirements. Maintain OSPF for internal routing if desired or transition inter-site routing to BGP.


Conclusion

For NSX multi-site deployments, BGP represents the standard architecture. BGP provides the inter-domain routing intelligence required for enterprise networks and prepares infrastructure for future growth. Even when multiple upstream providers are not immediately needed, implementing both eBGP and iBGP establishes the foundation for scaling without rearchitecture.

In multi-site deployments with separated external routers per datacenter, BGP provides essential traffic engineering capabilities through features like AS path prepending and supports advanced deployment patterns using Failover Domain for optimal Edge placement control. These capabilities enable active-standby configurations across sites while distributing workload and maintaining routing intelligence.

OSPF with static routing serves only a small minority of deployments where all sites connect through a single upstream provider with consistent routing. Even in these scenarios, implementing BGP from initial deployment often provides better long-term value by avoiding costly rearchitecture as requirements evolve.

When designing new NSX deployments, BGP architecture should be the default choice. The investment in BGP expertise and infrastructure provides operational flexibility, reliability, traffic engineering capabilities, and performance optimization that align with the sophisticated networking requirements typical of NSX implementations.

If questions remain about routing architecture selection for specific deployment scenarios, please reach out to Enterprise Software Professional Services for architectural guidance.