Understanding CLOS Networks in Data Centers

Article ID: 378928

Products

VMware NSX, VMware NSX-T Data Center, VMware vSphere ESXi

Issue/Introduction

In today's rapidly evolving digital landscape, data centers face unprecedented demands on their network infrastructure. The explosion of cloud computing, big data analytics, and high-performance applications has created a need for network architectures that can handle massive east-west traffic flows while maintaining low latency and high availability. Traditional three-tier network designs, with their inherent scalability limitations and potential for bottlenecks, struggle to meet these demands, which calls for a new approach to data center networking. Here is a diagram of a typical CLOS network:

Environment

Modern data center networks are built using high-performance switches. These switches offer high port densities, often 48 or more ports of 10/25/100 Gigabit Ethernet, with some offering 400 Gigabit Ethernet capabilities. In a CLOS network implementation, these switches are deployed in two primary roles:

  1. Leaf Switches: Also known as Top of Rack (ToR) switches, these are typically 1U or 2U switches with a high number of server-facing ports (e.g., 48 x 10/25G) and several uplink ports (e.g., 6 x 100G). They directly connect to servers, storage devices, or other endpoints.
  2. Spine Switches: These are often modular switches with a large number of high-speed ports (e.g., 32 x 100G). They form the core of the network and only connect to leaf switches.
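
The example port counts above determine a leaf switch's oversubscription ratio: the server-facing bandwidth divided by the uplink bandwidth toward the spines. A minimal sketch, assuming the figures given above (48 x 25G server ports, 6 x 100G uplinks; the function name is illustrative):

```python
def oversubscription_ratio(server_ports, server_speed_gbps, uplink_ports, uplink_speed_gbps):
    """Ratio of server-facing bandwidth to spine-facing bandwidth on a leaf switch."""
    downlink_gbps = server_ports * server_speed_gbps    # toward the servers
    uplink_gbps = uplink_ports * uplink_speed_gbps      # toward the spines
    return downlink_gbps / uplink_gbps

# 48 x 25G = 1200G down, 6 x 100G = 600G up -> 2:1 oversubscription
ratio = oversubscription_ratio(48, 25, 6, 100)
print(f"Leaf oversubscription: {ratio:.1f}:1")  # -> Leaf oversubscription: 2.0:1
```

A 1:1 (non-blocking) fabric would need uplink bandwidth equal to server-facing bandwidth; many designs accept a modest oversubscription ratio such as 2:1 or 3:1 as a cost trade-off.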

These switches run network operating systems that support advanced routing protocols like BGP (Border Gateway Protocol) and OSPF (Open Shortest Path First), as well as Equal-Cost Multi-Path (ECMP) routing. Many also support Software-Defined Networking (SDN) protocols like OpenFlow, allowing for centralized control and programmability of the network.
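
The per-flow behavior of ECMP can be sketched as follows. A real switch ASIC computes a hardware hash over packet header fields (the 5-tuple) to pick one of the equal-cost uplinks; the MD5-based hash below is purely illustrative, and the function and path names are assumptions. The key property is that every packet of a given flow maps to the same path, preserving packet order while spreading distinct flows across the fabric:

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, paths):
    """Pick an uplink by hashing the flow's 5-tuple, as ECMP-capable switches do.
    Hardware uses proprietary hash functions; MD5 here is only for illustration."""
    key = f"{src_ip},{dst_ip},{src_port},{dst_port},{proto}".encode()
    index = int.from_bytes(hashlib.md5(key).digest()[:4], "big") % len(paths)
    return paths[index]

spines = ["spine-1", "spine-2", "spine-3", "spine-4"]
# The same flow always hashes to the same spine, so its packets stay in order.
first = ecmp_next_hop("10.0.1.5", "10.0.2.9", 49152, 443, 6, spines)
again = ecmp_next_hop("10.0.1.5", "10.0.2.9", 49152, 443, 6, spines)
assert first == again
```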

In virtualized environments, the ESXi hypervisor plays a crucial role. ESXi hosts connect to the leaf switches, typically with multiple physical NICs for redundancy and increased bandwidth. These connections can be individual links or bundled into link aggregation groups (LAGs).

NSX-T, a network virtualization and security platform, can be deployed on top of the physical CLOS network. NSX-T creates a virtual networking layer that spans the entire data center, allowing for the creation of logical switches, routers, firewalls, and load balancers that operate independently of the underlying physical network.
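
NSX-T carries this overlay traffic in Geneve encapsulation (RFC 8926). The sketch below packs just the 8-byte Geneve base header to show where the 24-bit Virtual Network Identifier (VNI), which identifies a logical segment, lives; the field layout follows the RFC, while the function name and example VNI are illustrative:

```python
import struct

GENEVE_PROTO_ETHERNET = 0x6558  # Transparent Ethernet Bridging (inner payload type)

def geneve_base_header(vni, opt_len_words=0, oam=False, critical=False):
    """Pack the 8-byte Geneve base header (RFC 8926); variable options omitted."""
    byte0 = (0 << 6) | (opt_len_words & 0x3F)       # version 0, option length
    byte1 = (int(oam) << 7) | (int(critical) << 6)  # O and C flags
    vni_and_rsvd = (vni & 0xFFFFFF) << 8            # 24-bit VNI + reserved byte
    return struct.pack(">BBHI", byte0, byte1, GENEVE_PROTO_ETHERNET, vni_and_rsvd)

hdr = geneve_base_header(5001)
print(hdr.hex())  # -> 0000655800138900
```

Because the VNI is 24 bits, the overlay can address around 16 million logical segments, which is what frees NSX-T from the 4096-VLAN limit of the physical network.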

Cause

Several factors in modern data centers have driven the need for a new network architecture:

  1. Exponential Growth in East-West Traffic: With the rise of distributed applications, microservices, and scale-out architectures, the majority of data center traffic now flows between servers (east-west) rather than in and out of the data center (north-south).
  2. Need for Low and Predictable Latency: Many modern applications, from high-frequency trading to real-time analytics, require consistently low latency.
  3. Scalability Limitations of Traditional Designs: Three-tier architectures with large, chassis-based core switches face scalability challenges and can introduce single points of failure.
  4. Bandwidth Bottlenecks: Traditional designs often oversubscribe higher layers of the network, leading to potential congestion.
  5. Complexity of Network Management: As networks grow, the complexity of managing them increases exponentially in traditional architectures.
  6. Inefficient Resource Utilization: Static configurations in traditional networks often lead to underutilized links and stranded capacity.

These challenges call for a network architecture that is inherently scalable, provides predictable performance, offers multiple paths for resilience and load balancing, and can be managed efficiently at scale.

Resolution

CLOS networks, particularly in their leaf-spine implementation, address these challenges and provide a scalable, resilient, redundant, and high-performance design. Here is how CLOS networks resolve these concerns, and how Broadcom products integrate with this architecture:

  1. Scalability:
    • The leaf-spine architecture allows for horizontal scaling. Need more server ports? Add more leaf switches. Need more east-west bandwidth? Add more spine switches.
    • Each new leaf switch connects to all spine switches, maintaining the network's full mesh topology.
    • This modular growth model allows data centers to scale from dozens to thousands of servers without fundamental redesigns.
    • NSX-T by Broadcom enhances the scalability of CLOS networks by allowing the creation of thousands of logical networks that can span the entire data center, without being constrained by physical VLAN limitations.
  2. Performance and Low Latency:
    • In a CLOS network, any server is at most three hops away from any other server (leaf-spine-leaf), regardless of network size.
    • This consistent hop count ensures predictable, low-latency performance across the entire data center.
    • The abundant east-west bandwidth supports the heavy server-to-server traffic patterns of modern applications.
    • ESXi's support for RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE) allows for ultra-low latency communication between servers, leveraging the consistent latency provided by the CLOS architecture.
    • NSX-T can optimize east-west traffic by allowing VMs on the same ESXi host to communicate directly, without the traffic having to leave the host and traverse the physical network.
  3. Resilience and Redundancy:
    • Every leaf switch connects to every spine switch, creating multiple paths between any two endpoints.
    • If a spine switch or a link fails, traffic can be instantly rerouted through other paths.
    • This multi-path design eliminates single points of failure and allows for hitless maintenance and upgrades.
    • ESXi supports NIC teaming, allowing multiple physical NICs to be grouped for increased bandwidth and redundancy, complementing the multi-path nature of CLOS networks.
    • NSX-T adds another layer of resilience by allowing for the creation of distributed logical routers that can continue to function even if physical routers fail.
  4. Load Balancing and Efficient Resource Utilization:
    • ECMP routing allows traffic to be distributed across all available paths.
    • This ensures efficient utilization of all network links, preventing bottlenecks and stranded capacity.
    • The even distribution of traffic also helps in absorbing traffic spikes and providing consistent performance.
    • NSX-T's distributed logical router can perform ECMP routing at the virtual layer, complementing the ECMP capabilities of the physical CLOS network.
    • Broadcom's StrataXGS switches, which are commonly used in CLOS networks, support advanced load balancing algorithms that can intelligently distribute traffic across multiple paths.
  5. Simplified Management:
    • The regularity of the CLOS topology simplifies network design and configuration.
    • Spine switches all perform the same role, as do leaf switches, allowing for standardized configurations.
    • This uniformity lends itself well to automation, reducing operational complexity as the network scales.
    • Broadcom's vCenter provides centralized management for ESXi hosts and virtual machines, while NSX-T Manager offers a single point of control for all virtual networking components.
    • Broadcom's software-defined networking (SDN) solutions, such as the OpenFlow Data Plane Abstraction (OF-DPA), can be used to centrally manage and program the physical switches in the CLOS network.
  6. Support for Network Virtualization and SDN:
    • The leaf-spine architecture provides an ideal underlay for network virtualization technologies.
    • It's well-suited for SDN implementations, allowing for centralized control and programmability of the entire fabric.
    • NSX-T by Broadcom is a prime example of how network virtualization can be implemented on top of a CLOS network. It creates an overlay network that can span multiple data centers and even extend into public clouds.
    • Broadcom's switches support various SDN protocols, allowing them to be integrated into Broadcom's SDN solutions or other third-party SDN controllers.
  7. Cost-Effective Scaling:
    • Instead of scaling up with expensive, chassis-based switches, CLOS networks scale out using smaller, standardized switch units.
    • This often results in better economics, especially when accounting for the ability to scale incrementally.
    • The combination of CLOS networks with Broadcom's virtualization stack allows for highly efficient use of resources. ESXi allows multiple VMs to share physical server resources, while NSX-T enables efficient use of network resources through network virtualization.
  8. Future-Proofing:
    • The modular nature of CLOS networks allows for easier adoption of new technologies.
    • Leaf or spine switches can be upgraded independently, allowing for phased introductions of higher-speed interfaces or new capabilities.
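
The scaling and resilience arithmetic behind points 1, 3, and 4 above can be sketched as follows, using the example port counts from the Environment section (32-port spines, leaves with 48 server ports and 6 uplinks). The function and its default values are illustrative assumptions, not a sizing tool:

```python
def fabric_capacity(spine_ports=32, leaf_server_ports=48, num_spines=6):
    """Capacity math for a two-tier leaf-spine fabric.
    Each leaf uses one uplink per spine, and each spine port serves one leaf,
    so the spine port count caps the number of leaves in the fabric."""
    max_leaves = spine_ports
    max_servers = max_leaves * leaf_server_ports
    paths_between_leaves = num_spines                      # one ECMP path per spine
    surviving_after_one_spine = (num_spines - 1) / num_spines
    return max_servers, paths_between_leaves, surviving_after_one_spine

servers, paths, surviving = fabric_capacity()
print(servers, paths, f"{surviving:.0%}")  # -> 1536 6 83%
```

The last figure illustrates the resilience point: losing one of six spines removes one of six equal-cost paths, so the fabric keeps roughly 83% of its east-west capacity rather than failing outright, and adding spines both raises bandwidth and shrinks the impact of any single failure.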

In conclusion, CLOS networks provide a comprehensive solution to the networking challenges faced by modern data centers. By offering a scalable, high-performance, resilient, and manageable architecture, they enable data centers to meet current demands while being well-positioned for future growth and technological advancements. The leaf-spine implementation of CLOS networks has become a de facto standard in many large-scale data centers, a testament to its effectiveness in addressing the complex networking requirements of today's digital infrastructure. When combined with Broadcom's virtualization technologies like ESXi and NSX-T, and leveraging advanced switching capabilities, they create a powerful, flexible, and efficient infrastructure capable of meeting the most demanding computational and networking needs.

Additional Information