
Leaf-Spine (Clos) Fabric Lab: Build a Data-Center Underlay in Minutes

What spine-leaf (Clos) architecture is, why it beats three-tier, and how to build a real eBGP / BGP-unnumbered underlay in a cloud lab.

David Kim
DevOps Engineer

If you have read three "What is spine-leaf architecture?" glossary pages this week, you already know the diagram: a row of leaf switches, a row of spine switches, every leaf wired to every spine. What those pages never let you do is build one. This guide does both — it nails down the canonical definition that AI answers and SERPs quote, then walks you through deploying a working Clos fabric underlay in a cloud lab from a plain-English prompt, with real per-vendor configs you can copy-paste.

The overlay (EVPN/VXLAN) is deliberately out of scope here — it deserves its own guide, and we link it below. This post owns the layer underneath it: the BGP underlay that every modern data center fabric rides on.

What is spine and leaf architecture?

Spine-leaf architecture (also called leaf-spine architecture, a Clos network, or simply a data center fabric) is a two-tier network topology where:

  • Leaf switches sit at the access tier. Servers, storage, and other endpoints connect here.
  • Spine switches form the backbone. Spines connect to nothing but leaves.
  • Every leaf connects to every spine. No leaf-to-leaf links, no spine-to-spine links.

That "every leaf to every spine" wiring (a full mesh between the two tiers, not within them) is the defining characteristic of a Clos network, named after Charles Clos, who formalized the design for telephone switching in 1953. The payoff: any two servers on different leaves are exactly two switch-to-switch hops apart (leaf → spine → leaf), so latency is uniform and predictable no matter which racks the endpoints live in. Servers on the same leaf never touch a spine at all.
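The wiring rule is easy to check mechanically. The following is a minimal sketch, not part of any vendor tooling: the function names `build_clos` and `paths_between` and the `spineN` / `leafN` naming are illustrative assumptions. It enumerates the links of a two-tier fabric and lists the leaf → spine → leaf paths between two leaves.

```python
from itertools import product

def build_clos(num_spines, num_leaves):
    """Return (spines, leaves, links) for a two-tier Clos fabric."""
    spines = [f"spine{i}" for i in range(1, num_spines + 1)]
    leaves = [f"leaf{i}" for i in range(1, num_leaves + 1)]
    # Every leaf connects to every spine; no leaf-leaf or spine-spine links.
    links = list(product(leaves, spines))
    return spines, leaves, links

def paths_between(src_leaf, dst_leaf, links):
    """All leaf -> spine -> leaf paths between two different leaves."""
    up = {spine for leaf, spine in links if leaf == src_leaf}
    down = {spine for leaf, spine in links if leaf == dst_leaf}
    return [(src_leaf, spine, dst_leaf) for spine in sorted(up & down)]

spines, leaves, links = build_clos(num_spines=2, num_leaves=4)
print(len(links))                               # number of leaf-spine links
print(paths_between("leaf1", "leaf4", links))   # one path per spine
```

With 2 spines and 4 leaves this yields eight links and two candidate paths between any pair of leaves, one through each spine.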

Spine-leaf vs traditional three-tier

The reason spine-leaf displaced the classic three-tier (access / aggregation / core) design is east-west traffic. Three-tier was built for north-south flows — clients pulling data out of the data center. Modern workloads (distributed databases, microservices, storage replication, AI training) are dominated by server-to-server (east-west) traffic, and three-tier handles that badly.

Aspect | Traditional three-tier | Spine-leaf (Clos) fabric
Tiers | Access, aggregation, core | Leaf, spine (two tiers)
Optimized for | North-south (client ↔ server) | East-west (server ↔ server)
Loop prevention | Spanning Tree (STP) blocks links | None needed (routed L3 fabric)
Link utilization | STP blocks redundant links | All links active via ECMP
Hop count (worst case) | Variable, asymmetric | 2 hops, uniform
Scaling | Scale up (bigger core boxes) | Scale out (add spines/leaves)
Bandwidth growth | Re-architect the core | Add a spine; every leaf gains uplink capacity

Two ideas in that table do the heavy lifting:

  • ECMP, all links active. Because the fabric is routed at Layer 3 (not bridged), there is no spanning tree blocking redundant paths. Equal-Cost Multi-Path load-balances flows across every spine simultaneously. Add a second spine and you have just doubled your bisection bandwidth with zero redesign.
  • Scale-out, not scale-up. Need more capacity? Add a spine. Need more ports? Add a leaf. The fabric grows horizontally instead of forcing you to forklift a bigger core.

This is why every hyperscaler, and effectively every greenfield enterprise data center as of 2026, builds on a Clos fabric.
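The per-flow behavior of ECMP described above can be sketched in a few lines. This is an illustration only: real switches use vendor-specific hardware hash functions, and SHA-256 here is a stand-in chosen for determinism. The helper name `pick_next_hop` and the sample 5-tuples are assumptions.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Pick one next-hop for a flow by hashing its 5-tuple.

    Every packet of the same flow hashes to the same spine, which keeps
    packets in order while different flows spread across all spines.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

spines = ["spine1", "spine2"]
# (src IP, dst IP, protocol, src port, dst port); values are made up.
flows = [("10.0.1.1", "10.0.1.4", 6, 40000 + i, 443) for i in range(1000)]

counts = {spine: 0 for spine in spines}
for flow in flows:
    counts[pick_next_hop(flow, spines)] += 1
print(counts)  # roughly even split across the two spines
```

Because the choice is per flow rather than per packet, a single large flow still rides one link; the even spread only appears across many flows.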

Why the underlay is BGP (and why it's the winnable part)

A Clos fabric needs a routing protocol to make all those leaf-spine links usable and to provide ECMP. You could run OSPF or IS-IS, but the de facto industry choice, documented in the Informational RFC 7938, "Use of BGP for Routing in Large-Scale Data Centers", is eBGP. BGP scales to thousands of leaves, gives you per-link policy, supports ECMP natively, and avoids the flooding domains that link-state protocols create at fabric scale.

There are two common underlay flavors, and we'll build both:

  1. Numbered eBGP: each leaf-spine link gets a /31 (or /30), each leaf gets its own ASN while the spines share one, and loopbacks are advertised so the overlay has stable endpoints to ride on.
  2. BGP unnumbered: no per-link IP addressing at all. BGP forms sessions over IPv6 link-local addresses and resolves IPv4 next-hops automatically (RFC 5549, since superseded by RFC 8950). This is the modern default in SONiC, FRR, and Cumulus because it removes the most tedious, error-prone part of fabric provisioning: handing out hundreds of point-to-point subnets.

Here's the strategic note: the theory of spine-leaf is covered to death by vendor glossaries. The underlay config — eBGP peering, ASN allocation, ECMP verification, BGP unnumbered — is sparse, uncompetitive, and exactly the part an AI agent can build for you. That's what the rest of this guide does.

Build a Clos fabric in a cloud lab

The fastest way to go from "I read about Clos fabrics" to "I have a running one" is to describe it and let the agent build it. NetPilot is an AI-native network emulator: you describe the topology in plain English, and the agent translates it into per-vendor configuration, deploys real network OS containers to an isolated cloud lab, and pushes config across every device in parallel — no local Docker, no image sourcing, no licensing dance.

We'll use a 2-spine / 4-leaf fabric. Built-in node types Nokia SR Linux, FRR, and Linux need no image at all; commercial NOSes like Arista cEOS (BYOI), Cisco NX-OS alternatives (BYOI), and SONiC (BYOI) run from your own images.

Step 1 — Deploy the topology

"Build a leaf-spine Clos fabric with 2 spines and 4 leaves using Nokia SR Linux. Wire every leaf to both spines (full mesh between tiers, no leaf-to-leaf links). Don't configure routing yet — just bring up the topology and the fabric links."

The agent generates the ContainerLab topology, allocates the eight leaf-spine links, deploys the six SR Linux nodes to your cloud lab, and reports interface status across all of them. You get a running fabric in about two minutes.

You can also verify the wiring by hand. The fabric should show eight point-to-point links (4 leaves × 2 spines):

# On a spine — each spine should have 4 fabric-facing interfaces, one per leaf
sr_cli "show interface ethernet-1/{1..4} brief"

Step 2 — Configure the numbered eBGP underlay

Now the routing. With numbered eBGP, every leaf gets a unique ASN, the spines share one, and every fabric link gets a /31. A clean allocation for our fabric:

Device | Loopback | ASN
spine1 | 10.0.0.1/32 | 65100
spine2 | 10.0.0.2/32 | 65100
leaf1 | 10.0.1.1/32 | 65001
leaf2 | 10.0.1.2/32 | 65002
leaf3 | 10.0.1.3/32 | 65003
leaf4 | 10.0.1.4/32 | 65004

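Spines share one ASN (65100); each leaf gets its own (65001–65004). This follows the RFC 7938 allocation pattern. Leaves never transit each other because of eBGP loop prevention: a route that leaf1 learned from spine1 already carries 65100 in its AS-path, so if leaf1 re-advertises it to spine2, spine2 sees its own ASN and drops it. The same check stops a leaf from accepting its own prefixes back from a spine.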
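The AS-path check behind this ASN plan can be traced by hand. The sketch below is a simplified model, not a BGP implementation: `advertise` and `accepts` are hypothetical helpers that only model AS-path prepending and the receive-side loop check. It follows leaf4's loopback as it travels through the fabric.

```python
SPINE_AS = 65100

def advertise(sender_as, as_path):
    """An eBGP speaker prepends its own ASN when advertising to a neighbor."""
    return [sender_as] + as_path

def accepts(receiver_as, as_path):
    """eBGP loop prevention: reject a route whose AS-path already has our ASN."""
    return receiver_as not in as_path

# leaf4 (AS 65004) originates its loopback and advertises it to spine1.
path_at_spine1 = advertise(65004, [])
# spine1 passes it on to leaf1, which accepts it.
path_at_leaf1 = advertise(SPINE_AS, path_at_spine1)
# If leaf1 re-advertised the route to spine2, the path would already hold 65100.
path_offered_to_spine2 = advertise(65001, path_at_leaf1)
# If spine1 sent the route back toward leaf4, the path would hold 65004.
path_offered_to_leaf4 = advertise(SPINE_AS, path_at_spine1)

print(path_at_leaf1, accepts(65001, path_at_leaf1))
print(path_offered_to_spine2, accepts(SPINE_AS, path_offered_to_spine2))
print(path_offered_to_leaf4, accepts(65004, path_offered_to_leaf4))
```

The first route is accepted; the second is rejected by spine2 because the shared spine ASN is already in the path, which is what keeps a leaf from becoming a transit hop between spines; the third is rejected by leaf4 because it would be its own route coming back.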

"Configure a numbered eBGP underlay on this fabric per RFC 7938. Spines are AS 65100, leaves are AS 65001 through 65004. Use /31s on the leaf-spine links, give every device a /32 loopback (spines 10.0.0.1-2, leaves 10.0.1.1-4), and advertise every loopback into BGP so all loopbacks are reachable fabric-wide with ECMP. Then verify full reachability."

The agent assigns the /31s, configures the loopbacks, builds the eBGP sessions on each link, advertises the loopbacks, and confirms every leaf can reach every other leaf's loopback over both spines. Because both spines share AS 65100, the two paths to any remote leaf carry identical AS-paths (65100, then the remote leaf's ASN), so they qualify as equal-cost and ECMP needs nothing beyond maximum-paths.

Here is the kind of config the agent produces. Cisco NX-OS-style syntax for one leaf and one spine, showing both ends of the leaf1↔spine1 link (classic IOS differs in places, for example bgp router-id and mask-based network statements):

! ===== leaf1 (AS 65001) =====
interface loopback0
 ip address 10.0.1.1 255.255.255.255
!
interface Ethernet1/1
 description to-spine1
 no switchport
 ip address 10.1.1.1 255.255.255.254   ! /31 toward spine1
!
interface Ethernet1/2
 description to-spine2
 no switchport
 ip address 10.1.2.1 255.255.255.254   ! /31 toward spine2
!
router bgp 65001
 router-id 10.0.1.1
 address-family ipv4 unicast
  network 10.0.1.1/32                   ! loopback exists in RIB, so it originates
  maximum-paths 2                       ! ECMP across both spines
 neighbor 10.1.1.0 remote-as 65100      ! spine1 across the /31
  address-family ipv4 unicast
 neighbor 10.1.2.0 remote-as 65100      ! spine2 across the /31
  address-family ipv4 unicast

! ===== spine1 (AS 65100) =====
interface loopback0
 ip address 10.0.0.1 255.255.255.255
!
interface Ethernet1/1
 description to-leaf1
 no switchport
 ip address 10.1.1.0 255.255.255.254   ! other end of leaf1's /31
!
interface Ethernet1/2
 description to-leaf2
 no switchport
 ip address 10.1.3.0 255.255.255.254
! ... Ethernet1/3 to leaf3, Ethernet1/4 to leaf4 ...
!
router bgp 65100
 router-id 10.0.0.1
 address-family ipv4 unicast
  network 10.0.0.1/32
  maximum-paths 4                       ! ECMP toward all leaves
 neighbor 10.1.1.1 remote-as 65001      ! leaf1
  address-family ipv4 unicast
 neighbor 10.1.3.1 remote-as 65002      ! leaf2
  address-family ipv4 unicast
! ... neighbors for leaf3 and leaf4 ...

Note the correctness details the agent gets right automatically: the loopback /32 is actually configured on an interface, so network 10.0.1.1/32 has something in the RIB to originate; maximum-paths is set so ECMP is real, not theoretical; and the /31 addresses on each end of a link are the two consecutive hosts of the same prefix.
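The /31 pairing described above can be checked, and a fresh plan generated, with Python's standard `ipaddress` module. This is a sketch under stated assumptions: the pool `10.1.0.0/24` and the helper names `allocate_p2p` and `same_link` are hypothetical, and the allocator packs /31s back to back rather than reproducing the one-/31-per-third-octet layout of the sample config. It keeps the sample's convention of giving the spine the even address and the leaf the odd one.

```python
import ipaddress

def allocate_p2p(pool, leaves, spines):
    """Hand out one /31 per leaf-spine link from a pool."""
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=31)
    plan = {}
    for leaf in leaves:
        for spine in spines:
            net = next(subnets)
            # Spine takes the even (.0-style) address, leaf the odd one.
            plan[(leaf, spine)] = {"net": net, "spine_ip": net[0], "leaf_ip": net[1]}
    return plan

def same_link(end_a, end_b):
    """True when two interface addresses fall inside the same /31."""
    return ipaddress.ip_interface(end_a).network == ipaddress.ip_interface(end_b).network

leaves = [f"leaf{i}" for i in range(1, 5)]
spines = ["spine1", "spine2"]
plan = allocate_p2p("10.1.0.0/24", leaves, spines)
for (leaf, spine), link in plan.items():
    print(f"{leaf}-{spine}: {link['net']} spine={link['spine_ip']} leaf={link['leaf_ip']}")

# Check the two ends of the leaf1-spine1 link from the sample config.
print(same_link("10.1.1.1/31", "10.1.1.0/31"))
```

A mismatch here, such as pairing 10.1.1.1/31 with 10.1.3.0/31, is the kind of typo that leaves a BGP session stuck and is cheap to catch before deployment.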

Direct CLI is always available. Verify the sessions and ECMP by hand:

show ip bgp summary
show ip route 10.0.1.4
! you should see TWO equal-cost next-hops for a remote leaf loopback,
! one via spine1's link, one via spine2's link
show ip bgp 10.0.1.4/32

Step 3 — Or skip the IP math: BGP unnumbered

Handing out /31s across a real 32-leaf / 8-spine fabric means tracking hundreds of point-to-point subnets. BGP unnumbered eliminates that. Sessions form over IPv6 link-local addresses on the fabric interfaces, and IPv4 routes carry an IPv6 next-hop that's resolved automatically (RFC 5549). You configure peering by interface, not by address.
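The "hundreds of subnets" figure is simple arithmetic: one /31 per leaf-spine link, two interface addresses per /31. A quick sketch (the function name `p2p_requirements` is just illustrative):

```python
def p2p_requirements(num_leaves, num_spines):
    """Count the point-to-point addressing a numbered underlay needs."""
    links = num_leaves * num_spines          # every leaf to every spine
    return {
        "links": links,
        "slash31_subnets": links,            # one /31 per link
        "interface_addresses": 2 * links,    # one address per link end
    }

print(p2p_requirements(32, 8))  # {'links': 256, 'slash31_subnets': 256, 'interface_addresses': 512}
print(p2p_requirements(4, 2))   # {'links': 8, 'slash31_subnets': 8, 'interface_addresses': 8 * 2}
```

For the 32-leaf / 8-spine example that is 256 subnets and 512 interface addresses to track, versus none with unnumbered peering.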

"Reconfigure this fabric to use BGP unnumbered instead of numbered /31s. Keep the same ASNs and loopbacks, peer over the interfaces directly using IPv6 link-local, enable extended next-hop so IPv4 routes resolve over the unnumbered links, and verify the loopbacks are still reachable with ECMP."

The agent rips out the /31s, enables interface peering on every fabric port, turns on extended-nexthop, and re-verifies reachability — same fabric, far less addressing to manage. This is where SR Linux and FRR shine because unnumbered is their native idiom.

FRR makes this especially clean. A leaf config:

# ===== leaf1 (FRR) — BGP unnumbered =====
interface lo
 ip address 10.0.1.1/32
!
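router bgp 65001
 bgp router-id 10.0.1.1
 no bgp ebgp-requires-policy                 ! recent FRR withholds eBGP routes without route-maps unless "frr defaults datacenter" is set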
 neighbor SPINES peer-group
 neighbor SPINES remote-as external
 neighbor eth1 interface peer-group SPINES   ! peer over the interface, no IP
 neighbor eth2 interface peer-group SPINES
 !
 address-family ipv4 unicast
  redistribute connected route-map LOOPBACKS  ! advertise the /32 loopback
  neighbor SPINES activate
  maximum-paths 2                             ! ECMP across both spines
 exit-address-family
!
route-map LOOPBACKS permit 10
 match interface lo

The matching spine just peers over each of its leaf-facing interfaces with remote-as external and a maximum-paths high enough to cover all leaves. No /31, no link addressing, nothing to spreadsheet.
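Since the spine side is only described in words, here is one way it could be rendered. Treat this as a sketch that mirrors the leaf config, not as output from the agent: the peer-group name `LEAVES`, the spine interface names `eth1` to `eth4`, and the helper `render_spine_frr` are assumptions. The `no bgp ebgp-requires-policy` line is included on the assumption that the FRR build uses the traditional defaults, where eBGP sessions without route-maps exchange no routes.

```python
def render_spine_frr(asn, router_id, leaf_ports):
    """Render an FRR BGP-unnumbered config for a spine as plain text."""
    lines = [
        "interface lo",
        f" ip address {router_id}/32",
        "!",
        f"router bgp {asn}",
        f" bgp router-id {router_id}",
        " no bgp ebgp-requires-policy",   # assumption: traditional FRR defaults
        " neighbor LEAVES peer-group",
        " neighbor LEAVES remote-as external",
    ]
    # One interface-based (unnumbered) neighbor per leaf-facing port.
    lines += [f" neighbor {port} interface peer-group LEAVES" for port in leaf_ports]
    lines += [
        " !",
        " address-family ipv4 unicast",
        "  redistribute connected route-map LOOPBACKS",
        "  neighbor LEAVES activate",
        f"  maximum-paths {len(leaf_ports)}",
        " exit-address-family",
        "!",
        "route-map LOOPBACKS permit 10",
        " match interface lo",
    ]
    return "\n".join(lines)

config = render_spine_frr(65100, "10.0.0.1", ["eth1", "eth2", "eth3", "eth4"])
print(config)
```

Calling it again with 65100, 10.0.0.2, and the same port list would give spine2's config; on a running fabric the rendered text is what would land in each spine's FRR configuration.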

Verify it by hand inside vtysh:

vtysh
show bgp summary
# neighbors show as interface names (eth1, eth2), state Established
show ip route 10.0.1.4/32
# two ECMP next-hops, each "via fe80::..." link-local over an unnumbered link
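The fe80:: next-hops in that output are the link-local addresses of the neighbor's interfaces. One common way such an address is formed is modified EUI-64, derived from the interface MAC. The sketch below shows that derivation; the MAC address is made up, and a given Linux host may instead use a different generation mode (for example stable-privacy), so addresses in a real lab may not match this pattern.

```python
import ipaddress

def mac_to_link_local(mac):
    """Derive an IPv6 link-local address from a MAC using modified EUI-64."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02                                  # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]     # insert ff:fe in the middle
    interface_id = int.from_bytes(bytes(eui64), "big")
    return ipaddress.IPv6Address((0xFE80 << 112) | interface_id)

# Hypothetical MAC of a spine's leaf-facing port.
print(mac_to_link_local("aa:c1:ab:00:00:01"))  # fe80::a8c1:abff:fe00:1
```

Either way, the operator never types these addresses: the peers learn them from router advertisements on the link, which is what makes interface-based peering possible.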

Step 4 — Prove ECMP and resilience

A fabric you can't verify is just a diagram. The whole point of Clos is that all links are active and the fabric survives a spine loss without losing the other path.

"From leaf1, confirm there are two equal-cost paths to leaf4's loopback — one through each spine. Then shut spine1's fabric links, confirm leaf1 still reaches leaf4 over spine2 alone, and re-enable spine1 and confirm both paths return."

The agent checks the multi-path entry on leaf1, drains spine1, re-tests reachability over the surviving spine, then restores spine1 — narrating each transition across the affected devices. This is the agent's multi-device, multi-vendor strength: in a mixed fabric it issues the right show ip route on Cisco, the right show network-instance on SR Linux, and the right vtysh command on FRR, and consolidates the results into one report.

By hand, the source-from-loopback ping is the proof. Always source from a locally assigned address — here, leaf1's own loopback:

! on leaf1 — ping leaf4's loopback sourced from leaf1's loopback
ping 10.0.1.4 source 10.0.1.1
# FRR / Linux equivalent
ping -I 10.0.1.1 10.0.1.4

If that succeeds with both spines up and still succeeds after spine1's links go down, you have a working, resilient, all-links-active Clos underlay.
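The expected outcome of that drain test can be written down before running it. This is a toy model of path availability only, not of BGP convergence; `equal_cost_paths` and the link-tuple representation are illustrative assumptions.

```python
def equal_cost_paths(src, dst, spines, down_links=frozenset()):
    """Spines usable for src -> dst, skipping any spine with a failed link to either leaf."""
    return [
        spine for spine in spines
        if (src, spine) not in down_links and (dst, spine) not in down_links
    ]

spines = ["spine1", "spine2"]
leaves = ["leaf1", "leaf2", "leaf3", "leaf4"]

# Both spines up: two equal-cost paths.
healthy = equal_cost_paths("leaf1", "leaf4", spines)

# Shut every fabric link on spine1: one path left, via spine2.
spine1_drained = frozenset((leaf, "spine1") for leaf in leaves)
degraded = equal_cost_paths("leaf1", "leaf4", spines, spine1_drained)

# Re-enable spine1: both paths return.
restored = equal_cost_paths("leaf1", "leaf4", spines)

print(healthy, degraded, restored)
```

If the routing table on leaf1 shows anything other than two next-hops, then one, then two again across the three stages, the fabric is not behaving like the model and the BGP sessions are the first place to look.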

The next step: EVPN/VXLAN overlay

What you've built is the underlay — pure Layer 3, every loopback reachable from every leaf with ECMP. That's the foundation, not the finished product. Most production fabrics run an EVPN/VXLAN overlay on top of this underlay to deliver Layer-2 stretch, multi-tenancy (VRFs), and host mobility across the fabric. The overlay rides on exactly the loopback reachability you just verified — VTEPs use those loopbacks as tunnel source addresses, and EVPN BGP sessions peer between them.

We kept the overlay out of this guide on purpose, because it's a big topic. When you're ready, build it next: EVPN/VXLAN Data Center Guide walks through symmetric IRB, multi-site DCI, and multi-tenant VRFs on top of a Clos underlay like this one.

Related builds worth your time:

  • Multi-Vendor BGP Lab — peer Cisco, Juniper, Arista, and FRR in one topology; the per-vendor BGP translation that makes a mixed fabric work.
  • FRRouting Cloud Lab Guide — go deeper on FRR (the routing stack inside SONiC), including the unnumbered idiom this guide uses.

FAQ

What is a characteristic of spine-and-leaf architecture?

The defining characteristic is that every leaf switch connects to every spine switch (a full mesh between the two tiers), with no leaf-to-leaf or spine-to-spine links. The consequences flow from that: any two servers on different leaves are exactly two switch-to-switch hops apart, all links are active via ECMP (no spanning tree), and the fabric scales out by adding spines or leaves rather than scaling up a core.

What is spine-and-leaf in networking?

It's a two-tier data-center topology — a Clos network — optimized for east-west (server-to-server) traffic. Leaf switches connect endpoints; spine switches form the backbone; every leaf attaches to every spine. It replaced the older three-tier (access/aggregation/core) design because three-tier relied on spanning tree, which blocks redundant links and can't keep up with modern east-west workloads.

Spine-leaf vs three-tier — what's the real difference?

Three-tier is optimized for north-south traffic and relies on spanning tree to prevent loops, which leaves redundant links blocked and idle. Spine-leaf is a routed (Layer 3) fabric: no spanning tree, all links active via ECMP, uniform two-hop latency, and horizontal scale-out. For east-west-heavy workloads (databases, microservices, storage, AI training), spine-leaf wins decisively.

Do I have to use BGP for the underlay?

No — OSPF or IS-IS also work. But eBGP is the industry standard (RFC 7938) because it scales to thousands of leaves, gives per-link policy, supports ECMP natively, and avoids large link-state flooding domains. BGP unnumbered (RFC 5549) is increasingly the default because it removes per-link IP addressing entirely.

What's the difference between numbered eBGP and BGP unnumbered?

Numbered eBGP puts a /31 (or /30) on each leaf-spine link and peers by IP address, which is explicit but tedious to allocate at scale. BGP unnumbered peers over the interface using IPv6 link-local and resolves IPv4 next-hops via extended next-hop (RFC 5549), so there's no point-to-point addressing to manage. SONiC, FRR, and SR Linux all support it, and it is the usual choice in FRR-based stacks.

Can I build a spine-leaf lab without local Docker or images?

Yes. NetPilot is a cloud network lab — describe the fabric in plain English and it deploys real network-OS containers (Nokia SR Linux, FRR, and BYOI images for Arista cEOS / SONiC / Cisco NX-OS alternatives) to isolated cloud infrastructure in about two minutes. No local Docker, no image sourcing, no licensing setup.


Copy-paste ready: Grab the Clos fabric BGP underlay prompt from our example library — the leaf-spine underlay from this post, ready to extend with an EVPN overlay.

Ready to build a Clos fabric? Stop reading about spine-leaf and actually deploy one. Try NetPilot — describe your fabric in plain English and get a working multi-vendor underlay in minutes. Explore the network lab platform and the AI-native network emulator, or grab a ready-made data-center prompt to start from.
