Scale testing a 100-node network fabric used to mean reserving a dedicated lab of physical switches and spending weeks cabling, configuring, and measuring. Cloud-hosted AI-built labs change that — with container-based vendor NOSes, a 100-node lab deploys in under 5 minutes. This guide walks through the methodology, covers what works at scale and what doesn't, and shows a worked CLOS fabric convergence experiment.
What "scale testing" means in 2026
"Scale" has always been context-dependent. For an enterprise branch deployment, 50 devices is large; for a hyperscaler fabric engineering team, 10,000 devices is realistic production scale. Cloud-hosted labs sit in the middle: 100-500 node labs for research and validation, with ns-3 or other simulation for larger-scale modeling.
The testable properties at 100-node scale:
- Cold convergence time — how long until every router's RIB-In stabilizes after cold boot
- Churn propagation — how a single flapping link affects the rest of the fabric
- BUM traffic replication — EVPN ingress replication scale, BGP-based multicast signaling scale
- Memory and CPU footprint per router under steady-state large-table conditions
- Failure domain size — how a specific topology contains or propagates failures
- Upgrade/restart behavior — rolling restart patterns for large iBGP meshes or route-reflected fabrics
Building a 100-node fabric lab
The prompt
Copy into NetPilot:
Build a 100-node CLOS fabric: 4 spine routers and 96 leaf routers arranged in 4 PODs of 24 leaves each, all using FRR. Spine routers in AS 65000, each leaf in its own AS 65001-65096 for eBGP underlay. Enable BGP EVPN address family for leaf-to-leaf Type-2 and Type-5 route exchange. Configure 10 VNIs (VNI 10100-10109) mapped to 10 VLANs per leaf. Advertise 5 unique /32 loopback prefixes from each leaf. Place 1 host per leaf. Include a Linux measurement node with tcpdump, tshark, iperf3, and tc/netem for scripted impairment and convergence measurement.
What this actually deploys
- 4 spine FRR routers
- 96 leaf FRR routers
- 96 Linux hosts
- 1 Linux measurement node
- ~200 virtual links
- ~500 /32 loopback prefixes
- 10 VNIs per leaf = 960 local VNI bindings + 960 × 95 remote VNI bindings for full-mesh BUM replication
Lab deployment time: ~5 minutes on the enterprise plan with an appropriately sized dedicated cloud VM.
Per-leaf verification
Ask the agent:
"Check BGP EVPN state across all 100 nodes — flag any leaf with fewer than 4 spine peerings or fewer than ~480 prefixes in its RIB-In."
The agent queries all 100 devices in parallel and returns a summary table with any anomalies flagged. At baseline, every leaf should show 4 established spine peerings and ~480 prefixes received.
Direct CLI available for spot-checking a specific leaf:
vtysh
show bgp ipv4 unicast summary
show bgp ipv4 unicast
show bgp l2vpn evpn summary
show bgp l2vpn evpn
What works at 100-node scale
Several things that are untestable or painful on a 5-node lab become both testable and fast at 100-node:
Cold convergence time measurement
Ask the agent: "Trigger a cold convergence on all 100 nodes and measure the time until every leaf's RIB-In is fully populated." The agent restarts BGP on all nodes and polls until convergence, returning the elapsed time.
For publication-grade statistical sweeps (30+ trials), write a measurement script directly on the measurement node — this gives you precise timestamps and raw packet access:
# Custom convergence measurement script — run on the measurement node
start_time=$(date +%s.%N)
# Trigger a cold restart of BGP on all nodes in parallel.
# systemctl restart frr restarts the whole FRR suite; adjust if your
# image runs the daemons individually.
for node in leaf{1..96} spine{1..4}; do
  ssh "$node" 'systemctl restart frr' &
done
wait
# Poll every 5 seconds until all routes are present
while true; do
  all_synced=true
  for node in leaf{1..96}; do
    # ~480 prefixes plus header/footer lines; 500 is a safe threshold
    count=$(ssh "$node" 'vtysh -c "show bgp ipv4 unicast" | wc -l')
    if [ "$count" -lt 500 ]; then
      all_synced=false
      break
    fi
  done
  if $all_synced; then
    end_time=$(date +%s.%N)
    echo "Convergence time: $(echo "$end_time - $start_time" | bc) seconds"
    break
  fi
  sleep 5
done
Record the result. Vary topology, protocol, and timers. Compare.
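For a sweep, wrap the single-trial measurement in a loop that logs one CSV row per trial. A minimal sketch, assuming the convergence script above is saved as an executable that prints "Convergence time: X seconds" (the path in the usage comment is illustrative):

```shell
# Run N convergence trials and append "trial,seconds" rows to a CSV.
run_trials() {
  n="$1"; script="$2"; out_csv="$3"
  for trial in $(seq 1 "$n"); do
    # Extract the seconds value from the script's summary line
    t=$("$script" | awk '/Convergence time/ { print $3 }')
    echo "$trial,$t" >> "$out_csv"
    sleep "${TRIAL_GAP:-60}"   # let the fabric settle between trials
  done
}
# Usage: run_trials 30 ./measure_convergence.sh trials.csv
```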
Churn propagation
Flap a single link at one specific leaf. Measure how the update storm propagates across the fabric.
# Start a capture at an uninvolved leaf (e.g., leaf10) before flapping
ssh leaf10 'nohup tcpdump -i any -s 0 port 179 -w /tmp/churn.pcap >/dev/null 2>&1 &'
# Flap leaf50's link to spine2
for i in {1..5}; do
  ssh leaf50 'ip link set eth1 down; sleep 2; ip link set eth1 up'
  sleep 10
done
# Stop the capture, then post-process the PCAP for UPDATE count per minute
ssh leaf10 'pkill tcpdump'
Expected result: churn from one leaf's flap cascades through the BGP mesh, with UPDATE volume peaking during and shortly after the flap. Scale effects show up as superlinear (worse-than-linear) growth in update volume as the fabric grows.
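To quantify the storm, post-process the capture with tshark. A sketch: in tshark's BGP dissector, `bgp.type == 2` selects UPDATE messages; the helper below buckets timestamps per second (the pcap path is illustrative):

```shell
# Bucket BGP UPDATE timestamps per second: stdin is one epoch
# timestamp per line, output is "second count" rows.
count_updates_per_second() {
  awk '{ print int($1) }' | sort -n | uniq -c | awk '{ print $2, $1 }'
}
# On the measurement node, after copying /tmp/churn.pcap over:
# tshark -r churn.pcap -Y 'bgp.type == 2' -T fields -e frame.time_epoch \
#   | count_updates_per_second
```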
Route table scale
Each leaf receives BGP routes from 95 other leaves. At 5 prefixes per leaf × 96 leaves = ~480 prefixes per leaf's RIB-In. Real enterprise fabrics may have 10,000-100,000 prefixes — NetPilot labs at this tier are still useful to validate behavior changes before scaling to ns-3 for the final capacity study.
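To push a lab past the ~480-prefix baseline toward larger table sizes, FRR ships a test daemon (sharpd) that injects synthetic routes. A sketch, assuming sharpd is enabled in /etc/frr/daemons; the prefix, next-hop, and count are illustrative:

```shell
# Install 10,000 synthetic /32s on one leaf via FRR's sharpd test daemon
vtysh -c 'sharp install routes 10.200.0.0 nexthop 10.0.1.1 10000'
# Remove them after the experiment
vtysh -c 'sharp remove routes 10.200.0.0 10000'
```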
EVPN BUM replication scale
With 96 leaves participating in ingress replication for 10 VNIs each, every BUM packet replicates to 95 other VTEPs per VNI. Scale effects on replication-list management, memory footprint, and ARP suppression behavior all show up at this size.
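FRR exposes this replication state directly. Spot-check any leaf with its EVPN show commands to watch flood lists and suppression caches grow:

```shell
vtysh -c 'show evpn vni'                # local VNIs and their state
vtysh -c 'show evpn vni 10100'          # detail, incl. remote VTEPs for one VNI
vtysh -c 'show evpn mac vni all'        # MAC table scale across all VNIs
vtysh -c 'show evpn arp-cache vni all'  # ARP/ND suppression cache
```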
What doesn't work at 100-node scale (cloud lab limitations)
Honest limitations:
Line-rate traffic generation
Cloud VMs handle reasonable traffic rates (Gbps, not 100Gbps). If your research requires line-rate testing, you still need hardware testers like Keysight IxNetwork or VIAVI TestCenter (formerly Spirent TestCenter).
Very large memory footprints
FRR's per-router memory grows with table size. Cloud VM memory limits make sustained 100,000+ prefix labs harder; 5,000-10,000 prefixes per router is comfortable.
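One way to watch the footprint during a run, assuming shell access to a router node:

```shell
# FRR's own allocator statistics, per daemon
vtysh -c 'show memory'
# Or the resident set size of the routing daemons from the OS side
ps -C bgpd,zebra -o rss=,comm=
```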
Radio-layer modeling
If your scale testing involves RF or wireless channel effects, use ns-3 or EMANE. NetPilot is a wired-network lab platform.
Extreme scale (1000+ nodes)
Above ~200-300 nodes, cloud VM constraints make a single-VM lab impractical. For 1000+ node studies, use ns-3, distributed container orchestration, or physical hardware.
The measurement node pattern
Every scale-testing lab should include a dedicated Linux measurement node separate from the fabric itself. Reasons:
- Runs impairment scripts (tc/netem) without affecting data-plane measurements
- Collects tcpdump/tshark traces from span interfaces without consuming router CPU
- Runs long-polling scripts that measure convergence without affecting BGP
- Stores raw measurement data (CSVs, PCAPs) for post-experiment analysis
Include the measurement node in the initial prompt; adding it later requires redeploying the lab.
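A minimal tc/netem sketch of the impairment role, run from the measurement node over SSH; the node, interface name, and values are illustrative:

```shell
# Add 10 ms delay and 0.1% loss on a leaf's spine-facing link
ssh leaf50 'tc qdisc add dev eth1 root netem delay 10ms loss 0.1%'
# Adjust the impairment in place during the experiment
ssh leaf50 'tc qdisc change dev eth1 root netem delay 50ms loss 1%'
# Remove it when the run is done
ssh leaf50 'tc qdisc del dev eth1 root'
```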
Repeatable experimental methodology
For publishable or vendor-reportable scale studies:
- Fixed prompt — exact topology as natural language
- Fixed vendor versions (BYOI on enterprise plan)
- Fixed measurement scripts — in git
- N ≥ 30 trials per variable — account for cloud-VM noise
- Measurement outside the fabric — always on a separate node
- Statistical rigor — report medians, percentiles, standard deviations (not just averages)
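The last point is easy to script. A minimal sketch that reads one convergence time per line (e.g., the seconds column of a trials CSV) and reports median, p95, mean, and population standard deviation:

```shell
# Summarize a column of trial results from stdin
summarize_trials() {
  sort -n | awk '{ v[NR] = $1; s += $1; ss += $1 * $1 }
    END {
      n = NR
      median = (n % 2) ? v[(n + 1) / 2] : (v[n / 2] + v[n / 2 + 1]) / 2
      # p95 = value at index ceil(0.95 * n), clamped to at least 1
      i = int(0.95 * n); if (i < 0.95 * n) i++; if (i < 1) i = 1
      printf "n=%d median=%.2f p95=%.2f mean=%.2f sd=%.2f\n",
             n, median, v[i], s / n, sqrt(ss / n - (s / n) ^ 2)
    }'
}
# Usage: cut -d, -f2 trials.csv | summarize_trials
```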
FAQ
How many nodes can a NetPilot lab scale to?
Lab size depends on cloud VM resources. Standard tier supports up to ~50 nodes; enterprise plans with dedicated VMs handle 100-300 nodes depending on the vendor mix. FRR scales further than commercial NOS containers because FRR's memory footprint per router is smaller.
Can I test 1000-node fabrics?
Not in a single NetPilot lab at this time. For 1000+ node studies, use ns-3 (pure simulation), distributed container orchestration with Kubernetes, or physical hardware. NetPilot at 100-300 nodes covers most research and validation use cases.
What's the difference between cloud-lab scale testing and ns-3 simulation?
Cloud labs run real routing daemon code, so behavior matches production to the extent the container faithfully reproduces the production environment. ns-3 uses protocol models in C++, so scale is effectively unlimited but behavior matches production only to the fidelity of the model. Most research programs use both — cloud labs for behavior validation, ns-3 for scale studies.
How do I measure convergence precisely at 100-node scale?
Use the measurement node with packet captures on span ports. Post-process PCAPs with tshark to extract per-UPDATE timestamps. For wall-clock convergence, poll every router's RIB from the measurement node every 1-5 seconds and record the timestamp at which the last router's table reaches the expected state.
Can I run CLOS, spine-leaf, hub-and-spoke, or mesh topologies at scale?
Yes — topology is determined by the prompt. NetPilot handles arbitrary topologies up to its node-count limit. CLOS (3-tier or 5-tier), spine-leaf, multi-site fabric interconnect, mesh, hub-and-spoke, or arbitrary research graphs are all supported.
Copy-paste ready: The scale testing 100-node fabric prompt is the CLOS lab template for this guide.
Running hyperscaler fabric engineering or large-scale research? The Network Research Lab hub is built for this workflow. Contact sales for enterprise plans with high-node-count dedicated environments.