
1 Growth in Router Capacity. IPAM, Lake Arrowhead, October 2003. Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University. nickm@stanford.edu, www.stanford.edu/~nickm

2 Generic Router Architecture. (Figure: header processing looks up each packet’s IP address in an off-chip DRAM address table of ~1M prefixes to find the next hop; packets are then queued in off-chip DRAM buffer memory holding ~1M packets, roughly 100-400ms of traffic.)

3 Generic Router Architecture. (Figure: the same lookup/header-processing, address-table, and buffer-manager/buffer-memory blocks replicated across several linecards.)

4 What a High Performance Router Looks Like. Cisco GSR 12416: 6ft x 19in x 2ft, capacity 160Gb/s, power 4.2kW. Juniper M160: 3ft x 19in x 2.5ft, capacity 80Gb/s, power 2.6kW.

5 Backbone router capacity. (Figure: router capacity per rack, plotted from 1Gb/s to 1Tb/s; capacity doubles every 18 months.)

6 Backbone router capacity. (Figure: the same plot with traffic overlaid; router capacity per rack doubles every 18 months, traffic doubles every year.)

7 Extrapolating. (Figure: extrapolating both curves toward 100Tb/s; router capacity doubles every 18 months, traffic doubles every year, leaving a 16x disparity by 2015.)
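A quick check of the 16x figure: from 2003 to 2015 is 12 years, so with the growth rates above,

$$\frac{\text{traffic growth}}{\text{capacity growth}} = \frac{2^{12/1}}{2^{12/1.5}} = \frac{2^{12}}{2^{8}} = 2^{4} = 16.$$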

8 Consequence
- Unless something changes, operators will need:
  - 16 times as many routers, consuming
  - 16 times as much space,
  - 256 times the power,
  - costing 100 times as much.
- Actually need more than that…

9 What limits router capacity? (Figure: approximate power consumption per rack.) Power density is the limiting factor today.

10 Trend: Multi-rack routers. (Figure: linecard racks connected to a central crossbar/switch rack.) Spreading a router across racks reduces power density.

11 Examples of multi-rack routers: Alcatel 7670 RSP, Juniper TX8/T640, Chiaro, Avici TSR.

12 Trend: Single-POP routers
- Very high capacity (10+ Tb/s)
- Line rates from T1 to OC768
Reasons:
- A big multi-rack router is more efficient than many single-rack routers,
- It is easier to manage fewer routers.

13 Router linecard. (Figure: blocks for optics, physical layer, framing & maintenance, packet processing, lookup tables, buffer management & scheduling, buffer & state memory, and the scheduler.)
An OC192c linecard:
- ~30M gates
- 2.5 Gbits of memory
- 200-300W
- ~1 m²
- $25k cost, $100k price
40-55% of the power goes into chip-to-chip serial links.

14 What’s hard, what’s not
- Line-rate forwarding:
  - Line-rate longest-prefix match (LPM) was an issue for a while.
  - Commercial TCAMs and algorithms are available up to 100Gb/s.
  - 1M prefixes fit in a corner of a 90nm ASIC.
  - All 2^32 IPv4 addresses will fit in a $10 DRAM in 8 years.
  - (A small LPM sketch follows this list.)
- Packet buffering:
  - Not a problem up to about 10Gb/s; a big problem above 10Gb/s. More on this later…
- Header processing:
  - For basic IPv4 operations: not a problem.
  - If we keep adding functions, it will be a problem. More on this later…
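To make the LPM point concrete, here is a minimal binary-trie longest-prefix-match sketch in Python. The prefix table and next hops are invented for illustration; real linecards use TCAMs or compressed multi-bit tries in ASICs rather than anything like this.

```python
# Minimal binary-trie longest-prefix match (illustration only).

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]   # 0-bit and 1-bit branches
        self.next_hop = None           # set if a prefix ends here

def ip_to_bits(ip):
    """Convert a dotted-quad IPv4 address to a 32-bit integer."""
    a, b, c, d = (int(x) for x in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def insert(root, prefix, length, next_hop):
    """Insert prefix/length with its next hop into the trie."""
    node = root
    for i in range(length):
        bit = (prefix >> (31 - i)) & 1
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]
    node.next_hop = next_hop

def lookup(root, addr):
    """Walk the trie, remembering the longest matching prefix seen so far."""
    node, best = root, None
    for i in range(32):
        if node.next_hop is not None:
            best = node.next_hop
        bit = (addr >> (31 - i)) & 1
        if node.children[bit] is None:
            break
        node = node.children[bit]
    else:
        if node.next_hop is not None:
            best = node.next_hop
    return best

# Hypothetical forwarding table.
root = TrieNode()
insert(root, ip_to_bits("10.0.0.0"), 8, "port 1")
insert(root, ip_to_bits("10.1.0.0"), 16, "port 2")

print(lookup(root, ip_to_bits("10.1.2.3")))   # -> port 2 (longest match, /16)
print(lookup(root, ip_to_bits("10.9.9.9")))   # -> port 1 (falls back to /8)
```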

15 What’s hard, what’s not (2)
- Switching:
  - If throughput doesn’t matter: easy; lots of multistage, distributed, or load-balanced switch fabrics.
  - If throughput matters: use a crossbar, VOQs, and a centralized scheduler, or a multistage fabric and lots of speedup.
  - If a throughput guarantee is required: maximal matching, VOQs, and a speedup of two [Dai & Prabhakar ’00]; or a load-balanced 2-stage switch [Chang ’01; Sigcomm ’03].
(A greedy maximal-matching sketch follows this list.)
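As a rough illustration of the "VOQs plus matching" idea (not any particular commercial scheduler), here is a greedy maximal-matching sketch over a matrix of VOQ occupancies; the port count and queue contents are made up.

```python
# Greedy maximal matching over VOQ occupancies (illustration only).
# voq[i][j] = number of cells queued at input i destined for output j.

def maximal_match(voq):
    """Return a maximal matching as a dict {input: output}.

    Greedily pick the most-backlogged VOQ first, then keep adding edges
    until no unmatched input has a cell for an unmatched output.
    """
    n = len(voq)
    edges = [(voq[i][j], i, j) for i in range(n) for j in range(n) if voq[i][j] > 0]
    edges.sort(reverse=True)               # heaviest VOQs first
    used_in, used_out, match = set(), set(), {}
    for _, i, j in edges:
        if i not in used_in and j not in used_out:
            match[i] = j
            used_in.add(i)
            used_out.add(j)
    return match

voq = [
    [3, 0, 1],
    [0, 2, 0],
    [4, 0, 0],
]
print(maximal_match(voq))   # {2: 0, 1: 1, 0: 2}
```

A maximal matching like this cannot by itself guarantee 100% throughput; the Dai & Prabhakar result cited on the slide adds a speedup of two to get the guarantee.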

16 What’s hard
- Packet buffers above 10Gb/s
- Extra processing on the datapath
- Switching with throughput guarantees

17 Packet Buffering Problem. Packet buffers for a 160Gb/s router linecard. (Figure: a buffer manager sits between the line and a 40Gbit buffer memory; the write rate R is one 128B packet every 6.4ns, the read rate R is one 128B packet every 6.4ns, with read requests coming from the scheduler.) The problem is solved if a memory can be randomly accessed every 3.2ns and store 40Gb of data.
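A quick check of the numbers on the slide, assuming a 128-byte packet, a 160Gb/s line, and the usual rule of thumb of buffering one round-trip time (here 0.25s) of traffic:

```python
rate_bps  = 160e9          # 160 Gb/s line rate
pkt_bits  = 128 * 8        # 128-byte packet
rtt_s     = 0.25           # assumed round-trip time used for buffer sizing

pkt_time_ns = pkt_bits / rate_bps * 1e9
print(pkt_time_ns)                     # 6.4 ns per 128B packet
print(pkt_time_ns / 2)                 # 3.2 ns if every slot needs a write and a read
print(rate_bps * rtt_s / 1e9, "Gbit")  # 40.0 Gbit of buffering (RTT x R)
```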

18 Memory Technology
- Use SRAM? Fast enough random access time, but too low density to store 40Gbits of data.
- Use DRAM? High density means we can store the data, but it can’t meet the random access time.

19 Can’t we just use lots of DRAMs in parallel? (Figure: the buffer manager stripes data across several DRAM buffer memories, reading and writing in larger 1280B blocks, while the write and read rates remain one 128B packet every 6.4ns.)

20 Works fine if there is only one FIFO. (Figure: 128B packets are aggregated into 1280B blocks in the buffer manager’s on-chip SRAM, which then reads and writes to all DRAMs in parallel; bytes 0-127, 128-255, …, 1152-1279 of each block go to different DRAMs.)

21 In practice, the buffer holds many FIFOs (queues 1, 2, …, Q). For example, in an IP router Q might be 200; in an ATM switch Q might be 10^6. (Figure: partially filled per-queue blocks of, say, 320B or 1280B accumulate while the line still delivers one 128B packet every 6.4ns.) How can we write multiple packets into different queues?

22 Parallel Packet Buffer: Hybrid Memory Hierarchy. (Figure: arriving packets at rate R enter a small tail SRAM cache for the FIFO tails, a large DRAM holds the body of the FIFOs, and a small head SRAM cache for the FIFO heads feeds departing packets at rate R under scheduler requests; the buffer manager is an ASIC with on-chip SRAM, and data moves to and from DRAM in blocks of b bytes, where b is the degree of parallelism.)
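A toy model of the tail-cache side of this hierarchy, assuming per-queue tails are held in SRAM and flushed to DRAM only in full b-byte blocks; the block size, queue IDs, and packet sizes below are invented for illustration.

```python
# Toy tail-cache model for the hybrid SRAM/DRAM packet buffer.
# Packets accumulate per queue in SRAM; only full b-byte blocks go to DRAM.

from collections import defaultdict

class TailCache:
    def __init__(self, b):
        self.b = b                              # DRAM block size (degree of parallelism)
        self.sram = defaultdict(bytearray)      # per-queue tail bytes held in SRAM
        self.dram = defaultdict(list)           # per-queue list of b-byte blocks in DRAM

    def write_packet(self, q, data):
        """Append a packet to queue q; flush full b-byte blocks to DRAM."""
        self.sram[q] += data
        while len(self.sram[q]) >= self.b:
            block, self.sram[q] = self.sram[q][:self.b], self.sram[q][self.b:]
            self.dram[q].append(bytes(block))   # one wide, block-sized DRAM write

    def sram_occupancy(self):
        """Total bytes of tail data currently held in SRAM."""
        return sum(len(v) for v in self.sram.values())

cache = TailCache(b=1280)
for q in (0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0):  # a burst of 128B packets
    cache.write_packet(q, bytes(128))
print(len(cache.dram[0]), "block(s) written to DRAM for queue 0")
print(cache.sram_occupancy(), "bytes of tails still in SRAM")
```

The head cache works symmetrically: it prefetches b-byte blocks of each FIFO’s head from DRAM ahead of the scheduler’s read requests. The next slide asks how large these SRAM caches must be.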

23 Problem: What is the minimum size of the SRAM needed so that every packet is available immediately, or within a fixed latency?
Solutions:
- Qb(2 + ln Q) bytes, for zero latency,
- Q(b - 1) bytes, for a pipeline Q(b - 1) + 1 deep.
Examples:
1. 160Gb/s linecard, b = 1280, Q = 625: SRAM = 52Mbits.
2. 160Gb/s linecard, b = 1280, Q = 625: SRAM = 6.1Mbits, pipeline is 40ms.
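Plugging the slide’s numbers into the two formulas (treating b and the results as bytes and converting to bits) lands close to the quoted figures; the small differences are presumably rounding in the original slide.

```python
import math

Q, b = 625, 1280                       # number of queues, DRAM block size in bytes

zero_latency_bytes = Q * b * (2 + math.log(Q))
pipelined_bytes    = Q * (b - 1)

print(zero_latency_bytes * 8 / 1e6)    # ~54 Mbit  (slide quotes 52 Mbit)
print(pipelined_bytes * 8 / 1e6)       # ~6.4 Mbit (slide quotes 6.1 Mbit)
```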

24 Discussion. (Figure: SRAM size versus pipeline latency x, showing the queue length needed for zero latency and for maximum latency, with Q = 1000 and b = 10.)

25 Why it’s interesting
- This is a problem faced by every linecard, network switch and network processor starting at 10Gb/s.
- All commercial routers use an ad-hoc memory management algorithm with no guarantees.
- We have the only (and optimal) solution that guarantees to work for all traffic patterns.

26 What’s hard
- Packet buffers above 10Gb/s
- Extra processing on the datapath
- Switching with throughput guarantees

27 Recent trends
- DRAM random access time: 1.1x / 18 months
- Moore’s Law: 2x / 18 months
- Line capacity: 2x / 7 months
- User traffic: 2x / 12 months

28 Packet processing gets harder. (Figure: instructions available per arriving byte over time; what we’d like, with more features such as QoS, multicast, security, … versus what will actually happen.)

29 Packet processing gets harder. (Figure: clock cycles available per minimum-length packet since 1996.)
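A back-of-the-envelope version of what that plot measures, assuming a 40-byte minimum-length packet and a hypothetical 1 GHz packet processor (both numbers are assumptions for illustration):

```python
clock_hz  = 1e9             # assumed processor clock
min_pkt_b = 40              # assumed minimum-length packet, in bytes

for line_rate in (2.5e9, 10e9, 40e9, 160e9):      # OC48 up to 160Gb/s
    arrival_s = min_pkt_b * 8 / line_rate          # time between back-to-back packets
    cycles    = clock_hz * arrival_s
    print(f"{line_rate/1e9:>5.1f} Gb/s: {cycles:6.1f} cycles per min-length packet")
```

The per-packet cycle budget shrinks as line rates double faster than clock rates, which is the point of the slide.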

30 What’s hard
- Packet buffers above 10Gb/s
- Extra processing on the datapath
- Switching with throughput guarantees

31 Potted history
1. [Karol et al. 1987] Throughput limited to 58% by head-of-line blocking for Bernoulli IID uniform traffic.
2. [Tamir 1989] Observed that with “Virtual Output Queues” (VOQs), head-of-line blocking is reduced and throughput goes up.

32 Potted history
3. [Anderson et al. 1993] Observed the analogy to maximum-size matching in a bipartite graph.
4. [M et al. 1995] (a) A maximum-size match cannot guarantee 100% throughput. (b) But a maximum-weight match can: O(N^3).
5. [Mekkittikul and M 1998] A carefully picked maximum-size match can give 100% throughput, with O(N^2.5) matching.

33 Potted history: speedup
6. [Chuang, Goel et al. 1997] Precise emulation of a central shared-memory switch is possible with a speedup of two and a “stable marriage” scheduling algorithm.
7. [Prabhakar and Dai 2000] 100% throughput is possible for maximal matching with a speedup of two.

34 Potted history: newer approaches
8. [Tassiulas 1998] 100% throughput is possible for a simple randomized algorithm with memory.
9. [Giaccone et al. 2001] “Apsara” algorithms.
10. [Iyer and M 2000] Parallel switches can achieve 100% throughput and emulate an output-queued switch.
11. [Chang et al. 2000; Keslassy et al., Sigcomm 2003] A 2-stage switch with no scheduler can give 100% throughput.
12. [Iyer, Zhang and M 2002] Distributed shared-memory switches can emulate an output-queued switch.

35 Basic Switch Model. (Figure: an N x N switch; A_i(n) are the arrivals at input i, A_ij(n) the arrivals at input i destined for output j, L_ij(n) the occupancy of the queue from input i to output j, S(n) the switch configuration at time n, and D_j(n) the departures at output j.)

36 Some definitions of throughput

37 Scheduling algorithms to achieve 100% throughput
1. When traffic is uniform: many algorithms…
2. When traffic is non-uniform but the traffic matrix is known. Technique: Birkhoff-von Neumann decomposition.
3. When the matrix is not known. Technique: Lyapunov function.
4. When the algorithm is pipelined, or information is incomplete. Technique: Lyapunov function.
5. When the algorithm does not complete. Technique: randomized algorithm.
6. When there is speedup. Technique: fluid model.
7. When there is no algorithm. Techniques: 2-stage load-balancing switch; Parallel Packet Switch.
(A brute-force maximum-weight-matching sketch follows this list.)
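For case 3 above, the classic result is that serving the maximum-weight matching of the VOQ occupancies in each slot gives 100% throughput. A brute-force sketch follows; it searches all N! permutations, so it is only usable for tiny N, whereas real schedulers use O(N^3) assignment algorithms. The occupancy matrix is made up.

```python
# Maximum-weight matching over VOQ occupancies by brute force (tiny N only).
from itertools import permutations

def max_weight_match(voq):
    """Return the permutation (input i -> output perm[i]) maximizing total weight."""
    n = len(voq)
    best_perm, best_w = None, -1
    for perm in permutations(range(n)):
        w = sum(voq[i][perm[i]] for i in range(n))
        if w > best_w:
            best_perm, best_w = perm, w
    return best_perm, best_w

voq = [
    [3, 2, 1],
    [1, 4, 5],
    [4, 1, 2],
]
perm, weight = max_weight_match(voq)
print(perm, weight)   # (1, 2, 0) 11: input 0 -> output 1, 1 -> 2, 2 -> 0
```

In practice even O(N^3) per slot is too slow at high line rates, which is why the slide lists pipelined, randomized, and incomplete-information variants.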

38 Outline

39 Throughput results
Theory:
- Input queueing (IQ): 58% [Karol, 1987]
- IQ + VOQ, maximum weight matching: 100% [M et al., 1995]
- Different weight functions, incomplete information, pipelining: 100% [Various]
- Randomized algorithms: 100% [Tassiulas, 1998]
- IQ + VOQ, maximal size matching, speedup of two: 100% [Dai & Prabhakar, 2000]
Practice:
- Input queueing (IQ)
- IQ + VOQ, sub-maximal size matching, e.g. PIM, iSLIP
- Various heuristics, distributed algorithms, and amounts of speedup

40 Trends in Switching
- Fastest centralized scheduler with a throughput guarantee: ~1Tb/s
- Complexity scales as O(n^2)
- Capacity grows <<2x every 18 months
- Hence load-balanced switches

41 Stanford 100Tb/s Internet Router. Goal: study scalability.
- Challenging, but not impossible
- Two orders of magnitude faster than deployed routers
- We will build components to show feasibility
(Figure: electronic linecards #1 through #625, each providing line termination, IP packet processing, and packet buffering, connect over 160Gb/s and 160-320Gb/s links to a 40Gb/s optical switch; 100Tb/s = 640 x 160Gb/s.)

42 Question
- Can we use an optical fabric at 100Tb/s with 100% throughput?
- Conventional answer: No.
  - Need to reconfigure the switch too often.
  - 100% throughput requires a complex electronic scheduler.

43 Two-stage load-balancing switch. (Figure: inputs at rate R feed a load-balancing stage whose R/N-rate links spread traffic over intermediate ports; a second switching stage delivers packets to the outputs at rate R.) 100% throughput for weakly mixing, stochastic traffic. [C.-S. Chang; Valiant]
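A toy simulation of the idea, assuming both stages simply walk through fixed round-robin permutations so that no scheduler is needed; the switch size, load, and traffic pattern are invented for illustration.

```python
# Toy 2-stage load-balanced switch (illustration only): stage 1 spreads each
# arrival to whichever intermediate port its input is connected to this slot,
# and stage 2 cycles through a fixed round-robin permutation. No scheduler.
import random
from collections import deque

N, SLOTS, LOAD = 4, 20000, 0.9
random.seed(1)

# VOQs at the intermediate stage: voq[m][j] holds packets at middle port m for output j.
voq = [[deque() for _ in range(N)] for _ in range(N)]
arrived = departed = 0

for t in range(SLOTS):
    # Arrivals: each input gets a packet with probability LOAD, destination uniform.
    for i in range(N):
        if random.random() < LOAD:
            dest = random.randrange(N)
            m = (i + t) % N          # stage-1 connection this slot (round-robin)
            voq[m][dest].append(t)   # remember arrival time
            arrived += 1
    # Stage 2: middle port m connects to output (m + t) % N this slot (round-robin).
    for m in range(N):
        o = (m + t) % N
        if voq[m][o]:
            voq[m][o].popleft()
            departed += 1

backlog = sum(len(q) for row in voq for q in row)
print(f"load {LOAD}: carried {departed}/{arrived} packets, backlog {backlog}")
```

With uniform destinations, each middle-stage VOQ is offered load LOAD/N and served once every N slots, so the backlog stays bounded; Chang’s result extends this to a broad class of weakly mixing traffic.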

44 (Figure: animation frame of the two-stage switch, showing labeled packets being spread across the intermediate ports by the load-balancing stage.)

45 (Figure: the next animation frame of the same sequence.)

46 Chang’s load-balanced switch: good properties
1. 100% throughput for a broad class of traffic
2. No scheduler needed
Hence scalable.

47 Chang’s load-balanced switch: bad properties
1. Packet mis-sequencing
2. Pathological traffic patterns can cut throughput to 1/N-th of capacity
3. Uses two switch fabrics, which is hard to package
4. Doesn’t work with some linecards missing, which is impractical
The FOFF load-balancing algorithm addresses these:
- Packet sequence maintained
- No pathological patterns
- 100% throughput, always
- Delay within a bound of the ideal

48 100Tb/s Load-Balanced Router. (Figure: G = 40 linecard racks, each holding L = 16 linecards at 160Gb/s, connected through a central 40 x 40 MEMS switch rack consuming < 100W.)
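Reading the capacity off the figure’s parameters (G racks of L linecards at 160Gb/s each):

$$G \times L \times 160\,\mathrm{Gb/s} = 40 \times 16 \times 160\,\mathrm{Gb/s} = 102.4\,\mathrm{Tb/s} \approx 100\,\mathrm{Tb/s}.$$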

49 Summary of trends
- Multi-rack routers
- Single-router POPs
- No commercial router provides a 100% throughput guarantee.
- Address lookups: not a problem up to 160+Gb/s per linecard.
- Packet buffering: imperfect; loss of throughput above 10Gb/s.
- Switching: centralized schedulers up to about 1Tb/s; load-balanced 2-stage switches with 100% throughput.

