
1 Trend in the design and analysis of Internet Routers. University of Pennsylvania, March 17th, 2003. Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University. nickm@stanford.edu, www.stanford.edu/~nickm

2 Outline
- Context: high performance routers
- Capacity is limited by:
  - System power
  - Random access time of DRAMs
- Evolution of router architecture
- From ad hoc to tractable design
  - Metrics
  - Known results
- Incorporating optics into routers
- The demise of routers in the core of the network

3 What a High Performance Router Looks Like.
- Cisco GSR 12416: 6 ft by 19 in by 2 ft; capacity 160 Gb/s; power 4.2 kW.
- Juniper M160: 3 ft by 2.5 ft by 19 in; capacity 80 Gb/s; power 2.6 kW.
Capacity is the sum of the rates of the linecards.

4 Generic Router Architecture. Header processing: look up the packet's IP address in an address table (~1M prefixes, held in off-chip DRAM) to find the next hop, then update the header. Buffering: queue the packet in packet buffer memory (~1M packets, held in off-chip DRAM).
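To make the two stages concrete, here is a minimal Python sketch of the datapath described above: a longest-prefix-match lookup against a small address table, a header update, and queueing into per-output buffer memory. The table entries, port names, and packet fields are illustrative, not from the talk.

```python
import ipaddress
from collections import deque

# Illustrative address table and per-output buffers (assumed names, not from the talk).
address_table = {
    ipaddress.ip_network("10.0.0.0/8"): "port1",
    ipaddress.ip_network("10.1.0.0/16"): "port2",
    ipaddress.ip_network("0.0.0.0/0"): "port0",   # default route
}
buffers = {"port0": deque(), "port1": deque(), "port2": deque()}

def lookup_next_hop(dst_ip):
    """Longest-prefix match: among all matching prefixes, pick the longest."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [net for net in address_table if dst in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return address_table[best]

def process_packet(packet):
    """Header processing (lookup + header update), then buffering."""
    port = lookup_next_hop(packet["dst"])
    packet["ttl"] -= 1                 # header update
    buffers[port].append(packet)       # queue in packet buffer memory

process_packet({"dst": "10.1.2.3", "ttl": 64, "payload": b"..."})
print(len(buffers["port2"]))           # -> 1, matched the /16 prefix
```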

5 Generic Router Architecture (continued). The same header-processing (IP address lookup and header update against an address table) and buffering (queue plus packet buffer memory) blocks are replicated on every linecard.

6 Capacity Growth Exceeds Moore's Law (only just). Growth in capacity of commercial routers:
- 1992: ~2 Gb/s
- 1995: ~10 Gb/s
- 1998: ~40 Gb/s
- 2001: ~160 Gb/s
- 2003: ~640 Gb/s
Average growth rate: 2.2x per 18 months.
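As a quick sanity check of the quoted average (a back-of-the-envelope calculation, not from the slide), compounding 2.2x every 18 months over the 11 years from 1992 to 2003 gives roughly the observed 320x increase:

```latex
\frac{640\ \text{Gb/s}}{2\ \text{Gb/s}} = 320
\;\approx\; 2.2^{\,11\,\text{years} \times \frac{12}{18}} = 2.2^{7.3} \approx 320
```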

7 Capacity limited by power: power consumption will exceed network operator limits.

8 Capacity limited by DRAM access (growth rates):
- Moore's Law: 2x / 18 months
- Router capacity: 2.2x / 18 months
- Line capacity: 2x / 7 months
- User traffic: 2x / 12 months

9 Capacity limited by DRAM access (growth rates, continued):
- DRAM random access time: 1.1x / 18 months
- Moore's Law: 2x / 18 months
- Router capacity: 2.2x / 18 months
- Line capacity: 2x / 7 months
- User traffic: 2x / 12 months

10 First Generation Routers: a shared backplane connects line interfaces (MACs) to a central CPU, route table, and buffer memory. Typically <0.5 Gb/s aggregate capacity.

11 Second Generation Routers: each line card adds its own forwarding cache and buffer memory alongside its MAC; the route table and CPU remain central. Typically <5 Gb/s aggregate capacity.

12 Third Generation Routers: a switched backplane connects line cards, each with its own MAC, local buffer memory, and forwarding table; a separate CPU card holds the routing table. Typically <50 Gb/s aggregate capacity.

13 Fourth Generation Routers: a switch core connected to linecards by optical links 100s of metres long. 160 Gb/s - 20 Tb/s routers in development.

14 Design techniques. Until recently, routers have been designed in an ad hoc way:
- Address lookups and packet classification use non-deterministic algorithms.
- Access to packet buffers relies on statistical arrivals.
- Performance of switch fabrics is based on simulation, not provable guarantees.
Problem: network operators want to know what fraction of their expensive long-haul links they can use.

15 Performance metrics.
1. Capacity: maximize C subject to volume per rack < 2 m^3 and power per rack < 5 kW.
2. Guaranteed throughput: operators want guaranteed use of expensive long-haul links. This would be trivial with work-conserving output-queued routers.
3. Controllable delay: some users would like predictable delay. This is feasible with output queueing plus weighted fair queueing (WFQ).

16 The Problem: output-queued switches are impractical. With N inputs each at line rate R, the buffer memory at each output must accept data at rate NR while reading at rate R, which exceeds what DRAM can sustain for large N and R. (Can't I just use N separate memory devices per output?)

17 Potted history.
1. [Karol et al. 1987] Throughput is limited to about 58.6% (2 - sqrt(2)) by head-of-line blocking, even for benign Bernoulli IID uniform traffic.
2. [Tamir 1989] Observed that with "Virtual Output Queues" (VOQs), head-of-line blocking is reduced and throughput goes up.
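A small Monte Carlo sketch (my own, with assumed parameters) reproduces the Karol et al. limit: with saturated FIFO input queues and random tie-breaking at the outputs, throughput settles near 2 - sqrt(2) ≈ 58.6% as N grows.

```python
import random

def hol_throughput(n_ports=32, n_slots=20000, seed=0):
    """Saturated FIFO input queues: each input always has a head-of-line (HOL)
    cell; a blocked cell keeps its destination until it is served."""
    rng = random.Random(seed)
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]  # HOL destinations
    served = 0
    for _ in range(n_slots):
        # Group inputs by the output their HOL cell wants.
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each contested output serves one contender, chosen uniformly at random.
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # winner draws a fresh cell
            served += 1
    return served / (n_slots * n_ports)

print(hol_throughput())   # ~0.59 for large N (limit is 2 - sqrt(2) ≈ 0.586)
```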

18 Potted history (continued).
3. [Anderson et al. 1993] Observed the analogy to maximum size matching in a bipartite graph.
4. [M et al. 1995] (a) A maximum size match cannot guarantee 100% throughput. (b) But a maximum weight match can, with complexity O(N^3); maximum size matching takes O(N^2.5).

19 Potted history: speedup.
5. [Chuang, Goel et al. 1997] Precise emulation of a central shared-memory switch is possible with a speedup of two and a "stable marriage" scheduling algorithm.
6. [Prabhakar and Dai 2000] 100% throughput is possible for maximal matching with a speedup of two.

20 Potted history: newer approaches.
7. [Tassiulas 1998] 100% throughput is possible for a simple randomized algorithm with memory.
8. [Iyer and M 2000] Parallel switches can achieve 100% throughput and emulate an output-queued switch.
9. [Chang et al. 2000] A 2-stage switch with a TDM scheduler can give 100% throughput.
10. [Iyer, Zhang and M 2002] Distributed shared-memory switches can emulate an output-queued switch.

21 21 Scheduling crossbar switches to achieve 100% throughput 1. Basic switch model. 2. When traffic is uniform (Many algorithms…) 3. When traffic is non-uniform, but traffic matrix is known. Technique: Birkhoff-von Neumann decomposition. 4. When matrix is not known. Technique: Lyapunov function. 5. When algorithm is pipelined, or information is incomplete. Technique: Lyapunov function. 6. When algorithm does not complete. Technique: Randomized algorithm. 7. When there is speedup. Technique: Fluid model. 8. When there is no algorithm. Technique: 2-stage load-balancing switch. Technique: Parallel Packet Switch.

22 Basic Switch Model (figure): arrivals A_1(n) ... A_N(n) at the N inputs are placed into virtual output queues Q_11(n) ... Q_NN(n); each time slot a crossbar configuration S(n) connects inputs to outputs, producing departures D_1(n) ... D_N(n).

23 Some definitions. 3. Queue occupancies: Q_11(n), ..., Q_NN(n). (Definitions 1 and 2 appeared as equations on the original slide.)

24 Some definitions of 100% throughput.
- Work-conserving scheduler. Definition: if there is one or more packets in the system for an output, then that output is busy. An output-queued switch is work-conserving: each output can be modeled as an independent single-server queue, and if the arrival rate to an output is less than its service rate, the queue is stable; therefore we say it achieves "100% throughput". For fixed-size packets, work conservation also minimizes average packet delay. (Q: what happens when packet sizes vary?)
- Non-work-conserving scheduler. An input-queued switch is, in general, not work-conserving. Q: what definitions make sense for "100% throughput"?

25 Some common definitions of 100% throughput, ordered from stronger to weaker. We will focus on the weaker, queue-stability style of definition.
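The definitions themselves appeared as equations on the original slide. A common reconstruction, in my wording rather than the slide's: traffic is admissible if no input or output is oversubscribed, and the switch achieves 100% throughput if its VOQs remain stable for all admissible traffic, where "stable" can mean bounded expected backlog (stronger) or mere rate stability (weaker):

```latex
\text{Admissible traffic:}\quad \sum_{i}\lambda_{ij} < 1 \;\;\forall j,
\qquad \sum_{j}\lambda_{ij} < 1 \;\;\forall i \\[4pt]
\text{Stronger (bounded backlog):}\quad
\limsup_{n\to\infty}\, \mathbb{E}\!\left[Q_{ij}(n)\right] < \infty \;\;\forall i,j \\[4pt]
\text{Weaker (rate stability):}\quad
\lim_{n\to\infty} \frac{D_{ij}(n)}{n} = \lambda_{ij} \;\;\forall i,j
```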

26 26 Achieving 100% throughput 1. Basic switch model. 2. When traffic is uniform (Many algorithms…) 3. When traffic is non-uniform, but traffic matrix is known Technique: Birkhoff-von Neumann decomposition. 4. When matrix is not known. Technique: Lyapunov function. 5. When algorithm is pipelined, or information is incomplete. Technique: Lyapunov function (Homework!). 6. When algorithm does not complete. Technique: Randomized algorithm. 7. When there is speedup. Technique: Fluid model. 8. When there is no algorithm. Technique: Load-balancing switches.

27 Algorithms that give 100% throughput for uniform traffic. Quite a few algorithms can give 100% throughput when traffic is uniform(1). For example:
- Deterministic (round-robin), and a few variants.
- Wait-until-full.
- Maximum size bipartite match.
- Maximal size matches (e.g. PIM, iSLIP, WFA) [later].
(1) "Uniform": the destination of each cell is picked independently and uniformly at random (uar) from the set of all outputs.

28 Deterministic Scheduling Algorithm. If arriving traffic is Bernoulli IID with destinations picked uar across outputs, independently from time slot to time slot, then a round-robin schedule gives 100% throughput [each VOQ behaves almost like a Geo/D/1 queue]. Each (i,j) pair is served once every N time slots. Variations of the algorithm are possible [Geo/Geo/1 queue]:
1. Pick one of the N cyclic-shift permutations uar each time slot.
2. Pick uar from the set of all N! permutations each time slot.
Delay can be computed from the P-K formula for Geo/G/1 (Geo/D/1 in the deterministic case).
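A minimal sketch of the round-robin (cyclic shift) schedule; nothing here is specific to the talk beyond the rule that input i is connected to output (i + n) mod N in slot n.

```python
def round_robin_permutation(n_slot, n_ports):
    """Cyclic-shift schedule: at slot n, input i is matched to output (i + n) mod N.
    Every (input, output) pair is therefore served exactly once every N slots."""
    return [(i, (i + n_slot) % n_ports) for i in range(n_ports)]

for n in range(4):
    print(n, round_robin_permutation(n, 4))
# slot 0: (0,0) (1,1) (2,2) (3,3)
# slot 1: (0,1) (1,2) (2,3) (3,0)  ... and so on, cycling with period N
```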

29 A simple wait-until-full algorithm. We don't have to do much at all to achieve 100% throughput when arrivals are Bernoulli IID uniform. For example, simulation suggests that the following algorithm leads to 100% throughput. Wait-until-full:
1. If any VOQ is empty, do nothing (i.e. serve no queues).
2. If no VOQ is empty, pick a permutation uar (from either the fixed sequence of cyclic-shift permutations, or from all N! permutations).
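A sketch of the wait-until-full rule, assuming VOQ occupancies are available as an N x N array; the all-permutations variant is shown.

```python
import random

def wait_until_full_schedule(voq, rng=random):
    """voq[i][j] = number of cells at input i waiting for output j.
    Serve nothing unless every VOQ is non-empty; then serve a random permutation."""
    n = len(voq)
    if any(voq[i][j] == 0 for i in range(n) for j in range(n)):
        return None                      # do nothing this slot
    outputs = list(range(n))
    rng.shuffle(outputs)                 # a permutation chosen uar from all N!
    return list(enumerate(outputs))      # [(input, output), ...]
```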

30 Maximum size bipartite match. Intuition: maximize instantaneous throughput. Simulations suggest 100% throughput for uniform traffic. (Figure: the request graph has an edge (i,j) whenever Q_ij(n) > 0; the scheduler picks a maximum size match on this graph.)
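A sketch of the scheduler described here: build the request graph from the VOQ occupancies and compute a maximum size match with Kuhn's augmenting-path algorithm (one standard way to do it; the talk does not prescribe an implementation).

```python
def maximum_size_match(request):
    """request[i][j] is True iff VOQ (i,j) is non-empty.
    Kuhn's augmenting-path algorithm; returns match_of_output[j] = input i (or None)."""
    n = len(request)
    match_of_output = [None] * n

    def try_augment(i, visited):
        for j in range(n):
            if request[i][j] and j not in visited:
                visited.add(j)
                # Output j is free, or its current input can be re-routed elsewhere.
                if match_of_output[j] is None or try_augment(match_of_output[j], visited):
                    match_of_output[j] = i
                    return True
        return False

    for i in range(n):
        try_augment(i, set())
    return match_of_output

q = [[2, 0, 1], [0, 3, 0], [1, 0, 0]]          # queue occupancies Q_ij(n)
req = [[c > 0 for c in row] for row in q]
print(maximum_size_match(req))                  # [2, 1, 0]: outputs 0,1,2 get inputs 2,1,0
```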

31 Aside: Maximal Matching. A maximal matching is one in which edges are added one at a time and never removed from the matching; i.e. no augmenting paths are allowed (they would remove edges added earlier). No input and output are left unnecessarily idle.

32 Aside: Example of Maximal vs. Maximum Size Matching. (Figure: a bipartite graph on inputs A-F and outputs 1-6; the maximal matching shown matches fewer pairs than the maximum matching.)
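For contrast with the maximum size match above, a greedy maximal matching sketch: edges are added one at a time and never removed, so the result can be smaller than the maximum.

```python
def greedy_maximal_match(request):
    """Add edges one at a time and never remove them (no augmenting paths).
    The result is maximal but may be smaller than a maximum matching."""
    n = len(request)
    used_in, used_out, match = set(), set(), []
    for i in range(n):
        for j in range(n):
            if request[i][j] and i not in used_in and j not in used_out:
                match.append((i, j))
                used_in.add(i)
                used_out.add(j)
                break
    return match

req = [[True, True], [True, False]]
print(greedy_maximal_match(req))   # [(0, 0)]: maximal, but the maximum is [(0,1), (1,0)]
```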

33 Some simple algorithms that achieve 100% throughput (simulation, uniform traffic): wait-until-full, deterministic, maximum size matching, and a maximal matching algorithm (iSLIP).

34 Non-uniform traffic. Q: Is the switch stable under a maximum size match when traffic is non-uniform? I.e., is our intuition right that we should maximize instantaneous throughput? It turns out that our intuition is wrong, as the following counter-example shows: the arrival rate exceeds the service rate (λ_ij > μ_ij) for at least one VOQ, and hence the switch is unstable.

35 Counter-example. Consider a non-uniform traffic pattern with Bernoulli IID arrivals and the three possible matches S(n). (The specific 3x3 rate matrix and matches appeared as figures on the original slide.)

36 Simulation of the simple 3x3 example (figure).

37 37 Achieving 100% throughput 1. Basic switch model. 2. When traffic is uniform (Many algorithms…) 3. When traffic is non-uniform, but traffic matrix is known Technique: Birkhoff-von Neumann decomposition. 4. When matrix is not known. Technique: Lyapunov function. 5. When algorithm is pipelined, or information is incomplete. Technique: Lyapunov function (Homework!). 6. When algorithm does not complete. Technique: Randomized algorithm. 7. When there is speedup. Technique: Fluid model. 8. When there is no algorithm. Technique: Load-balanced switches.

38 Example 1: (Trivial) scheduling to achieve 100% throughput. Assume we know the traffic matrix, the arrival pattern is deterministic, and the matrix is a permutation. Then we can simply choose the service schedule S(n) to equal that permutation in every time slot.

39 Example 2: Random arrivals, but known traffic matrix. Assume we know the traffic matrix Λ, and it is not a permutation (an example matrix appeared on the slide). Then we can choose a repeating sequence of service permutations whose time average covers Λ. In general, if we know Λ, can we pick a sequence S(n) so that each VOQ is served at a rate strictly greater than its arrival rate λ_ij? (A rate exactly equal to λ_ij is OK too, if arrivals are deterministic.)

40 Birkhoff-von Neumann Decomposition. B-vN: any doubly stochastic matrix Λ can be decomposed into a linear (convex) combination of permutation matrices.
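A sketch of the decomposition for a doubly stochastic rate matrix (illustrative code, not from the talk): repeatedly find a permutation inside the positive support, weight it by its smallest entry, subtract, and repeat. The weights phi_k sum to 1 and can serve as time-shares for the corresponding service permutations.

```python
def find_permutation(positive):
    """Find a perfect matching over entries that are still positive
    (one exists for any doubly stochastic matrix)."""
    n = len(positive)
    match = [None] * n                      # match[j] = input assigned to output j

    def augment(i, visited):
        for j in range(n):
            if positive[i][j] and j not in visited:
                visited.add(j)
                if match[j] is None or augment(match[j], visited):
                    match[j] = i
                    return True
        return False

    for i in range(n):
        augment(i, set())
    return match

def birkhoff_von_neumann(rates, tol=1e-9):
    """Decompose a doubly stochastic matrix into a list of (phi_k, permutation_k)."""
    L = [row[:] for row in rates]
    terms = []
    while max(max(row) for row in L) > tol:
        match = find_permutation([[x > tol for x in row] for row in L])
        perm = {match[j]: j for j in range(len(L))}   # input i -> output perm[i]
        phi = min(L[i][perm[i]] for i in perm)        # largest feasible coefficient
        for i, j in perm.items():
            L[i][j] -= phi
        terms.append((phi, dict(perm)))
    return terms

rates = [[0.5, 0.5], [0.5, 0.5]]
for phi, perm in birkhoff_von_neumann(rates):
    print(phi, perm)    # 0.5 {0: 1, 1: 0} then 0.5 {0: 0, 1: 1}
```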

41 In practice... Unfortunately, we usually don't know the traffic matrix Λ a priori, so we can either measure or estimate Λ, or not use Λ at all. In what follows, we will assume we don't know Λ and don't use it.

42 42 Achieving 100% throughput 1. Basic switch model. 2. When traffic is uniform (Many algorithms…) 3. When traffic is non-uniform, but traffic matrix is known Technique: Birkhoff-von Neumann decomposition. 4. When matrix is not known. Technique: Lyapunov function. 5. When algorithm is pipelined, or information is incomplete. Technique: Lyapunov function (Homework!). 6. When algorithm does not complete. Technique: Randomized algorithm. 7. When there is speedup. Technique: Fluid model. 8. When there is no algorithm. Technique: Load-balanced switches.

43 43 When the traffic matrix is not known

44 Maximum weight matching (figure): as in the basic switch model, arrivals A_ij(n) fill VOQs Q_ij(n); the "request" graph is weighted by the queue occupancies Q_11(n), ..., Q_N1(n), ..., and each time slot the scheduler picks the maximum weight match S*(n).

45 45 Choosing the weight
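One common choice of weight is the queue occupancy Q_ij(n) (cell age is another used in this line of work); the following sketch, which assumes SciPy is available, computes the maximum weight match S*(n) with weights equal to the occupancies.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def max_weight_match(Q):
    """Maximum weight matching with weights = queue occupancies Q[i][j].
    Returns the crossbar configuration S*(n) as (input, output) pairs."""
    Q = np.asarray(Q)
    rows, cols = linear_sum_assignment(Q, maximize=True)
    # Keep only edges with non-empty VOQs; empty ones carry no cell.
    return [(i, j) for i, j in zip(rows, cols) if Q[i, j] > 0]

Q = [[10, 0, 2],
     [0,  1, 0],
     [3,  0, 0]]
print(max_weight_match(Q))   # [(0, 0), (1, 1)]: serves the long queue,
                             # even though a maximum *size* match has 3 edges
```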

46 46 Scheduling algorithms to achieve 100% throughput 1. Basic switch model. 2. When traffic is uniform (Many algorithms…) 3. When traffic is non-uniform, but traffic matrix is known. Technique: Birkhoff-von Neumann decomposition. 4. When matrix is not known. Technique: Lyapunov function. 5. When algorithm is pipelined, or information is incomplete. Technique: Lyapunov function. 6. When algorithm does not complete. Technique: Randomized algorithm. 7. When there is speedup. Technique: Fluid model. 8. When there is no algorithm. Technique: 2-stage load-balancing switch. Technique: Parallel Packet Switch.

47 47 100% throughput with pipelining

48 48 100% throughput with incomplete information

49 49 Scheduling algorithms to achieve 100% throughput 1. Basic switch model. 2. When traffic is uniform (Many algorithms…) 3. When traffic is non-uniform, but traffic matrix is known. Technique: Birkhoff-von Neumann decomposition. 4. When matrix is not known. Technique: Lyapunov function. 5. When algorithm is pipelined, or information is incomplete. Technique: Lyapunov function. 6. When algorithm does not complete. Technique: Randomized algorithm. 7. When there is speedup. Technique: Fluid model. 8. When there is no algorithm. Technique: 2-stage load-balancing switch. Technique: Parallel Packet Switch.

50 50 Achieving 100% when algorithm does not complete Randomized algorithms: 1. Basic idea (Tassiulas) 2. Reducing delay (Shah, Giaccone and Prabhakar)
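A sketch of the basic randomized idea credited to Tassiulas on this slide, as I understand it: each slot, draw one random permutation and keep whichever of that permutation or last slot's match has the larger total queue weight.

```python
import random

def tassiulas_step(Q, prev_match, rng=random):
    """One scheduling slot: compare a uniformly random permutation against the
    match used last slot, and keep the heavier one (weight = sum of Q[i][j])."""
    n = len(Q)
    perm = list(range(n))
    rng.shuffle(perm)
    candidate = list(enumerate(perm))          # random match: input i -> output perm[i]
    weight = lambda m: sum(Q[i][j] for i, j in m)
    return candidate if weight(candidate) > weight(prev_match) else prev_match

Q = [[4, 0], [0, 7]]
match = [(0, 1), (1, 0)]                        # weight 0, carried over from last slot
match = tassiulas_step(Q, match, random.Random(1))
print(match)                                    # keeps whichever of the two is heavier
```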

51 51 Scheduling algorithms to achieve 100% throughput 1. Basic switch model. 2. When traffic is uniform (Many algorithms…) 3. When traffic is non-uniform, but traffic matrix is known. Technique: Birkhoff-von Neumann decomposition. 4. When matrix is not known. Technique: Lyapunov function. 5. When algorithm is pipelined, or information is incomplete. Technique: Lyapunov function. 6. When algorithm does not complete. Technique: Randomized algorithm. 7. When there is speedup. Technique: Fluid model.

52 Speedup and Combined Input Output Queueing (CIOQ). (Figure: the basic switch model augmented with output queues.) With speedup s, the matching is performed s times per cell time and up to s cells are removed from each VOQ per cell time; therefore output queues are required.

53 53 Fluid Model [Dai and Prabhakar]

54 54 Scheduling algorithms to achieve 100% throughput 1. Basic switch model. 2. When traffic is uniform (Many algorithms…) 3. When traffic is non-uniform, but traffic matrix is known. Technique: Birkhoff-von Neumann decomposition. 4. When matrix is not known. Technique: Lyapunov function. 5. When algorithm is pipelined, or information is incomplete. Technique: Lyapunov function. 6. When algorithm does not complete. Technique: Randomized algorithm. 7. When there is speedup. Technique: Fluid model.

55 Throughput results (theory vs. practice).
- Input queueing (IQ): 58% [Karol, 1987].
- IQ + VOQ, maximum weight matching: 100% [M et al., 1995]. Practice: different weight functions, incomplete information, pipelining.
- Randomized algorithms: 100% [Tassiulas, 1998].
- IQ + VOQ, sub-maximal size matching (e.g. PIM, iSLIP): 100% [Various]. Practice: various heuristics, distributed algorithms, and amounts of speedup.
- IQ + VOQ, maximal size matching, speedup of two: 100% [Dai & Prabhakar, 2000].
- Load-balanced single-buffered switch: 100% [C.S. Chang, 2001]. Practice: preventing mis-sequencing; use of optical meshes [Stanford, 2002-].

56 Outline
- Context: high performance routers
- Capacity is limited by:
  - System power
  - Random access time of DRAMs
- Evolution of router architecture
- From ad hoc to tractable design
  - Metrics
  - Known results
- Incorporating optics into routers
- The demise of routers in the core of the network

57 57 2-stage switch and no scheduler Motivation: 1. If traffic is uniformly distributed, then even a deterministic schedule gives 100% throughput. 2. So why not force non-uniform traffic to be uniformly distributed?

58 2-stage switch and no scheduler (figure): arrivals A_1(n) ... A_N(n) pass through a bufferless load-balancing stage S_1(n), becoming A'_1(n) ... A'_N(n) at the internal inputs; they are buffered in queues L_11(n) ... L_NN(n) and then switched by a buffered switching stage S_2(n) to produce departures D_1(n) ... D_N(n).

59 59 2-stage switch with no scheduler

60 60 Scheduling algorithms to achieve 100% throughput 1. Basic switch model. 2. When traffic is uniform (Many algorithms…) 3. When traffic is non-uniform, but traffic matrix is known. Technique: Birkhoff-von Neumann decomposition. 4. When matrix is not known. Technique: Lyapunov function. 5. When algorithm is pipelined, or information is incomplete. Technique: Lyapunov function. 6. When algorithm does not complete. Technique: Randomized algorithm. 7. When there is speedup. Technique: Fluid model. 8. When there is no algorithm. Technique: 2-stage load-balancing switch.

61 Two-Stage Switch: external inputs -> load-balancing cyclic shift -> internal inputs -> switching cyclic shift -> external outputs. The first stage load-balances incoming flows; the second stage is the usual switching cyclic shift.
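A minimal sketch of the two cyclic-shift stages (illustrative, with assumed VOQ buffering between the stages): no scheduling decisions and no knowledge of the traffic matrix are needed. This sketch ignores the mis-sequencing issue noted on slide 55.

```python
from collections import deque

class TwoStageSwitch:
    """Both stages walk through cyclic-shift (round-robin) permutations,
    so no scheduler is required. Buffers (VOQs) sit between the stages."""
    def __init__(self, n):
        self.n = n
        self.slot = 0
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]  # middle-stage VOQs

    def step(self, arrivals):
        """arrivals[i] = destination output of the cell at external input i, or None."""
        n, t = self.n, self.slot
        # Stage 1 (load balancing): external input i -> internal input (i + t) mod N,
        # regardless of the cell's destination, which spreads traffic uniformly.
        for i, dest in enumerate(arrivals):
            if dest is not None:
                self.voq[(i + t) % n][dest].append(dest)
        # Stage 2 (switching): internal input k -> external output (k + t) mod N.
        departures = []
        for k in range(n):
            out = (k + t) % n
            if self.voq[k][out]:
                self.voq[k][out].popleft()
                departures.append(out)
        self.slot += 1
        return departures

sw = TwoStageSwitch(3)
print(sw.step([0, 0, None]))        # [0]: one cell reaches output 0 immediately
print(sw.step([None, None, None]))  # []: the other waits for its turn in the cycle
print(sw.step([None, None, None]))  # [0]: it departs when its internal input maps to output 0
```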

62 2-Stage Switch (figure): the same two-stage structure, with each cyclic-shift stage stepping through a spanning set of permutations between external inputs, internal inputs, and external outputs.

63 An example: packet processing (figure: CPU instructions available per minimum-length packet, since 1996).

64 Demise of packet switching at the core of the network.
1. Trend: growth in line rates outstrips growth in the ability to process and buffer packets. Consequence: we will require simpler datapaths and/or more parallelism.
2. Trend: optical switching has huge capacity, but it is not feasible to make optical packet switches. Consequence: optical switching will be used for circuit switches.
3. Trend: link utilization is below 10% and falling; operators deliberately over-provision networks. Consequence: the original goal of packet switching (efficient sharing of expensive links) no longer applies.

65 Circuit switches...
- Do not process packets,
- Do not buffer packets,
- Consume less power (typically 75% less per Gb/s),
- Fit more capacity in one rack (typically 4-8x),
- Are, in practice, simpler, more reliable and more resilient,
- Cost less (typically 75% less per Gb/s),
- Can be built using optics,
- Are already in widespread use at the core of the Internet.
Prediction: the Internet will evolve to become edge routers interconnected by a rich mesh of WDM circuit switches.

