1 IP routers with memory that runs slower than the line rate Nick McKeown Assistant Professor of Electrical Engineering and Computer Science, Stanford.


1 IP routers with memory that runs slower than the line rate
Nick McKeown, Assistant Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
http://www.stanford.edu/~nickm

2 Outline
– Trends in packet switch design
– Additional problem: “Data rates may soon exceed memory bandwidth”
– The Fork-Join Router & Parallel Packet Switches

3 First Packet Switches: Shared Memory
[Diagram: Inputs 1, 2, … N and Outputs 1, 2, … N attached to a shared memory]
Large, single, dynamically allocated memory buffer: N writes per “cell” time, N reads per “cell” time. Limited by memory bandwidth.
A large body of work has proven and made possible:
– Fairness
– Delay Guarantees
– Delay Variation Control
– Loss Guarantees
– Statistical Guarantees

4 Later Packet Switches
Single-stage crossbar with CIOQ and VOQs: 1 write per “cell” time, 1 read per “cell” time. Rate of writes/reads determined by switch fabric speedup.
[Diagram: Linecards (Lookup & Drop Policy, Virtual Output Queues, Output Scheduling) connected to a bufferless Switch Core (Switch Fabric, Switch Arbitration)]
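To make the contrast between the shared-memory and CIOQ slides concrete, here is a minimal sketch that turns "operations per cell time" into a required memory access time. The function name, the 64-byte cell, the 10 Gb/s line rate, and the 32-port count are illustrative assumptions, not figures from the talk, and the extra operations implied by fabric speedup are left out.

```python
def required_access_time_ns(line_rate_gbps: float, cell_bytes: int,
                            writes_per_cell_time: int,
                            reads_per_cell_time: int) -> float:
    """Longest memory access time (ns) that still keeps up with the line."""
    # One "cell time" is the time a cell occupies the external line.
    # bits / (Gb/s) gives nanoseconds.
    cell_time_ns = cell_bytes * 8 / line_rate_gbps
    ops = writes_per_cell_time + reads_per_cell_time
    return cell_time_ns / ops


# Hypothetical example: 32 ports, 64-byte cells, 10 Gb/s lines.
n_ports = 32
shared = required_access_time_ns(10, 64, n_ports, n_ports)  # N writes + N reads
cioq = required_access_time_ns(10, 64, 1, 1)                # 1 write + 1 read
print(f"shared memory: {shared:.2f} ns per operation")
print(f"CIOQ linecard: {cioq:.2f} ns per operation")
```

Under these assumptions the shared memory has to be N times faster than a CIOQ linecard buffer, which is the sense in which the shared-memory design is limited by memory bandwidth.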

5 Myths about CIOQ-based crossbar switches
1. “Input-queued crossbars have low throughput”
– An input-queued crossbar can have as high throughput as any switch.
2. “Crossbars don’t support multicast traffic well”
– A crossbar inherently supports multicast efficiently.
3. “Crossbars don’t scale well”
– Today, it is the number of chip I/Os, not the number of crosspoints, that limits the size of a switch fabric. Expect 5Tb/s crossbar switches.

6 Myths about CIOQ-based crossbar switches (2)
4. “Crossbar switches can’t support delay/QoS guarantees”
– With an internal speedup of 2, a CIOQ switch can (in theory) precisely emulate a shared memory switch for all traffic.

7 What makes sense today?

8 Summary of trend
[Diagram: (1) shared-memory switch with Inputs 1, 2, … N and Outputs 1, 2, … N; (2) Switch Fabric with Switch Arbitration]
1. Limited by: memory bandwidth. ~50Gb/s.
2. Limited by: per-cell arbitration, power. ~5Tb/s.
Higher capacity: multistage (Clos, Banyan, Toroidal…), less frequent arbitration.

9 How Fast Can I Make a Packet Buffer?
[Diagram: External Line (e.g. OC768c) → Buffer Memory (10ns on-chip DRAM, 64-byte wide bus) → Switch Fabric]
Rough Estimate:
– 10ns per memory operation.
– Two memory operations per packet.
– Therefore, maximum ~26Gb/s.
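The rough estimate above can be reproduced with a few lines of arithmetic. This is a sketch under the slide's own numbers (64-byte wide bus, 10ns per operation, two operations per packet); the function name and the assumption that each operation moves exactly one bus width of data are mine.

```python
def max_buffer_rate_gbps(bus_width_bytes: int = 64,
                         op_time_ns: float = 10.0,
                         ops_per_packet: int = 2) -> float:
    """Upper bound on line rate for a single packet buffer, in Gb/s."""
    # Each packet needs one write and one read, and each operation moves
    # at most one bus width of data (assumption).
    bits_per_op = bus_width_bytes * 8
    time_per_packet_ns = op_time_ns * ops_per_packet
    return bits_per_op / time_per_packet_ns  # bits per ns == Gb/s


print(f"{max_buffer_rate_gbps():.1f} Gb/s")  # 512 bits / 20 ns
```

The result, 25.6 Gb/s, is what the slide rounds to ~26Gb/s, and it falls short of an OC768c line at roughly 40 Gb/s.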

10 How can we make routers with 40Gb/s, 160Gb/s, … interfaces?

11 Higher capacity and higher linerates
[Diagram: (1) shared-memory switch with Inputs 1, 2, … N and Outputs 1, 2, … N; (2) Switch Fabric with Switch Arbitration; (3) Fork-Join Router]
1. Limited by: memory bandwidth. ~50Gb/s.
2. Limited by: per-cell arbitration, power. ~5Tb/s.
3. More parallelism: Fork-Join Router.
Higher capacity: multistage, less frequent arbitration.
Higher linerates: more parallelism.

12 Fork-Join Router
How can we:
– Increase capacity.
– Reduce power per subsystem.
While at the same time…
– Keep the system simple.
– Support line rates faster than memory bandwidth.
– Provide delay guarantees.
Increase parallelism. Multiple racks. Single-stage buffering. Pkt-by-pkt load balancing. Hmmm….?

13 The Fork-Join Router
[Diagram: ports 1 … N at rate R, bufferless fork and join stages, Routers 1, 2, … k in parallel]

14 The Fork-Join Router
Advantages
– Single-stage of buffering
– k ↑ power per subsystem ↓
– k ↑ memory bandwidth ↓
– k ↑ forwarding table lookup rate ↓

15 The Fork-Join Router
Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?

16 A Parallel Packet Switch
[Diagram: ports 1 … N at rate R; Output Queued Switches 1, 2, … k in parallel]
Arriving packet tagged with egress port.

17 Performance Questions
1. Can it be work-conserving?
2. Can it emulate a single big output queued switch?
3. Can it support delay guarantees, strict-priorities, WFQ, …?

18 Work Conservation
[Diagram: one port at rate R; Output Queued Switches 1, 2, … k; internal links at rate R/k]
Input Link Constraint. Output Link Constraint.

19 Work Conservation
[Diagram: one port at rate R; layers 1, 2, … k with internal links at rate R/k; numbered cells illustrating the Output Link Constraint]

20 Work Conservation
[Diagram: ports 1 … N at rate R; Output Queued Switches 1, 2, … k; internal links at rate S(R/k)]
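One way to read the input link constraint on the Work Conservation slides: an internal link running at S(R/k) needs k/S external cell times to carry one cell, so an input cannot send to the same layer again until that time has passed. The sketch below encodes that reading. It assumes time is slotted in external cell times and that the hold time is rounded up to a whole slot; the function names and the `last_sent` bookkeeping are hypothetical, not taken from the talk.

```python
import math
from typing import List, Optional


def hold_slots(k: int, speedup: float = 1.0) -> int:
    """Cell times for which one cell occupies an internal link of rate S*R/k."""
    return math.ceil(k / speedup)


def available_layers(last_sent: List[Optional[int]], now: int,
                     k: int, speedup: float = 1.0) -> List[int]:
    """Layers this input may send to at slot `now`.

    last_sent[i] is the slot in which the input last sent a cell to
    layer i, or None if it has never used that layer.
    """
    hold = hold_slots(k, speedup)
    return [layer for layer, sent in enumerate(last_sent)
            if sent is None or now - sent >= hold]


# k = 3 layers, no speedup: layer 1 was used at slot 1 and is still busy at slot 3.
print(available_layers([0, 1, None], now=3, k=3))
# With S = 2 the hold time drops to 2 slots, so every layer is free again.
print(available_layers([0, 1, None], now=3, k=3, speedup=2.0))
```

The output link constraint is symmetric under the same assumptions: an output can pull a cell from a given layer only once per hold time.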

21 Precise Emulation of an Output Queued Switch
[Diagram: N×N Output Queued Switch (ports 1 … N) =? Parallel Packet Switch (ports 1 … N)]

22 Parallel Packet Switch Theorems
1. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can be work-conserving for all traffic.
2. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can precisely emulate a FCFS output-queued switch for all traffic.

23 Parallel Packet Switch Theorems
3. If S > 3k/(k+3) ≈ 3 then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
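The speedup thresholds in theorems 1 to 3 depend on the number of layers k and only approach 2 and 3 as k grows. The short sketch below tabulates both expressions exactly as written on the slides; the function names and the choice of k values are mine.

```python
def fcfs_speedup_bound(k: int) -> float:
    """Threshold 2k/(k+2) from theorems 1 and 2 (work conservation, FCFS emulation)."""
    return 2 * k / (k + 2)


def qos_speedup_bound(k: int) -> float:
    """Threshold 3k/(k+3) from theorem 3 (WFQ, strict priorities, other QoS)."""
    return 3 * k / (k + 3)


for k in (2, 4, 8, 16, 64, 1024):
    print(f"k={k:5d}  2k/(k+2)={fcfs_speedup_bound(k):.3f}  "
          f"3k/(k+3)={qos_speedup_bound(k):.3f}")
```

Both expressions increase with k and stay strictly below 2 and 3 respectively, so a speedup of 2 (or 3) satisfies the corresponding condition for any number of layers.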

24 Parallel Packet Switch Theorems
4. If S >= 1 then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a FCFS switch for all traffic.

25 Co-ordination buffers
[Diagram: port at rate R; co-ordination buffers of size Nk; internal links at rate R/k; Output Queued Switches 1, 2, … k]

26 Parallel Packet Switch Theorems
5. If S > 2 then a parallel packet switch with a small co-ordination buffer at rate R can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.

27 The Fork-Join Router
Questions
– Switching: What is the performance?
– Forwarding Lookups: How do they work?

28 The Fork-Join Router
Lookahead Forwarding Table Lookups
– Packet tagged with egress port at next router
– Lookup performed in parallel at rate R/k
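To see what "lookup performed in parallel at rate R/k" buys, the sketch below estimates the forwarding lookup rate each of the k routers has to sustain. It assumes back-to-back minimum-size packets with one lookup per packet; the 40-byte packet size, the 40 Gb/s line rate, k = 10, and the function name are illustrative assumptions rather than numbers from the talk.

```python
def lookups_per_second(line_rate_gbps: float, packet_bytes: int, k: int = 1) -> float:
    """Lookups per second needed per router when a line of rate R is split k ways."""
    packets_per_second = line_rate_gbps * 1e9 / (packet_bytes * 8)
    return packets_per_second / k


# Hypothetical example: 40 Gb/s line, 40-byte packets.
print(f"single router: {lookups_per_second(40, 40) / 1e6:.1f} M lookups/s")
print(f"k = 10 layers: {lookups_per_second(40, 40, k=10) / 1e6:.1f} M lookups/s")
```

Under these assumptions each router needs a tenth of the lookup rate of a single router at the full line rate, matching the "k ↑, lookup rate ↓" advantage listed earlier in the talk.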

29 The Fork-Join Router
[Diagram: ports 1 … N at rate R; Routers 1, 2, … k in parallel]
Possibly >100Tb/s aggregate capacity. Linerates in excess of 100Gb/s.

