Slide 1: Memory for High Performance Internet Routers
Micron, February 12th, 2003
Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
www.stanford.edu/~nickm
Slide 2: Ways to get involved
1. Weekly group meetings, talks and papers: http://klamath.stanford.edu
2. Optics in Routers Project: http://klamath.stanford.edu/or
3. Networking classes at Stanford:
   - Introduction to Computer Networks: EE284, CS244a, EE384a
   - Packet Switch Architectures: EE384x, EE384y
   - Multimedia Networking: EE384b,c
4. Stanford Network Seminar Series: http://netseminar.stanford.edu
5. Stanford Networking Research Center: http://snrc.stanford.edu
Slide 3: Outline
- Context: High Performance Routers
- Trends and Consequences
- Fast Packet Buffers
Slide 4: What a High Performance Router Looks Like
- Cisco GSR 12416: 6ft x 2ft x 19in; capacity: 160Gb/s; power: 4.2kW
- Juniper M160: 3ft x 2.5ft x 19in; capacity: 80Gb/s; power: 2.6kW
Slide 5: Points of Presence (POPs)
(Diagram: routers A through F interconnected across points of presence POP1 through POP8.)
Slide 6: Generic Router Architecture
(Diagram: per-packet datapath.)
- Header processing: look up the IP address in the address table (IP address -> next hop) and update the header. The address table holds ~1M prefixes in off-chip DRAM.
- Queue packet: the buffer memory holds ~1M packets in off-chip DRAM.
Slide 7: Generic Router Architecture
(Diagram: multiple linecards; each performs header processing (IP address lookup against its own address table, header update) and has its own buffer manager and buffer memory.)
Slide 8: Outline
- Context: High Performance Routers
- Trends and Consequences
  - Routing tables
  - Network processors
  - Circuit switches
  - Bigger routers
  - Multi-rack routers
  - Packet buffers
- Fast Packet Buffers
Slide 9: Trends in Routing Tables
(Chart: routing table growth over time; source: Geoff Huston. Annotations: Moore's Law, 2x / 18 months; IPv6 adoption is extremely slow; 99.5% of prefixes are 24 bits or shorter.)
Consequences:
1. The whole IPv4 address space will fit on one 4Gb DRAM.
2. The whole IPv4 address space, as a lookup table, fits on one 32Gb DRAM.
3. All 24-bit prefixes as a lookup table fit in 10% of a 1Gb DRAM.
4. A 1M-entry address table already fits in the corner of an ASIC.
5. TCAMs (for IP lookup) have a limited life.
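As a rough check of consequences 2 and 3, here is a back-of-the-envelope sizing sketch; the 8-bit next-hop index per entry is an assumption for illustration, not a figure from the slide.

    # Back-of-the-envelope table sizing (assumption: 8-bit next-hop index per entry).
    NEXT_HOP_BITS = 8

    full_ipv4_table_bits = (2 ** 32) * NEXT_HOP_BITS   # one entry per IPv4 address
    prefix24_table_bits = (2 ** 24) * NEXT_HOP_BITS    # one entry per 24-bit prefix

    print(full_ipv4_table_bits / 2 ** 30, "Gbit")  # 32.0 Gbit -> fits one 32Gb DRAM
    print(prefix24_table_bits / 2 ** 30, "Gbit")   # 0.125 Gbit -> ~12% of a 1Gb DRAM,
                                                   # in line with the slide's ~10% figure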
Slide 10: Trends in Technology, Routers & Traffic
- DRAM random access time: 1.1x / 18 months
- Moore's Law: 2x / 18 months
- Router capacity: 2.2x / 18 months
- Line capacity: 2x / 7 months
- User traffic: 2x / 12 months
Slide 11: Trends and Consequences
(Chart 1: CPU instructions available per minimum-length packet over time. Chart 2: disparity between traffic growth and router capacity growth, roughly a 5-fold disparity.)
Consequences:
1. Packet processing is getting harder, and eventually network processors will be used less in high performance routers.
2. (Much) bigger routers will be developed.
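To make chart 1 concrete, here is a small hedged calculation of how many instructions a processor can spend on each minimum-length packet; the 1 GHz clock and one-instruction-per-cycle pipeline are illustrative assumptions, not figures from the slide.

    # Instructions available per minimum-length (40B) packet at various line rates.
    # Illustrative assumptions: 1 GHz clock, one instruction per cycle.
    CLOCK_HZ = 1e9
    PACKET_BITS = 40 * 8

    for line_rate_gbps in (2.5, 10, 40):
        packet_time_s = PACKET_BITS / (line_rate_gbps * 1e9)
        print(line_rate_gbps, "Gb/s ->",
              round(CLOCK_HZ * packet_time_s), "instructions per packet")
    # 2.5 Gb/s -> 128, 10 Gb/s -> 32, 40 Gb/s -> 8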
Slide 12: Trends and Consequences (2)
(Chart 3: power consumption will exceed POP limits. Chart 4: disparity between line rate and memory access time.)
Consequences:
3. Multi-rack routers will spread power over multiple racks.
4. It will get harder to build packet buffers for linecards…
Slide 13: Outline
- Context: High Performance Routers
- Trends and Consequences
- Fast Packet Buffers (work with Sundar Iyer, PhD student)
  - The problem of big, fast memories
  - Hybrid SRAM-DRAM
  - How big does the SRAM need to be?
  - Prototyping
Slide 14: The Problem
All packet switches (e.g. Internet routers, ATM switches) require packet buffers for periods of congestion.
- Size: for TCP to work well, the buffers need to hold one RTT (about 0.25s) of data.
- Speed: clearly, the buffer needs to store (retrieve) packets as fast as they arrive (depart).
(Diagram: N ports, each with its own buffer memory running at line rate R.)
Slide 15: An Example
Packet buffers for a 40Gb/s router linecard:
- Buffer size: 10Gbits
- Write rate R: one 40B packet every 8ns
- Read rate R: one 40B packet every 8ns
(Diagram: a buffer manager sits between the line and the buffer memory.)
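A quick sketch of where these numbers come from; the 0.25s RTT is taken from the previous slide, and the rest follows from the 40Gb/s line rate.

    # Buffer sizing and access timing for a 40Gb/s linecard.
    R = 40e9            # line rate, bits/s
    RTT = 0.25          # round-trip time of buffered data, seconds
    PACKET_BITS = 40 * 8

    buffer_bits = R * RTT                    # 1e10 bits = 10 Gbits of buffering
    packet_time_ns = PACKET_BITS / R * 1e9   # 8 ns between minimum-length packets

    print(buffer_bits / 1e9, "Gbit buffer")
    print(packet_time_ns, "ns per 40B packet")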
Slide 16: Memory Technology
Use SRAM?
+ Fast enough random access time, but
- Too low density to store 10Gbits of data.
Use DRAM?
+ High density means we can store the data, but
- Can't meet the random access time.
Slide 17: Can't we just use lots of DRAMs in parallel?
(Diagram: the buffer manager stripes each 320B block (bytes 0-39, 40-79, …, 280-319) across a bank of parallel DRAMs; the write rate R and read rate R are each one 40B packet every 8ns, and the wide memory is read or written 320B every 32ns.)
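A sketch of the arithmetic behind the 320B block size, assuming the wide DRAM interface must sustain writes at R and reads at R at the same time, alternating one access per direction:

    # Why the wide DRAM interface works on 320B blocks.
    R = 40e9            # line rate, bits/s (each direction)
    T_ACCESS = 32e-9    # one random access every 32 ns

    # The memory alternates between a write and a read, so each stream
    # gets one block every 64 ns and the block must carry 2*R*T_ACCESS bits.
    bytes_per_access = 2 * R * T_ACCESS / 8
    print(bytes_per_access, "bytes per access")        # 320.0
    print(bytes_per_access / 40, "40B packets per block")  # 8.0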
Slide 18: Works fine if there is only one FIFO
(Diagram: 40B packets arrive and depart at rate R; the buffer manager aggregates them into 320B blocks (bytes 0-39, 40-79, …, 280-319) and transfers whole 320B blocks to and from the buffer memory.)
Slide 19: Works fine if there is only one FIFO
(Diagram: the same arrangement with variable-length packets: arriving and departing packets are of unknown size, but the buffer manager still transfers fixed 320B blocks to and from the buffer memory.)
Slide 20: In practice, the buffer holds many FIFOs
(Diagram: queues 1, 2, …, Q share the buffer; variable-length packets arrive and depart at rate R, while the memory is still accessed in 320B blocks.)
e.g. in an IP router, Q might be 200; in an ATM switch, Q might be 10^6.
How can we write multiple variable-length packets into different queues?
Slide 21: Problems
1. A 320B block will contain packets for different queues, which can't be written to, or read from, the same location.
2. If instead a different address is used for each memory, and the packets in the 320B block are written to different locations, how do we know the memory will be available for reading when we need to retrieve the packet?
Slide 22: Hybrid Memory Hierarchy
(Diagram: arriving packets at rate R enter a small tail SRAM cache holding the FIFO tails (queues 1…Q); blocks of b bytes are written from the tail cache into a large DRAM that holds the body of the FIFOs, and blocks of b bytes are read from DRAM into a small head SRAM cache holding the FIFO heads (queues 1…Q); an arbiter or scheduler issues requests, and departing packets leave at rate R.)
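A minimal, illustrative sketch of the data structure this hierarchy implies; the class and method names are mine, the structure is simplified (it assumes each FIFO longer than one block always has whole blocks parked in DRAM), and the decision of when and which queue to replenish is left to the arbiter algorithms on the following slides.

    from collections import deque

    B = 320  # DRAM block size in bytes (b on the slides)

    class HybridFifo:
        """One logical FIFO split across tail SRAM, DRAM body, and head SRAM."""

        def __init__(self):
            self.tail_sram = bytearray()   # most recently written bytes
            self.dram_blocks = deque()     # b-byte blocks, oldest first
            self.head_sram = bytearray()   # oldest bytes, ready for reading

        def write(self, packet: bytes):
            # Packets always enter through the tail cache.
            self.tail_sram += packet
            # Whenever a full block has accumulated, spill it to DRAM.
            while len(self.tail_sram) >= B:
                self.dram_blocks.append(bytes(self.tail_sram[:B]))
                del self.tail_sram[:B]

        def replenish(self):
            # Move one b-byte block from DRAM into the head cache.
            # When to call this, and for which queue, is the arbiter's job
            # (MDQF / ECQF on the later slides).
            if self.dram_blocks:
                self.head_sram += self.dram_blocks.popleft()

        def read(self, nbytes: int) -> bytes:
            # Reads are served from the head cache; the SRAM sizing theorems
            # are what guarantee the bytes are there when requested.
            out = bytes(self.head_sram[:nbytes])
            del self.head_sram[:nbytes]
            return out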
Slide 23: Some Thoughts
The buffer architecture itself is well known, and is usually designed to work OK on average. We would like deterministic guarantees:
1. What is the minimum SRAM needed to guarantee that a byte is always available in SRAM when requested?
2. What algorithm should we use to manage the replenishment of the SRAM "cache" memory?
Slides 24-25: An Example (Q = 5, w = 9+, b = 6)
(Animation: the SRAM head-cache occupancy of the five queues over timeslots t = 0 through 23, showing reads draining the queues and b-byte replenishments from DRAM arriving for the chosen queue.)
Slide 26: Theorem (Impatient Arbiter)
An SRAM cache of size Qb(2 + ln Q) bytes is sufficient to guarantee that a byte is always available when requested. The algorithm is called MDQF (Most Deficit Queue First).
Examples:
1. 40Gb/s linecard, b = 640, Q = 128: SRAM = 560 kBytes
2. 160Gb/s linecard, b = 2560, Q = 512: SRAM = 10 MBytes
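A quick check of these two examples against the Qb(2 + ln Q) bound:

    import math

    def mdqf_sram_bytes(Q, b):
        # SRAM cache size bound for the impatient arbiter (MDQF).
        return Q * b * (2 + math.log(Q))

    print(mdqf_sram_bytes(128, 640) / 1e3, "kBytes")   # ~561 kBytes for the 40Gb/s case
    print(mdqf_sram_bytes(512, 2560) / 1e6, "MBytes")  # ~10.8 MBytes for the 160Gb/s case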
Slide 27: Reducing the size of the SRAM
Intuition: if we use a lookahead buffer to peek at the requests "in advance", we can replenish the SRAM cache only when needed.
This increases the latency from when a request is made until the byte is available. But because it is a pipeline, the issue rate is the same.
Slide 28: Theorem (Patient Arbiter)
An SRAM cache of size Q(b – 1) bytes is sufficient to guarantee that a requested byte is available within Q(b – 1) + 1 request times. The algorithm is called ECQF (Earliest Critical Queue First).
Example: 160Gb/s linecard, b = 2560, Q = 512: SRAM = 1.3 MBytes, delay bound is 65 µs (equivalent to 13 miles of fiber).
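The corresponding check for the patient arbiter, assuming a "request time" here means the time for one byte at the 160Gb/s line rate:

    def ecqf_sram_bytes(Q, b):
        # SRAM cache size bound for the patient arbiter (ECQF).
        return Q * (b - 1)

    Q, b = 512, 2560
    line_rate_bps = 160e9
    byte_time_s = 8 / line_rate_bps        # 0.05 ns per byte request

    sram = ecqf_sram_bytes(Q, b)           # 1,310,208 bytes ~ 1.3 MBytes
    delay_s = (sram + 1) * byte_time_s     # ~65.5 microseconds
    print(sram / 1e6, "MBytes,", delay_s * 1e6, "us")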
Slide 29: Maximum Deficit Queue First with Latency (MDQFL)
What if the application can only tolerate a latency l_max < Q(b – 1) + 1 timeslots?
Algorithm: Maximum Deficit Queue First with Latency (MDQFL) services a queue once every b timeslots, in the following order (a sketch of this selection rule follows below):
1. If there is an earliest critical queue, replenish it.
2. If not, then replenish the queue that will have the most deficit l_max timeslots in the future.
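A minimal sketch of the selection rule as stated on this slide. It assumes a hypothetical deficit(q, t) helper that returns how many more bytes queue q will have been requested than its head cache holds, t timeslots from now, derived from the lookahead buffer; none of these names come from the slides.

    def earliest_critical_time(q, deficit, l_max):
        # First timeslot within the latency window at which queue q's head
        # cache would run out of requested bytes, or None if it would not.
        for t in range(1, l_max + 1):
            if deficit(q, t) > 0:
                return t
        return None

    def mdqfl_select(queues, deficit, l_max):
        # Rule 1: if any queue is critical, replenish the earliest critical one.
        critical = [(earliest_critical_time(q, deficit, l_max), q) for q in queues]
        critical = [(t, q) for (t, q) in critical if t is not None]
        if critical:
            return min(critical, key=lambda tq: tq[0])[1]
        # Rule 2: otherwise replenish the queue with the largest projected
        # deficit l_max timeslots in the future.
        return max(queues, key=lambda q: deficit(q, l_max))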
Slide 30: Queue Length vs. Latency (Q = 1000, b = 10)
(Plot: required queue length w as a function of allowed latency, ranging from the queue length for zero latency (MDQF) down to the queue length for maximum latency (ECQF).)
Slide 31: What's Next
We plan to prototype a 160Gb/s linecard buffer.
Part of the Optics in Routers Project at Stanford: http://klamath.stanford.edu/or
Funding: Cisco, MARCO (US Government-Industry consortium), TI.
Would Micron like to work with us?