Fast Buffer Memory with Deterministic Packet Departures
Mayank Kabra, Siddhartha Saha, Bill Lin
University of California, San Diego



IEEE Hot Interconnects XIV, August 23-25, 2006
Packet Buffer in Routers
[Figure: linecards feeding a router core containing the scheduler and packet buffers.]
• Incoming linecards have 8 ns to read and write a packet (40 Gb/s line rate, 40-byte packets).
• Routers need to store packets to deal with congestion.
• Bandwidth × RTT = 40 Gb/s × 250 ms = 10 Gb of buffering.
• Too big to store in SRAM, hence the need to use DRAM.
• Problem: DRAM access time is ~40 ns, so there is roughly a 5x speed difference.
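The sizing arithmetic on this slide can be checked with a short script; the constants come from the slide, but the variable names are ours:

```python
# Back-of-the-envelope sizing for a 40 Gb/s linecard with 40-byte packets.
LINE_RATE_BPS = 40e9    # 40 Gb/s line rate
PACKET_BITS = 40 * 8    # 40-byte packets
RTT_SECONDS = 0.250     # 250 ms round-trip time

# Per-packet time budget: one read and one write must fit in each slot.
slot_ns = PACKET_BITS / LINE_RATE_BPS * 1e9

# Rule-of-thumb buffer size: bandwidth x RTT.
buffer_bits = LINE_RATE_BPS * RTT_SECONDS
buffer_packets = buffer_bits / PACKET_BITS

print(slot_ns, buffer_bits / 1e9, int(buffer_packets))
```

This yields an 8 ns slot, a 10 Gb buffer, and about 3.1 × 10^7 packets, which is the T used in the bitmap-sizing slide later on.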

Parallel and Interleaved DRAM Banks
• For illustration, assume the speed difference is 3x.
[Figure: packets staged in SRAM and interleaved across multiple parallel DRAM banks.]

Problems with Parallelism
• The access pattern can create problems.
• If we try to access packets 3, 6, 9 and 11 one after another, it is possible to issue interleaved read requests and read those packets out at line speed, since each resides in a different DRAM bank.

Problems with Parallelism
• But accessing 2 & 3, or 10 & 11, in succession is problematic: consecutive requests land on the same DRAM bank, which cannot serve them back-to-back.
• This is an example of a bank conflict.

Use the Packet Departure Time
• There is a wide class of routers (e.g., crossbar routers) in which packet departures are determined by the scheduler on the fly. Packet buffers that cater to these routers exist, but they are complex.
• There are other high-performance routers, such as Switch-Memory-Switch and Load-Balanced routers, for which the packet departure time can be calculated when the packet is inserted into the buffer.
Solution idea: use the known departure times of the packets to schedule them to different DRAM banks such that there are no conflicts.

Packet Buffer Abstraction
• Fixed-size packets, slotted time (example: 40 Gb/s, 40-byte packets ⇒ 8 ns per slot).
• The buffer may contain an arbitrarily large number of logical queues, but with deterministic access.
• Single-write, single-read, time-deterministic packet buffer model.

Packet Buffer Architecture
• Interleaved memory architecture with K slower DRAM banks.
• b time slots are needed to complete a single memory read or write operation.
• b consecutive time slots form a frame.
• A time slot t belongs to frame ⌈t/b⌉.
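A minimal sketch of the slot-to-frame mapping, with slots and frames 1-indexed as in the examples on the following slides (the function name is ours):

```python
import math

def frame_of(t: int, b: int) -> int:
    """Frame index of time slot t when each frame spans b slots
    (slots and frames both numbered from 1)."""
    return math.ceil(t / b)
```

With b = 3, slots 1-3 fall in frame 1 and slots 58-60 in frame 20, matching the arrival and departure walkthroughs below.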

Packet Buffer Operation
[Figure: arriving packets are aggregated in SRAM into groups of b, written to one of the K DRAM banks, then read back and de-aggregated before departure; an SRAM bypass buffer sits alongside the DRAM path.]

Packet Arrival [Frame 1]
• Frame 1 (assume b = 3): packets P1, P2 & P3 arrive in time slots 1, 2 and 3 respectively.
• They are aggregated before being written to the DRAM.

Packet Arrival [Frame 2]
• Frame 2: packets P1, P2 & P3 are being written to DRAM banks 1, 2 & 3 during frame 2.
• New packets P4, P5 & P6 arrive and are stored in the aggregation buffer.

Packet Departure [Frame 19]
• Packets P58, P59 & P60 are scheduled to depart at time slots 58, 59 and 60 respectively (frame 20).
• They are read from the DRAM banks one frame before their departure frame, i.e., during frame 19.

Packet Departure [Frame 20]
• Packets P58, P59 & P60 are read from the buffer and output from the switch at time slots 58, 59 and 60 respectively.

SRAM Bypass Buffer
• The operational model dictates that the minimum round-trip latency to write and then read a packet from one of the DRAM banks is 4 frames.
• Thus, a packet whose departure time is less than 4b − 1 time slots away cannot be stored in DRAM.
• A small amount of SRAM (of size 4b) is used as a bypass buffer for such packets.
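The bypass decision reduces to one comparison; a sketch under the slide's 4-frame round-trip assumption (the helper name is ours):

```python
def goes_to_bypass(t_arrive: int, t_depart: int, b: int) -> bool:
    """True if the packet departs too soon to survive the DRAM
    round trip (write + read takes a minimum of 4 frames, i.e.
    4b - 1 slots), so it must stay in the SRAM bypass buffer."""
    return (t_depart - t_arrive) < 4 * b - 1
```

For b = 3 the threshold is 11 slots: a packet departing 10 slots after arrival is bypassed, one departing 11 slots later goes through DRAM.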

Number of DRAM Banks
• Arrival write conflicts: at any current frame f, at most b packets (including the current packet) are to be written to the DRAM banks. Hence, each packet sees a maximum of b − 1 arrival write conflicts.

Number of DRAM Banks
• Arrival read conflicts: at any current frame f, at most b packets are being read from the DRAM banks. Those b banks are busy for the current frame and unavailable for writing. Hence, each packet sees a maximum of b arrival read conflicts.

Number of DRAM Banks
• Departure read conflicts: any packet written in the current frame f will eventually need to be read in some future frame d for departure. In that future frame d, there are up to b − 1 other departing packets. Hence, each packet sees a maximum of b − 1 departure read conflicts.

How Many DRAM Banks?
• Total conflicts: arrival write (b − 1) + arrival read (b) + departure read (b − 1) = 3b − 2.
• If the number of banks is more than 3b − 2 (i.e., at least 3b − 1), there is always a free bank for every packet.
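The bound is plain arithmetic; a sketch (naming ours) that also plugs in b = 5, the ratio implied by ~40 ns DRAM against an 8 ns slot:

```python
def banks_needed(b: int) -> int:
    """Worst-case conflicts per packet: (b-1) arrival-write +
    b arrival-read + (b-1) departure-read = 3b - 2, so 3b - 1
    banks always leave at least one compatible bank."""
    conflicts = (b - 1) + b + (b - 1)
    return conflicts + 1

print(banks_needed(5))  # 14 banks for a 5x speed gap
```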

DRAM Bank Selection
• To find a compatible memory bank, maintain a two-dimensional read-transaction bitmap R.
• Each row corresponds to a frame slot; each column corresponds to a DRAM bank (hence 3b − 1 columns).
• R(f, m) denotes whether the m-th DRAM bank holds an already stored packet that must be read in the f-th frame.

DRAM Bank Selection
• Also maintain a write-reservation bitmap W of size 3b − 1.
• W(m) denotes whether, in the current frame, the m-th memory bank has been assigned an arriving packet.

DRAM Bank Selection Logic

DRAM Bank Selection
• Approach: greedy selection avoiding the three types of conflicts.
• To check whether memory bank m is compatible for a packet p arriving in frame f with departure frame d, check NOT(W(m) | R(f, m) | R(d, m)).
• Instead of checking one bank at a time, check all of them at once: V = NOT(W | R(f) | R(d)), where R(f) and R(d) are row vectors. From V, take the index of the first compatible bank.
• If bank n is selected for p, set W(n) = 1 and R(d, n) = 1.
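A software sketch of this greedy selection, with the W and R bitmaps represented as Python ints (bit m = bank m); the helper names are ours, and a hardware implementation would instead feed the vector V into a priority encoder:

```python
def select_bank(W: int, R: dict, f: int, d: int, num_banks: int):
    """Lowest-indexed bank with no arrival-write, arrival-read,
    or departure-read conflict for a packet arriving in frame f
    and departing in frame d. Returns None only when
    num_banks < 3b - 1, which the sizing argument rules out."""
    busy = W | R.get(f, 0) | R.get(d, 0)   # V = NOT(busy)
    for m in range(num_banks):
        if not (busy >> m) & 1:
            return m
    return None

def reserve(W: int, R: dict, d: int, m: int) -> int:
    """Commit the choice: set W(m) = 1 for the current frame and
    R(d, m) = 1 for the departure read. Returns the updated W."""
    R[d] = R.get(d, 0) | (1 << m)
    return W | (1 << m)
```

For example, two packets arriving in frame 2 that both depart in frame 19 get banks 0 and 1 in turn, since the first reservation makes bank 0 incompatible for the second packet.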

Size of the Bitmap
• The size of the packet buffer is T packets, i.e., T is the farthest departure time slot relative to the current time slot.
• Farthest departure frame: ⌈T/b⌉.
• Each row in the bitmap is 3b − 1 bits, so the size of the bitmap is ⌈T/b⌉ × (3b − 1) bits.
• Assuming an RTT of 250 ms and a line rate of 40 Gb/s, the packet buffer corresponds to a memory requirement of T ≈ 3 × 10^7 packets, which makes the bitmap size close to 11 MB.
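The 11 MB figure can be reproduced directly, taking b = 5 from the ~5x speed gap (the function name is ours):

```python
import math

def bitmap_bits(T: int, b: int) -> int:
    """Read-transaction bitmap: ceil(T/b) frame rows, each
    3b - 1 bits wide (one column per DRAM bank)."""
    return math.ceil(T / b) * (3 * b - 1)

# T = 40 Gb/s x 250 ms / 40-byte packets = 31,250,000 packets.
bits = bitmap_bits(31_250_000, 5)
print(bits, bits / 8 / 2**20)  # ~10.4 MiB, i.e. "close to 11 MB"
```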

Additional Details
• Location of a packet in DRAM: once a bank has been selected, we need a way to assign the actual memory location to write, and later read, the packet. The location is determined from the departure frame using circular indexing, which maps a frame to a packet location in the bank.
• How to reorder/de-aggregate the packets? Store the timestamp in DRAM along with the packet.
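The slide gives no formula for the circular indexing, so the following is one plausible reading, not the talk's definition: since at most ⌈T/b⌉ future departure frames are ever outstanding, the departure frame modulo that window gives a stable per-bank location that is safely reused once its frame has drained.

```python
def packet_location(d_frame: int, window_frames: int) -> int:
    """Map a departure frame to a slot inside the selected bank.
    window_frames = ceil(T/b), the farthest outstanding departure
    frame; a position is reused only after its frame has departed."""
    return d_frame % window_frames
```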

Conclusion
• Developed a simple packet buffer architecture for routers where packet departure times are known, e.g., Switch-Memory-Switch and Load-Balanced routers.
• Can support an arbitrarily large number of logical queues.
• The number of DRAM banks and the SRAM bypass buffer size depend only on the physical parameters.

Thank You