
Photonic On-Chip Networks for Performance-Energy Optimized Off-Chip Memory Access
Gilbert Hendry, Johnnie Chan, Daniel Brunina, Luca Carloni, Keren Bergman
Lightwave Research Laboratory, Columbia University, New York, NY

Motivation
The memory gap warrants a paradigm shift in how we move information to and from storage and computing elements. [www.OpenSparc.net] [Exascale Report, 2008]

Main Premise
Current memory subsystem technology and packaging are not well suited to future trends:
- Networks on chip
- Growing cache sizes
- Growing bandwidth requirements
- Growing pin counts

SDRAM context
- DIMMs are controlled fully in parallel, sharing access on data and address buses
- Many wires/pins
- Matched signal paths (for delay)
- DIMMs are made for short, random accesses
(Figure: memory controller, lately integrated on chip [Intel], connected to several DIMMs.)

Future SDRAM context
Example: Tilera TILE64

SDRAM DIMM Anatomy
(Figure: a DRAM_DIMM is organized as ranks of SDRAM devices; each DRAM_Chip contains banks (usually 8) of DRAM cell arrays with row decoders, column decoders, and sense amplifiers, plus control and IO logic driven by row/column address-enable, address/control, and data lines.)
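To make the rank/bank/row/column hierarchy above concrete, the sketch below shows how a flat physical address could be split into those fields. The field widths and their ordering are illustrative assumptions (a 2-rank, 8-bank device with 16K rows and 1K columns per bank), not the mapping used in this work.

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative address split for a DIMM with 2 ranks, 8 banks per chip,
// 16K rows and 1K columns per bank. Field widths/order are assumptions.
struct DramAddress {
    unsigned rank;    // 1 bit
    unsigned bank;    // 3 bits
    unsigned row;     // 14 bits
    unsigned column;  // 10 bits
};

DramAddress decode(uint64_t addr) {
    DramAddress d;
    d.column = addr & 0x3FF;           // bits [9:0]
    d.row    = (addr >> 10) & 0x3FFF;  // bits [23:10]
    d.bank   = (addr >> 24) & 0x7;     // bits [26:24]
    d.rank   = (addr >> 27) & 0x1;     // bit  27
    return d;
}

int main() {
    DramAddress d = decode(0x0ABCDEF0);
    std::printf("rank=%u bank=%u row=%u col=%u\n", d.rank, d.bank, d.row, d.column);
}
```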

Memory Access in an Electronic NoC
- Requests and responses are packetized; packet size is determined by router buffer capacity
- End result: the burst length in DRAM is dictated by the packet size (limited burst length), which results in short, random accesses
(Figure: a message crosses the chip boundary and traverses NoC routers to reach the memory controller.)
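A back-of-the-envelope illustration of the burst-length limit described above. The 256 b packet size comes from the electronic setup later in the deck; the header overhead and 64-bit DIMM data bus are assumptions.

```cpp
#include <cstdio>

int main() {
    const int packet_bits   = 256;  // NoC packet size (electronic setup slide)
    const int header_bits   = 64;   // assumed header overhead
    const int dimm_bus_bits = 64;   // standard DIMM data-bus width (assumption)

    const int payload_bits = packet_bits - header_bits;
    const int burst_beats  = payload_bits / dimm_bus_bits;  // beats per DRAM access

    std::printf("payload = %d b -> burst of %d beats (%d bytes per access)\n",
                payload_bits, burst_beats, payload_bits / 8);
    // DDR3's native burst length is 8; bursts this short waste row activations
    // and keep accesses short and effectively random.
}
```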

Memory Control
- Complex DRAM control
- Scheduling accesses around:
  - Open/closed rows
  - Precharging
  - Refreshing
  - Data/control bus usage
[DRAMsim, UMD]
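A minimal sketch of the per-bank, open-page decision a controller makes when scheduling around open/closed rows and precharge; this is a generic illustration, not DRAMsim's scheduler.

```cpp
#include <cstdint>
#include <cstdio>

enum class DramCmd { Activate, Precharge, Read, Write };

struct BankState {
    bool     rowOpen = false;
    uint32_t openRow = 0;
};

// Next command needed to service an access to `row` under an open-page policy.
// After a Precharge, the caller calls again to get the Activate, then the access.
DramCmd nextCommand(BankState& bank, uint32_t row, bool isWrite) {
    if (!bank.rowOpen) {            // bank idle: open the target row
        bank.rowOpen = true;
        bank.openRow = row;
        return DramCmd::Activate;
    }
    if (bank.openRow != row) {      // row conflict: close the open row first
        bank.rowOpen = false;
        return DramCmd::Precharge;
    }
    return isWrite ? DramCmd::Write : DramCmd::Read;  // row hit
}

int main() {
    BankState bank;
    const uint32_t rows[] = {7, 7, 9};  // activate, row hit, then row conflict
    for (uint32_t row : rows) {
        DramCmd c = nextCommand(bank, row, /*isWrite=*/false);
        std::printf("row %u -> cmd %d\n", row, static_cast<int>(c));
    }
}
```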

Experimental Setup – Electronic NoC
System:
- 2cm×2cm chip
- 8×8 electronic mesh (5-port electronic routers)
- 28 DRAM access points (MCs)
- 2 DIMMs per DRAM AP
Routers:
- 1 kb input buffers (per VC)
- 4 virtual channels
- 256 b packet size
- 128 b channels
- 32 nm tech. point (ORION), normal Vt, Vdd = 1.0 V, freq = 2.5 GHz
Traffic:
- Random core-to-DRAM-access-point pairs
- Random read/write
- Uniform message sizes
- Poisson arrival at 1 µs
DRAM:
- Modeled cycle-accurately with DRAMsim [Univ. MD]
- DDR3 (10-10-10) @ 1333 MT/s
- 8 chips per DIMM, 8 banks per chip, 2 ranks
(Presenter note: introduce the hypothetical CMP in 32 nm; explain we have a simulator.)
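The same parameters, collected into a simple configuration record for reference (a sketch; the struct and field names are ours, the values are transcribed from this slide):

```cpp
#include <cstdio>

// Electronic-mesh configuration as listed on this slide (names are ours).
struct NocConfig {
    // System
    int    meshDim             = 8;     // 8x8 electronic mesh, 2cm x 2cm chip
    int    dramAccessPoints    = 28;    // memory controllers
    int    dimmsPerAP          = 2;
    // Router
    int    bufferBitsPerVC     = 1024;  // 1 kb input buffer per virtual channel
    int    virtualChannels     = 4;
    int    packetBits          = 256;
    int    channelBits         = 128;
    // Technology (ORION, 32 nm, normal Vt)
    double vddVolts            = 1.0;
    double freqGHz             = 2.5;
    // DRAM (DRAMsim, Univ. of Maryland)
    int    megaTransfersPerSec = 1333;  // DDR3 10-10-10
    int    chipsPerDimm        = 8;
    int    banksPerChip        = 8;
    int    ranks               = 2;
};

int main() {
    NocConfig cfg;
    std::printf("%dx%d mesh, %d MCs, %d b packets, %.1f GHz\n",
                cfg.meshDim, cfg.meshDim, cfg.dramAccessPoints,
                cfg.packetBits, cfg.freqGHz);
}
```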

Experiment Results
(Figure: results plot, annotated at 269 Gb/s.)
(Presenter note: make sure to mention the x-axis.)

Current
(Figure only.)

Goal: Optically Integrated Memory
(Figure: a DIMM with an optical transceiver connected by optical fiber; Vdd and Gnd supplied electrically.)

Advantages of Photonics
- Decoupled energy-distance relationship
- No long traces to drive and sync with the clock
- DRAM chips can run faster
- Less power
- Fewer pins on the DIMM module and going into the chip
- Eventually required by packaging constraints
- Waveguides can achieve dramatically higher density due to WDM
- DRAM can be arbitrarily distant – fiber is low loss

Hybrid Circuit-Switched Photonic Network
(Figures: broadband 1×2 switch [Cornell, 2008]; broadband 2×2 switch and its transmission characteristics [Shacham, NOCS ’07].)

Hybrid Circuit-Switched Photonic Network
(Figure: each tile contains compute logic, an electronic control plane, and a photonic transmission plane.)

Hybrid Circuit-Switched Photonic Network
(Figure only.)

Hybrid Circuit-Switched Photonic Network
Goal: establish circuit paths from anywhere on the network to any DRAM access point [Bergman, HPEC ’07]

Photonic DRAM Access
- Modulators are needed to send commands to the DRAM; the memory control logic generates the memory control commands
(Figure: processor gateways (processor/cache, network interface) and memory gateways (photonic switch, modulators, memory control) connect across the chip boundary to DIMMs over fiber / PCB waveguide; paths are photonic + electronic.)
(Presenter note: practice these.)

Memory Transaction
1) A read or write request is initiated from a local or remote processor and travels on the electronic network
2) The processor gateway forwards it to the memory gateway
3) The memory gateway receives the request
(Figure: request path from the processor/cache through the processor gateway, across the network and chip boundary, to the memory gateway and its DIMMs.)

Memory READ Transaction
4) The MC receives the READ command
5) The switch is set up from the modulators to the DIMM, and from the DIMM to the network
6) The path setup travels back to the receiving processor; a path ACK returns when the path is set up
7) Row/column addresses are sent to the DIMM optically
8) Read data is returned optically
9) The path is torn down; the MC knows how long the transfer will take
- Commands can be sent and data returned in the same switch setup
(Figure: memory gateway control, modulators, and photonic switch, annotated with steps 4–8.)
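Because the memory controller both sets up and tears down the circuit, it can compute how long the path must be held (step 9). A rough sketch of that bookkeeping; tRCD and tCL match the DDR3 10-10-10 timing used in the experiments, while the burst length is an assumed value.

```cpp
#include <cstdio>

// Rough circuit-hold-time estimate for a photonic READ, in DRAM clock cycles.
int main() {
    const int tRCD       = 10;  // ACTIVATE -> READ (DDR3 10-10-10)
    const int tCL        = 10;  // READ -> first data beat (DDR3 10-10-10)
    const int burstBeats = 64;  // assumed long optical burst

    const int holdCycles = tRCD + tCL + burstBeats;
    std::printf("circuit held for roughly %d DRAM cycles\n", holdCycles);
    // The memory controller uses this figure to schedule the path teardown (step 9).
}
```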

Memory WRITE Transaction
4) The MC receives the WRITE command, which also serves as a path setup from the processor to the memory gateway
5) The switch is set up from the modulators to the DIMM
6) Row/column addresses are sent to the DIMM
7) The switch is set up from the network to the DIMM
8) A path ACK is sent back to the processor
9) Data is transmitted optically to the DIMM
10) The path is torn down from the processor after the data is transmitted
- The controller must alternate DRAM commands with incoming write data
(Figure: memory gateway control, modulators, and photonic switch, annotated with steps 4–9.)
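A toy model of steps 5–10, showing the gateway alternating WRITE commands from its modulators with the incoming optical write data on the single path into the DIMM. The burst split and beat counts are assumptions for illustration.

```cpp
#include <cstdio>

// Toy model of the WRITE path: the gateway alternates WRITE commands from its
// modulators with incoming optical write data on the single path into the DIMM.
int main() {
    const int writeBursts   = 3;  // assumed: the write stream is split into 3 bursts
    const int beatsPerBurst = 8;  // assumed DDR3-style burst of 8

    std::printf("cmd : ACTIVATE row\n");
    for (int b = 0; b < writeBursts; ++b) {
        std::printf("cmd : WRITE col %d\n", b * beatsPerBurst);
        for (int beat = 0; beat < beatsPerBurst; ++beat)
            std::printf("data: burst %d, beat %d\n", b, beat);
    }
    std::printf("path torn down after data transmitted (step 10)\n");
}
```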

Optical Circuit Memory (OCM) Anatomy
- All chips are accessed in parallel
- Packet format: bank ID, burst length, row address, column address
(Figure: DRAM_OpticalTransceiver with detector bank, modulator bank, DLL, latches, mux, and drivers; Nλ wavelengths in and out via fiber coupling or waveguide coupling; 25 address/control lines and 64 data lines to the DRAM chips; VDD and Gnd supplied electrically; timing annotated with clk, tRCD, tCL.)
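The command fields named in the packet format above, written as a plain struct. The 25-bit address/control and 64-bit data widths come from the slide; the individual field widths are assumptions.

```cpp
#include <cstdint>
#include <cstdio>

// Command fields named on the slide; the bit widths below are assumptions.
struct OcmCommand {
    uint8_t  bankId      : 3;   // one of 8 banks
    uint8_t  burstLength : 8;   // long bursts are the point of the OCM design
    uint16_t rowAddress  : 14;
    uint16_t colAddress  : 10;
};

// One beat on the 64-bit data path; all chips are accessed in parallel.
struct OcmDataBeat {
    uint64_t data;
};

int main() {
    OcmCommand cmd{2, 64, 1023, 17};  // bank 2, burst 64, row 1023, col 17
    std::printf("bank %d, burst %d, row %d, col %d\n",
                cmd.bankId, cmd.burstLength, cmd.rowAddress, cmd.colAddress);
}
```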

Advantages of Photonics
- Decoupled energy-distance relationship
- No long traces to drive and sync with the clock
- DRAM chips can run faster
- Less power
- Fewer pins on the DIMM module and going into the chip
- Eventually required by packaging constraints
- Waveguides can achieve dramatically higher density due to WDM
- DRAM can be arbitrarily distant – fiber is low loss
- Simplified memory control logic – no contending accesses; contention is handled by path setup
- Accesses are optimized for large streams of data

Experimental Setup – Photonic
System:
- 2cm×2cm chip
- 8×8 photonic torus (photonic torus tile)
- 28 DRAM access points (MCs)
- 2 DIMMs per DRAM AP
Routers:
- 256 b buffers
- 32 b packet size
- 32 b channels
- 32 nm tech. point (ORION), high Vt, Vdd = 0.8 V, freq = 1 GHz
- Photonics: 13 λ
Traffic:
- Random core-to-DRAM-access-point pairs
- Random read/write
- Uniform message sizes
- Poisson arrival at 1 µs
DRAM:
- Modeled with our event-driven DRAM model
- DDR3 (10-10-10) @ 1600 MT/s
- 8 chips per DIMM, 8 banks per chip

Performance Comparison

Experiment #2: Random Statically Mapped Address Space

Results

Network Energy Comparison
- Electronic mesh: power = 13.3 W
- Photonic torus: power = 0.42 W; total power = 2.53 W (including laser power)
(Presenter note: this is network power only (no DRAM or MC power); make the laser power more clear.)
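The laser contribution can be backed out from the two photonic figures, assuming the 2.53 W total is simply the 0.42 W network power plus laser power:

```cpp
#include <cstdio>

int main() {
    // Figures quoted on this slide.
    const double photonicNetworkW = 0.42;  // photonic torus, network power only
    const double photonicTotalW   = 2.53;  // photonic torus including laser power
    const double electronicMeshW  = 13.3;  // electronic mesh, network power only

    const double impliedLaserW = photonicTotalW - photonicNetworkW;  // ~2.11 W
    std::printf("implied laser power ~ %.2f W\n", impliedLaserW);
    std::printf("electronic / photonic total ~ %.1fx\n",
                electronicMeshW / photonicTotalW);                   // ~5.3x
}
```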

Summary
Extending a photonic network to include access to DRAM looks good for many reasons:
- Circuit switching allows large burst lengths and simplified memory control, for increased bandwidth
- Energy-efficient end-to-end transmission
- Alleviates pin-count constraints with high-density waveguides
PhotoMAN