Reducing Power and Area by Interconnecting Memory Controllers to Memory Ranks with RF Coplanar Waveguides on the Same Package WEED 2011, ISCA Mario D.

Presentation transcript:

Reducing Power and Area by Interconnecting Memory Controllers to Memory Ranks with RF Coplanar Waveguides on the Same Package WEED 2011, ISCA Mario D. Marino, Kevin Skadron Dept. of Computer Science – UVA

2 What is the problem?
Excessive power usage by the physical memory channel
– 2 mW/Gbit/s (Palmer et al., ISSCC07)
– 160 W for 10 TB/s (Vantrease et al., ISCA08)
– Poor scaling in the physical channel: RC load in package
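
As a rough check on the figures in this slide, the sketch below multiplies a per-bandwidth power cost by a channel bandwidth. It assumes the 2 mW/Gbit/s cost applies uniformly to a 10 TB/s channel; the slide does not say the 160 W figure was derived this way, and the function name is hypothetical.

```python
def channel_power_watts(mw_per_gbps: float, bandwidth_tbytes_per_s: float) -> float:
    """Estimate channel power from a cost in mW per Gbit/s (assumed uniform)."""
    # TB/s -> GB/s (x1000) -> Gbit/s (x8)
    gbits_per_s = bandwidth_tbytes_per_s * 1000 * 8
    # mW -> W
    return mw_per_gbps * gbits_per_s / 1000


# Figures quoted on the slide: 2 mW/Gbit/s and 10 TB/s
print(channel_power_watts(2.0, 10.0))
```

Under that assumption the two quoted numbers agree with each other, since 2 mW/Gbit/s is the same as 2 pJ per bit.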

3 Outline
– Hypothesis: wired RF (i.e., coplanar waveguides, CPWs) solves all these problems in a technology that is easier to adopt than optical
– Architecture for CPW memory interface
– Evaluation: area, power, and performance
– Conclusion
PS: note that this is over wires (CPWs), not wireless!

4 Hypothesis: why wired-RF (RF) as a bandwidth solution?
– Low-latency media and modulation (Chang et al., Near Speed-of-Light Signaling Over On-Chip Electrical, 2003)
– All electrical (impedance matching); development costs closer to CMOS
– Distances from 1 mm to 30 cm (delays, energy, data rate; RF for Future Chips, Tam et al., 2011)
– Beckmann et al., Transmission Line Caches, MICRO03
– Frank Chang et al. (caches, modulation, high bandwidth, latency and power reduction; MICRO08, HPCA08)
– Quilt packaging (RF coplanar waveguide connecting two dies, > 200 GHz, low insertion loss, built), Liu, Buckhanan et al., Notre Dame
– Intel-Tera (Polka, ITJ07): on-package
– Modulation and high speed from optical

5 Why can't we use RF in a traditional fashion? Different impedances: I/O pad, inner and outer wire bonds, PCB pads, PCB [Liu, 2006]
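
To illustrate why a chain of different impedances is a problem, the sketch below applies the standard reflection-coefficient formula, Γ = (Z_load − Z_source) / (Z_load + Z_source), at each boundary along such a path. The impedance values are hypothetical placeholders, not measurements from the talk, and purely real impedances are assumed for simplicity.

```python
def reflection_coefficient(z_source: float, z_load: float) -> float:
    """Voltage reflection coefficient at a boundary (real impedances assumed)."""
    return (z_load - z_source) / (z_load + z_source)


def reflected_power_fraction(z_source: float, z_load: float) -> float:
    """Fraction of incident power reflected at the boundary."""
    return reflection_coefficient(z_source, z_load) ** 2


# Hypothetical impedances (ohms) for the segments named on the slide
path = [
    ("I/O pad", 50.0),
    ("inner wire bond", 80.0),
    ("outer wire bond", 65.0),
    ("PCB pad", 40.0),
    ("PCB", 50.0),
]

for (name_a, z_a), (name_b, z_b) in zip(path, path[1:]):
    frac = reflected_power_fraction(z_a, z_b)
    print(f"{name_a} -> {name_b}: {frac:.1%} of power reflected")
```

A matched boundary (equal impedances) reflects nothing, which is the motivation for keeping the whole path on one controlled-impedance medium.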

6 Contributions
– Evaluate power and area gains from replacing power-hungry MC circuitry with on-die RF transceivers + CPW + Quilt packaging
– Evaluate architectural performance gains due to power and area gains

7 Diagram of the proposed organization
– Example with 1 core and 1 RFMC
– RF path from a specific core to its rank > 1 mm

8 Detailed Organization RFMC: MCs coupled to on-die RF transceivers and on- and inter-die coplanar waveguides (CPW)

9 Quilt
The use of Quilt (inter-die distance ~40 um) allows:
– Extending on-die CPWs
– Built for RF/low insertion loss: 0.1 dB
– Use of processor die and DRAM dies, RF transceivers, and UCLA RF models
– Versus traditional power-hungry transceivers (Palmer et al., ISSCC 2007)
– Co-planar, not flip-chip
– See Liu's PhD dissertation and Buckhanan et al., UGIM10

10 Interfacing on-die CPWs and Quilt

11 Quilt Packaging is a CPW
– Extension of the interconnection of two dies facing each other
– Designed for frequencies above 200 GHz
– Prototype from Notre Dame tested up to 60 GHz
– Insertion loss (*): 0.1 dB
– So far, no transceivers are needed for Quilt, due to its low insertion loss
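
To put the 0.1 dB insertion loss in perspective, the sketch below converts a loss in dB to the fraction of power that passes through, using the standard relation fraction = 10^(−loss_dB / 10). The function name is hypothetical.

```python
def transmitted_power_fraction(insertion_loss_db: float) -> float:
    """Fraction of input power delivered through a link with the given loss."""
    return 10 ** (-insertion_loss_db / 10)


# Insertion loss quoted on the slide
print(f"{transmitted_power_fraction(0.1):.1%}")
```

At 0.1 dB roughly 97.7% of the power gets across the die-to-die link, which is consistent with the slide's point that no extra transceivers are needed for the Quilt crossing itself.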

12 Transceivers: Power and Area
Extracted from Chang, Tam, with a 10% power reduction on the amplifier to account for savings from Quilt-type packaging
Technology (nm) | Data rate per band (Gbit/s) | #carriers to match DRAM | Power (TX+RX) (mW) | Energy per bit (pJ) | Area (TX+RX) (mm²)
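
The columns of this table are related by simple arithmetic, sketched below. It assumes that the power column is per band and that carriers are added until their combined rate covers the DRAM bandwidth; both are assumptions, and the numeric inputs are hypothetical placeholders rather than values from the talk.

```python
import math


def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ: mW / (Gbit/s) is numerically pJ/bit."""
    return power_mw / data_rate_gbps


def carriers_needed(dram_bandwidth_gbps: float, per_band_gbps: float) -> int:
    """Smallest number of RF carriers whose total rate covers the DRAM bandwidth."""
    return math.ceil(dram_bandwidth_gbps / per_band_gbps)


# Hypothetical inputs for illustration only
power_mw = 12.0        # TX + RX power for one band
per_band_gbps = 6.0    # data rate carried by one band
dram_gbps = 64.0       # bandwidth the DRAM interface must sustain

print(energy_per_bit_pj(power_mw, per_band_gbps), "pJ/bit")
print(carriers_needed(dram_gbps, per_band_gbps), "carriers")
```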

13 Area Comparison (MC vs. RFMC)
– MC area decreases for all components, but RF essentially eliminates the PHY
– 2.4X area savings

14 Energy Comparison: PHY
– Even with technology improvements, RF is more efficient for distances >= 1 mm and < 10 mm
– Net power savings (incl. FE & TE) of 4.6X at 5 mm

15 Performance Evaluation
– M5 and DRAMsim
– 32 KB L1s, 1 MB/core L2
– 8 cores
– 1 DRAM rank per MC, DDR2, at 2 GHz
– Same FE, TE for both MC and RFMC
– No RF latency benefits in the performance evaluation

16 Performance: Stream
– Baseline (current CPUs): 3 or 4 MCs
– RFMC is up to 2.4x faster than MC

17 Conclusions
– RF architecture for on-package CPU-DRAM interconnection
– Evolutionary changes to CPU and DRAM design: straightforward manufacturability
– Area and power benefits (preliminary; improve with Quilt-dedicated circuits)
– Performance benefits with more cores (limited by the number of ranks if the same core-to-rank proportion is desired)

18 Thanks!

19 Power Comparison
– FE and TE show a power reduction
– The PHY/RF part is evaluated in the next slide (McPAT does not model RF)