

Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs Mrinmoy Ghosh Hsien-Hsin S. Lee School of Electrical and Computer Engineering Georgia Tech

Ghosh & Lee, Smart Refresh 2/21
Motivation
- Increase in DRAM power consumption
- Increasing DRAM density
- Ability to put more DIMMs in a computing system
- Refresh is a major component of DRAM energy
  - up to 1/3 of DRAM energy [1]
- DRAM energy is a major component of system energy (consumes up to 10 W)
[1] M. Viredaz and D. Wallach, "Power Evaluation of a Handheld Computer: A Case Study", Technical Report, Compaq WRL, 2001.

Outline
- Redundancy in conventional DRAM refresh techniques
- Smart Refresh architecture
- Our technique for 3D die-stacked DRAMs on processors
- Results

Current Refresh Policies
- Row Address Strobe (RAS) Only Refresh
- CAS Before RAS Refresh
[Figure: two diagrams, one per policy, each showing a memory controller and a DRAM module connected by the address bus and the WE, CAS, and RAS lines, with an RRAR. Step labels for RAS-only refresh: assert RAS, row address, refresh row. Step labels for CAS-before-RAS refresh: assert CAS, WE high, assert RAS, refresh row, increment RRAR.]

Redundancy in Existing DRAM Refresh Techniques
- Example: each row is accessed just as it is due to be refreshed
- A refresh is not required if the row has been accessed
[Figure: timeline marking the refresh times for rows 0 to 3, with memory accesses and memory refreshes shown along the time axis.]

Smart Refresh
- A countdown counter for each DRAM row
- The counter decrements to zero just before the row needs refreshing
[Figure: memory controller containing an update counter circuit, countdown counters, and a pending refresh request queue, connected to the DRAM module.]
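
The per-row countdown idea on this slide can be sketched in a few lines of Python. This is an illustrative model only, not the authors' hardware design: the class name `SmartRefreshModel`, the 2-bit counter range (maximum value 3), and the four-row example are assumptions chosen to keep the sketch small. It follows the rules stated on the slides: a counter is reset to its maximum after an access or a refresh, and a row is refreshed when its counter has reached zero.

```python
class SmartRefreshModel:
    """Toy model: one countdown counter per DRAM row (sizes are hypothetical)."""

    def __init__(self, num_rows, max_count=3):
        self.max_count = max_count
        self.counters = [max_count] * num_rows
        self.refreshes_issued = 0

    def access(self, row):
        # A read or write restores the row's charge, so restart its countdown.
        self.counters[row] = self.max_count

    def tick(self):
        """One counter-update period: refresh rows at zero, decrement the rest."""
        refreshed = []
        for row, count in enumerate(self.counters):
            if count == 0:
                refreshed.append(row)
                self.counters[row] = self.max_count
            else:
                self.counters[row] = count - 1
        self.refreshes_issued += len(refreshed)
        return refreshed


# Row 1 is accessed in every period, so it never needs an explicit refresh.
model = SmartRefreshModel(num_rows=4)
log = []
for _ in range(8):
    log.append(model.tick())
    model.access(1)
```

In this run the three idle rows are each refreshed twice over eight update periods, while the frequently accessed row is never refreshed explicitly. Note that all idle rows reach zero in the same period here, which is the burst behavior the later slides address.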

Smart Refresh
- Implemented using RAS-only refresh
- Provides better energy savings than CBR refresh
[Figure: memory controller containing an update counter circuit, countdown counters, and a pending refresh request queue, connected to the DRAM module.]

Naïve (Simultaneous) Counter Updates
- Counters initialized to max after access/refresh
- Refresh if counter = 0
- Simultaneous update causes burst refresh
- Solution? Initialize the counters to different initial values
[Figure: counter array over successive updates: 33…3, 22…2, 11…1, 00…0, 33…3]

Naïve (Simultaneous) Counter Updates
- One fourth of the counters simultaneously become zero => burst refresh situation
- Solution? Staggering of counter updates
[Figure: counter array with differing initial values over successive updates: 01…3, 30…2, 23…1, 12…0, 01…3]

Staggered Counter Updates
- At most K simultaneous refreshes, K = number of logical segments.
- Correctness condition: the interval between two counter updates must be long enough to handle K refresh operations.

Time      Segment 1   Segment 2   …   Segment 16
T         02…0        02…0        …   02…0
T+1 ms    32…0        32…0        …   32…0
T+2 ms    31…0        31…0        …   31…0
T+16 ms   31…3        31…3        …   31…3

- This example: refresh interval = 64 ms, all counters updated once within 16 ms
- Iterates over all the indices four times within 64 ms
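
The bound of at most K simultaneous refreshes can be checked with a small simulation. This is a sketch under stated assumptions, not the authors' implementation: the function name `run_staggered`, the 16 segments of 16 counters each, the maximum counter value of 3, and the absence of memory accesses are all illustrative choices. At each step only the counter at one index is updated in every segment, so no step can trigger more refreshes than there are segments.

```python
def run_staggered(num_segments, rows_per_segment, max_count, steps):
    """Update one counter per segment at each step; return (peak, total) refreshes."""
    counters = [[max_count] * rows_per_segment for _ in range(num_segments)]
    peak = 0
    total = 0
    for step in range(steps):
        index = step % rows_per_segment  # same index in every segment
        refreshed = 0
        for segment in counters:
            if segment[index] == 0:
                # Counter expired: refresh the row and restart its countdown.
                segment[index] = max_count
                refreshed += 1
            else:
                segment[index] -= 1
        peak = max(peak, refreshed)
        total += refreshed
    return peak, total


# Hypothetical sizes: 16 segments of 16 rows, 2-bit counters, no accesses.
peak, total = run_staggered(num_segments=16, rows_per_segment=16,
                            max_count=3, steps=128)
```

With these sizes, 128 steps cover eight passes over the counter indices. Every row is refreshed twice (512 refreshes in total), but never more than 16 in a single step, which is the number of segments.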

3D Die Stacking
- Why stack DRAM on top of processors
  - High density inter-die vias
  - Short distance inter-die vias
  - Lower power
  - High throughput
[Figure: die stack showing the heat sink, processor, DRAM (thinned die), and die-to-die vias.]

Smart Refresh for 3D DRAM Cache
- DRAM cache issues
  - More accesses per cycle
  - Higher temperature (90 °C) => higher refresh rates
  - Significant potential for Smart Refresh
[Figure: Core 0 and Core 1, L2 cache, tags, 64 MB DRAM cache, and off-chip DRAM memory.]

Other Applications of Smart Refresh
- Use programmable counters to keep rows off
- Implement retention-aware DRAMs [HPCA-06]
- Change protocol to reduce address transmission overhead

Simulation: Experimental Framework
- Simulation flow: Simics (full system functional simulator) => instruction stream => Ruby (cache hierarchy simulator) => memory references => DRAMsim (DRAM simulator)
- Power model
  - DRAM: DRAMsim
  - Counters: Artisan SRAM generator
- Workload
  - Biobench
  - Splash-2
  - SpecInt 2000

DRAM Configurations
Parameter: Conventional DRAM / 3D die-stacked DRAM cache (rows with a single value list one entry in the source)
- Type: DDR2
- Size: 2 GB and 4 GB / 64 MB
- Rows: 16384
- Frequency: 667 MHz
- Number of banks: 4 and 8 / 4
- Number of ranks: 2 / 1
- Number of columns: (value missing)
- Data width: 64
- Row buffer policy: Open page
- Refresh interval: 64 milliseconds / 32 milliseconds
- L2 cache size: 1 MB

# of Refreshes Per Second (4 GB DRAM)
- Average reduction in number of refreshes per second = 40%
- Baseline = 4,096,000
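
The baseline figure can be reproduced from the DRAM configuration slide under one reading of its parameters. The assumptions here are that 16384 is the number of rows per bank, that the 4 GB configuration uses 8 banks and 2 ranks, and that every row is refreshed once per 64 ms interval. The variable names are illustrative.

```python
# Assumed reading of the configuration slide for the 4 GB DRAM.
rows_per_bank = 16384
banks = 8
ranks = 2
refresh_interval_ms = 64

total_rows = rows_per_bank * banks * ranks

# Every row refreshed once per interval, scaled to one second (integer math).
baseline_per_second = total_rows * 1000 // refresh_interval_ms

# Refresh rate left after the reported 40% average reduction.
smart_per_second = baseline_per_second * 60 // 100

print(f"rows: {total_rows}, baseline: {baseline_per_second:,}/s, "
      f"after 40% reduction: {smart_per_second:,}/s")
```

Under these assumptions the computed baseline matches the 4,096,000 refreshes per second quoted on the slide.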

Refresh Energy Savings (4 GB DRAM)
- Average energy saving = 23.8%

Total DRAM Energy Savings (4 GB DRAM)
- Average energy saving = 9.1% (up to 21% in perl_twolf)
- No performance degradation

Total Energy Saving (64 MB 3D DRAM Cache)
- Average energy saving = 6.9% (up to 12% in Tiger)

Conclusions
- Redundant refresh operations cost significant energy
- Smart Refresh eliminates unnecessary periodic refreshes
- 11% (up to 17%) energy savings in conventional DRAMs
- 7% energy savings in 3D DRAM caches
- No performance impact

Thank You! Georgia Tech ECE MARS Labs

Correctness of Smart Refresh

No overflow of refresh queue
- Typical refresh time = 70 ns
- Counter update period = 8 ms / (16384 / 8) = 3906 ns
- Number of refreshes possible = 56
- Number of refreshes required = 8
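
The arithmetic on this slide can be checked directly. This sketch assumes that the divisor 8 in the slide's formula is the number of logical segments, which would also explain why 8 refreshes are required per update period. The variable names are illustrative.

```python
refresh_time_ns = 70           # typical time for one refresh (from the slide)
update_window_ns = 8_000_000   # 8 ms, expressed in nanoseconds
rows = 16384
segments = 8                   # assumed meaning of the divisor 8 on the slide

# Counter-update steps that must fit in one 8 ms window.
updates_per_window = rows // segments

# Time available between two consecutive counter updates.
update_period_ns = update_window_ns / updates_per_window

# Back-to-back refreshes that fit in one update period.
refreshes_possible = update_period_ns / refresh_time_ns

# Worst case: every segment needs a refresh in the same period.
refreshes_required = segments

print(f"update period: {update_period_ns} ns, "
      f"possible: {refreshes_possible:.1f}, required: {refreshes_required}")
```

The update period works out to 3906.25 ns, which leaves room for about 55.8 refreshes of 70 ns each (the slide quotes 56). Either way this is well above the 8 required, so the pending refresh queue does not overflow under these assumptions.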

Area Overhead
- Number of counters = 16384*2*4 = 131072
- Space for 3-bit counters = 131072*3/(8*1024) = 48 kB
- Ways to mitigate area overhead:
  - Use 2-bit counters
  - Have a DRAM module block for counters
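
The storage estimate follows from one counter per row. In this sketch the factors 2 and 4 are taken from the slide as written; reading them as ranks and banks is an assumption, and the helper name `counter_storage_kb` is illustrative. The 2-bit case shows the effect of the first mitigation listed.

```python
rows = 16384
factor_a = 2   # from the slide; possibly the number of ranks (assumption)
factor_b = 4   # from the slide; possibly the number of banks (assumption)

num_counters = rows * factor_a * factor_b


def counter_storage_kb(counters, bits_per_counter):
    """Counter storage in kilobytes (8 bits per byte, 1024 bytes per kB)."""
    return counters * bits_per_counter / (8 * 1024)


storage_3bit_kb = counter_storage_kb(num_counters, 3)
storage_2bit_kb = counter_storage_kb(num_counters, 2)

print(f"{num_counters} counters: {storage_3bit_kb} kB at 3 bits, "
      f"{storage_2bit_kb} kB at 2 bits")
```

With 131072 counters, 3-bit counters need 48 kB as the slide states, and 2-bit counters would reduce this to 32 kB.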