†The Pennsylvania State University

Presentation transcript:

Cache Revive: Architecting Volatile STT-RAM Caches for Enhanced Performance in CMPs
Adwait Jog†, Asit K. Mishra‡, Cong Xu†, Yuan Xie†, N. Vijaykrishnan†, Ravi Iyer‡, Chita R. Das†
†The Pennsylvania State University ‡Intel Corporation

STT-RAM as Emerging Memory Technology
Spin-Transfer Torque RAM (STT-RAM) combines the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory, making it attractive for on-chip cache hierarchies.
However, STT-RAM caches suffer from longer write latency and higher write energy consumption than traditional SRAM caches.

SRAM vs. STT-RAM

Metric                     1 MB SRAM    4 MB STT-RAM
Area (mm²)                 2.61         3.00
Read Energy (nJ)           0.578        1.035
Write Energy (nJ)          -            1.066
Leakage Power (mW)         4542         2524
Read Latency (ns)          1.012        0.998
Write Latency (ns)         -            10.61
Read @ 2 GHz (cycles)      2            -
Write @ 2 GHz (cycles)     -            22
(-: cell not present in the transcript)

~3-4x denser (capacity benefit)
1.8x lower leakage energy
Comparable read latency
~11x higher write latency (@ 2 GHz)
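The headline ratios on this slide can be re-derived from the table values. The Python check below is only a sanity sketch: the variable names are ours, and it assumes the SRAM write latency is 2 cycles at 2 GHz, which the "~11x" claim implies but the transcript does not list as a separate cell.

```python
# Values copied from the SRAM vs. STT-RAM slide.
# ASSUMPTION: SRAM write latency is 2 cycles @ 2 GHz (not listed separately in the transcript).
sram = {"capacity_mb": 1, "area_mm2": 2.61, "leakage_mw": 4542, "write_cycles": 2}
stt_ram = {"capacity_mb": 4, "area_mm2": 3.00, "leakage_mw": 2524, "write_cycles": 22}


def density(memory):
    """Capacity per unit area, in MB per mm^2."""
    return memory["capacity_mb"] / memory["area_mm2"]


density_gain = density(stt_ram) / density(sram)
leakage_reduction = sram["leakage_mw"] / stt_ram["leakage_mw"]
write_penalty = stt_ram["write_cycles"] / sram["write_cycles"]

print(f"density gain:      {density_gain:.2f}x")
print(f"leakage reduction: {leakage_reduction:.2f}x")
print(f"write penalty:     {write_penalty:.1f}x")
```

Under these inputs the density gain falls between 3x and 4x, the leakage ratio rounds to 1.8x, and the write penalty is 11x, matching the slide's summary bullets.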

Proposal: Reduce Retention Time
Years of data-retention time may not be required for STT-RAM.
Trade off retention time for lower STT-RAM write latency.
Challenge: Architecting “Volatile STT-RAM” caches.
Advantage: Performance and energy benefits!

How to Calculate Optimal Retention Time?
(1) Device Constraints: The retention time of STT-RAM can be reduced only down to a certain limit.
(2) Application Needs: Application characteristics show that a data-retention time in the range of milliseconds is sufficient to make STT-RAM caches effective for CMPs.
Both device constraints and application needs should be considered for optimal results!

How to Reduce STT-RAM Write Latency?
Write current goes down with reduction in retention time.
Retention time operating points:

Retention Time of STT-RAM    Write Latency @ 2 GHz
10 years                     22 cycles
1 second                     12 cycles
10 milliseconds              6 cycles
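One way to read the operating points above is as a lookup: given how long data must survive without a rewrite, pick the point with the lowest write latency that still retains data long enough. The sketch below illustrates that reading only; `fastest_point` and the 365-day year are our own assumptions, not part of the presentation.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

# (retention time in seconds, write latency in cycles @ 2 GHz), from the slide.
OPERATING_POINTS = [
    (10 * SECONDS_PER_YEAR, 22),
    (1.0, 12),
    (10e-3, 6),
]


def fastest_point(required_retention_s):
    """Return the lowest-latency operating point whose retention time covers the requirement."""
    candidates = [p for p in OPERATING_POINTS if p[0] >= required_retention_s]
    if not candidates:
        raise ValueError("no operating point retains data that long")
    return min(candidates, key=lambda p: p[1])


for need in (5e-3, 0.5, 3600.0):
    retention_s, cycles = fastest_point(need)
    print(f"need {need} s -> retention {retention_s} s, {cycles} write cycles")
```

A block that is rewritten every 5 ms can use the 6-cycle point, while data that must survive for an hour falls back to the 22-cycle, 10-year point.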

How much non-volatility can be traded off?
[Figure: Inter-write time (refresh time) distributions of multi-threaded (PARSEC) and multi-programmed (SPEC 2006) benchmarks]
The majority (> 50%) of L2 cache blocks get refreshed within 10 ms.
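The statistic on this slide is a coverage measurement over inter-write times: the fraction of blocks that are rewritten before a given retention time elapses. A minimal sketch of that calculation follows; the function name is ours and the sample intervals are made-up illustration values, not data from the presentation.

```python
def fraction_within(inter_write_times_ms, retention_ms):
    """Fraction of observed inter-write intervals no longer than the retention time."""
    if not inter_write_times_ms:
        raise ValueError("need at least one observation")
    covered = sum(1 for t in inter_write_times_ms if t <= retention_ms)
    return covered / len(inter_write_times_ms)


# MADE-UP sample intervals in milliseconds, for illustration only.
sample_ms = [0.5, 2.0, 4.0, 8.0, 9.5, 12.0, 30.0, 75.0]
print(fraction_within(sample_ms, retention_ms=10.0))  # prints 0.625
```

Blocks counted as covered are rewritten by the program itself before they expire; the rest would lose their data unless something refreshes them.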

Volatile STT-RAM Based Last-Level Cache Design
How to save the remaining 50% of the blocks?
Answer: Use a selective refresh policy. Only refresh cache blocks that are in MRU slots.
[Figure: cache ways (way IDs 1-15) with per-way block state, divided into IMP blocks and NON-IMP blocks; labels: Dying Blocks (Refresh), Dying Blocks (Do not Refresh)]
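The selective refresh rule can be sketched as a filter over the blocks that are about to expire. This is an illustration under our own assumptions: recency is tracked as a per-way rank (0 = most recently used), and `mru_slots` is a hypothetical parameter for how many of the most recent positions count as important; the slide does not give its value.

```python
def ways_to_refresh(expiring_ways, recency_rank, mru_slots):
    """Return the expiring ways that currently sit in one of the MRU slots.

    recency_rank[way] is the way's position in the recency stack (0 = MRU).
    mru_slots is a hypothetical tuning knob, not a value from the slides.
    """
    return [way for way in expiring_ways if recency_rank[way] < mru_slots]


# Hypothetical 4-way set: way 2 is MRU, way 1 is LRU.
recency_rank = {2: 0, 0: 1, 3: 2, 1: 3}
print(ways_to_refresh([0, 1, 2], recency_rank, mru_slots=2))  # prints [0, 2]
```

Expiring blocks outside the MRU slots (way 1 here) are treated as unimportant and are not refreshed.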

How to refresh?
[Figure: flowchart over the IMP and NON-IMP blocks of a cache set (way IDs 1-15) and a buffer; decision nodes: "Is Buffer Full?" and "Dirty?"; actions: "Copy", "Copy Back", and "Write-back to DRAM"]
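The flowchart survives only as labels, so the sketch below is one plausible reading rather than the authors' exact mechanism. It assumes an expiring important block is copied into a small buffer when there is room and later copied back into the cache (which rewrites, and so refreshes, it); when the buffer is full, a dirty block is written back to DRAM. Dropping a clean block in that case, the `Block` type, and all function names are our assumptions.

```python
from dataclasses import dataclass


@dataclass
class Block:
    tag: int
    dirty: bool


def handle_expiring_block(block, buffer, capacity):
    """Decide what happens to an important block that is about to expire."""
    if len(buffer) < capacity:       # "Is Buffer Full?" -> NO
        buffer.append(block)         # "Copy" into the buffer
        return "copy_to_buffer"
    if block.dirty:                  # "Dirty?" -> YES
        return "write_back_to_dram"
    return "drop"                    # ASSUMPTION: clean blocks are simply dropped


def copy_back(buffer):
    """Copy buffered blocks back into the cache and return the refreshed tags."""
    refreshed = [block.tag for block in buffer]
    buffer.clear()
    return refreshed


buffer = []
print(handle_expiring_block(Block(tag=1, dirty=True), buffer, capacity=1))
print(handle_expiring_block(Block(tag=2, dirty=True), buffer, capacity=1))
print(handle_expiring_block(Block(tag=3, dirty=False), buffer, capacity=1))
print(copy_back(buffer))
```

With a one-entry buffer, the first block is buffered and later refreshed, the second (dirty) block goes to DRAM, and the third (clean) block is dropped.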

Results: Speedup Improvement
PARSEC benchmarks: on average, 18% performance improvement for multithreaded workloads.
SPEC benchmarks: on average, 10% improvement in instruction throughput for multi-programmed workloads.

Results: Energy Improvements
Nominal increase (4%) in dynamic energy over M-4MB because of the buffer scheme.
60% reduction in leakage energy over SRAM designs.

Summary
STT-RAM is a promising technology with high density, low leakage, and read latencies competitive with SRAM.
Its high write latency and write energy are impeding widespread adoption.
Reducing retention time can directly reduce the write latency and write energy of STT-RAM.
A simple buffering scheme is presented to refresh important blocks whose retention time is running out.