EXTRAPOLATION PITFALLS WHEN EVALUATING LIMITED ENDURANCE MEMORY Rishiraj Bheda, Jesse Beu, Brian Railing, Tom Conte Tinker Research.

Need for New Memory Technology
 DRAM density scalability problems
  Capacitive cells formed via 'wells' in silicon
  More difficult as feature size decreases
 DRAM energy scalability problems
  Capacitive cells leak charge over time
  Require periodic refreshing of cells to maintain value

High Density Memories
 Magneto-resistive RAM – MRAM
  Free magnetic layer's polarity stops flipping
  ~10^15 writes
 Ferroelectric RAM – FeRAM
  Ferroelectric material degradation
  ~10^9 writes
 Phase Change Memory – PCM
  Metal fatigue from heating/cooling
  ~10^8 writes

Background - Addressing Wear Out
 For a viable DRAM replacement, mean time to failure (MTTF) must be increased
 Common solutions include
  Write filtering
  Wear leveling
  Write prevention

Write Filtering
 General rule of thumb: combine multiple writes
 Caching mechanisms filter the access stream, capturing multiple writes to the same location and merging them into a single event
  Write buffers
  On-chip caches
  DRAM pre-access caches (Qureshi et al.)
 Not to be confused with write prevention (bit-wise)

Write Filtering Example
[Diagram: the processor's write stream passes through the L1 and L2 caches and a DRAM cache at the memory controller, producing a filtered write stream.]

Write Prevention
 General rule of thumb: bitwise comparison techniques to reduce writes
 Ex: Flip-and-write
  Pick whichever of the natural and inverted versions of the data has the shorter Hamming distance from the current contents, then write it.

Write Prevention Example
[Diagram: bitwise comparison of old and new data; only differing cells are actually written.]
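The flip-and-write idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the word width and helper names are assumptions:

```python
# Flip-and-write sketch: before writing a new word, compare it against the
# current cell contents. If the inverted new word is closer in Hamming
# distance, store it inverted and set a flip flag, so fewer cells change.

WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin((a ^ b) & MASK).count("1")

def flip_and_write(current: int, new: int) -> tuple[int, bool, int]:
    """Return (stored_word, flip_flag, cells_actually_written)."""
    direct = hamming(current, new)
    inverted = hamming(current, ~new & MASK)
    if inverted < direct:
        return (~new & MASK, True, inverted)
    return (new, False, direct)

stored, flipped, cost = flip_and_write(0xFFFFFFFF, 0x00000001)
# Writing 0x00000001 over all-ones directly would flip 31 bits; storing its
# inverse (0xFFFFFFFE) flips only 1 bit plus the flip flag.
```

The flip flag itself costs one extra bit per word, which is why the technique pays off only when the Hamming-distance savings exceed that overhead.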

Write Leveling
 General rule of thumb – spread out accesses to remove wear-out 'hotspots'
 Powerful technique when correctly applied
  Uniform wearing of the device
  The larger the device, the longer the MTTF
 Multi-grain Opportunity
  Word-level – low-order bits have higher variation
  Page-level – low-numbered blocks are written to more often
  Application-level – few high-activity 'hot' pages

Overview
 Background
 Extrapolation pitfalls
  Impact of OS
  Memory Sizing and Page Faults
  Estimates over multiple runs
 Line Write Profile
 Core take-away of this work

Extrapolation Pitfalls
 Single-run extrapolation, OS and long-term scope
  Natural wear leveling from the paging system
  Interaction of multiple running processes
  Process creation and termination
  A single, isolated run is not representative!
 Main memory sizing and impact of high density
 Benchmark 'region of interest'
  Several solutions exist (sampling, simpoints, etc.)

OS Paging
 Goal
  Have enough free pages to meet new demand
  Balanced against utilization of capacity
 Solution
  Actively used pages keep valid translations
  Inactive pages migrate to the free list and are reclaimed for future use
 Reclamation shuffles translations over time!
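The shuffling effect can be seen in a toy simulation. All parameters here (frame count, hot-page count, reclamation interval) are invented for illustration; the point is only that periodic remapping spreads a fixed virtual write pattern across many physical frames:

```python
import random

# Toy model: an application hammers a few "hot" virtual pages, but the OS
# periodically reclaims frames, reshuffling the virtual->physical mapping,
# so the writes spread across physical frames (natural wear leveling).

random.seed(0)
NUM_FRAMES = 64
HOT_PAGES = [0, 1, 2, 3]            # virtual pages the app writes every epoch
frame_writes = [0] * NUM_FRAMES     # per-physical-frame write counters
mapping = {v: v for v in HOT_PAGES} # initial identity translation

for epoch in range(1000):
    for v in HOT_PAGES:
        frame_writes[mapping[v]] += 1
    if epoch % 10 == 9:             # periodic reclamation remaps the hot pages
        free = random.sample(range(NUM_FRAMES), len(HOT_PAGES))
        mapping = dict(zip(HOT_PAGES, free))

# Without remapping, one frame would absorb all 1000 writes of its hot page;
# with shuffling, the worst-case per-frame count is far lower.
total, worst = sum(frame_writes), max(frame_writes)
```

A simulator that models only a single run with a fixed translation misses exactly this effect, which is why single-run lifetime extrapolation is pessimistic about the baseline.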

Impact of shuffling

Main Memory Sizing
 Artificially high page fault frequency when simulating with too little memory
 Collision behavior can be wildly different
 Impact on write prevention results

MTTF improvement with size
 Unreasonable to assume device failure at the first cell failure
 Device degradation vs. failure
  A larger device takes longer to degrade
 Even better in the presence of wear leveling
  More memory means more physical locations to apply wear leveling across
 Assuming write frequency is fixed, an increase in size means a proportional increase in MTTF
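The proportional-scaling claim is easy to make concrete with a back-of-envelope calculation. The endurance figure comes from the PCM slide earlier; the write rate and capacities below are assumed values for illustration only:

```python
# Under ideal wear leveling, a fixed write rate is spread uniformly over all
# lines, so the total write budget (lines * per-cell endurance) scales
# linearly with capacity, and so does MTTF.

CELL_ENDURANCE = 1e8        # writes per cell (PCM ballpark from the slides)
WRITE_RATE = 1e9            # writes/second reaching main memory (assumed)
SECONDS_PER_YEAR = 3600 * 24 * 365

def mttf_years(num_lines: float) -> float:
    """Years until the budget num_lines * CELL_ENDURANCE is exhausted."""
    return num_lines * CELL_ENDURANCE / WRITE_RATE / SECONDS_PER_YEAR

small = mttf_years(16e9 / 64)   # 16 GB of 64-byte lines
large = mttf_years(32e9 / 64)   # doubling capacity...
# ...doubles the estimated lifetime: large / small == 2.0
```

The linear model holds only while the write frequency stays fixed; if extra capacity also changes paging behavior (fewer faults, fewer writebacks), the improvement can differ from proportional.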

Benchmark Characteristics

How much does this all matter?
 Short version – a lot
 Two consecutive runs increase the max write estimate by only 12%, not 100%

Higher Execution Count
 Non-linear behavior over many more executions
 Sawtooth-like pattern due to write-spike collisions
 Lifetime estimates in years instead of months!

How should we estimate lifetime?
 Running even a single execution of a benchmark can become prohibitively expensive
 Apply sampling to extract benchmark write behavior
 A heuristic should be able to approximate lifetime after many execution iterations
 The Line Write Profile holds the key

Line Write Profile
 Can be viewed as a superposition of all page write profiles
 The Line Write Profile provides a summary of write behavior
[Diagram: a physical address decomposes into Page ID, Line ID, and Line Offset; the Line ID indexes the profile.]

Line Write Profile
 For every write access to physical memory, extract the Line ID
 For a last-level cache with a line size of 64 bytes
  A 4KB OS page contains 64 cache lines
  Use a counter for each of these 64 lines
  Increment the counter by 1 for every write that reaches main memory
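The counting procedure above can be sketched directly. This is a minimal version of the bookkeeping; the synthetic write stream at the end is invented for illustration:

```python
# Line Write Profile: one counter per line position within a page. With
# 64-byte lines and 4 KB pages, the line ID is the page offset divided by
# the line size, so every page folds onto the same 64 buckets.

LINE_SIZE = 64
PAGE_SIZE = 4096
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE   # 64

profile = [0] * LINES_PER_PAGE

def record_write(phys_addr: int) -> None:
    """Increment the counter for the line this write falls in."""
    line_id = (phys_addr % PAGE_SIZE) // LINE_SIZE
    profile[line_id] += 1

# Synthetic stream: a write to line 5 of any page lands in bucket 5,
# superposing the per-page profiles into one summary.
for page in range(3):
    record_write(page * PAGE_SIZE + 5 * LINE_SIZE)
```

Because the profile is indexed by line position rather than by physical page, it stays compact (64 counters here) no matter how large memory is.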

Line Write Profile – cg (Full Run)

Line Write Profile – cg (100 Billion Instructions)

Using Line Write Profile
 As the number of runs approaches infinity
  If every physical memory page has an equal chance of being accessed, then
   Every physical page tends towards the same write profile
   At this point, the lifetime curve reaches a settling point
 The maximum value from the Line Write Profile can then be used to accurately estimate lifetime in the presence of an OS.
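Turning the profile into a lifetime estimate then reduces to a one-line bound. The endurance constant reuses the PCM figure from earlier; the per-run profile values below are hypothetical:

```python
# If every physical page converges to the same line write profile, the
# hottest line position bounds device lifetime: it accumulates writes
# fastest no matter which page it sits in.

CELL_ENDURANCE = 1e8   # PCM write endurance from the slides

def lifetime_in_runs(profile_per_run: list[int]) -> float:
    """Benchmark executions survivable, given writes per line per run."""
    return CELL_ENDURANCE / max(profile_per_run)

# Hypothetical per-run profile: the hottest line takes 5000 writes per run.
runs = lifetime_in_runs([120, 5000, 340, 75])
# runs == 20000.0
```

This is why the profile's maximum, rather than its mean, is the quantity that matters: degradation is gated by the worst line position, not the average one.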

So is wear endurance a myth?
 Short answer – no
 Applications that pin physical pages will not exhibit natural OS wear leveling
 Security threats are still an issue
  And the OS can easily be bypassed to void the warranty
 Hardware wear leveling solutions can be low cost and effective

Final Take Away
 Wear endurance research should not report results that ignore multi-execution, inter-process, and intra-process OS paging effects.
 Techniques that depend on data values (write prevention) should carefully consider appropriate memory sizing and page fault impact
 Ignoring these can result in grossly underestimating baseline lifetimes and/or grossly overestimating lifetime improvement.

Thank You Questions?