Steven Pelley, Peter M. Chen, Thomas F. Wenisch University of Michigan


Memory Persistency

Nonvolatile memory (NVRAM) recovery
  Writes to memory are unordered (cache eviction)
  But recovery depends on write ordering
  Enforcing order for all writes is too slow
  Constrain persist order for correctness, but reorder for performance

Persist performance
  Persist ordering constraints form a directed acyclic graph (DAG)
  The critical path through the DAG limits overall performance
  Remove unnecessary ordering constraints
  Requires an interface to describe constraints
  Example: 1: Persist data[0]; 2: Persist data[1]; 3: Persist flag
  Program order (1, then 2, then 3) implies unnecessary constraints
  [Figure: persist-order DAGs for the example, program order versus necessary constraints only]
  Need an interface to specify the necessary constraints
  Expose persist concurrency; sounds like consistency!
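The critical-path argument can be illustrated with a minimal Python sketch. It assumes the three persists from the slide, and it assumes that the only necessary constraints are that the flag persists after both data elements (a reading of the example, not something the slide spells out). The function name `critical_path` and the edge-list encoding are hypothetical.

```python
from functools import lru_cache

def critical_path(n_nodes, edges):
    """Length, in persists, of the longest chain in a persist-order DAG.

    Nodes are numbered 1..n_nodes; each edge (a, b) means a must persist before b.
    """
    preds = {v: [] for v in range(1, n_nodes + 1)}
    for before, after in edges:
        preds[after].append(before)

    @lru_cache(maxsize=None)
    def depth(v):
        # A persist with no predecessors has depth 1.
        return 1 + max((depth(p) for p in preds[v]), default=0)

    return max(depth(v) for v in preds)

# 1: Persist data[0], 2: Persist data[1], 3: Persist flag
program_order = [(1, 2), (2, 3)]   # every persist waits for the previous one
required_only = [(1, 3), (2, 3)]   # assumed: flag waits for both data persists

print(critical_path(3, program_order))  # 3
print(critical_path(3, required_only))  # 2
```

With program order the three persists form a chain; with only the assumed necessary constraints the two data persists can proceed concurrently, so the critical path shrinks from three persists to two.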

Memory persistency: consistency models for NVRAM
  Framework to reason about persist order while maximizing concurrency
  Just as in consistency, may be strict or relaxed
    Strict: persist order matches store visibility order
    Relaxed: persist order need not match store order
  Our contribution:
    Define memory persistency; explore design space
    Relaxed persistency enables native instruction execution rate (30x speedup over strict persistency) while preserving data integrity across failure

Outline
  Define memory persistency
  Strict persistency and models
  Relaxed persistency and models
  Methodology and evaluation

Memory consistency models
  Enable performance via memory concurrency
  Provide ordering guarantees when needed
  Model separate from implementation
  May be strict or relaxed
  [Figure: consistency spectrum]
  Persistency similarly decouples implementation from model, and allows both strict and relaxed models

Abstracting failure: recovery observer
  Memory consistency: constrain order of loads and stores between processors
  Memory persistency: imagine failure as a recovery observer
    Atomically loads all memory at failure, following the consistency model
  Use the recovery observer to reason about recovery semantics
  Persistency = Consistency + Recovery observer
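One way to make the recovery observer concrete is to enumerate the memory states it could load at failure. The Python sketch below is a simplification under stated assumptions: each address is persisted once, persists are atomic, and the persist order is given as explicit edges. The names `recoverable_states`, `data0`, `data1`, and `flag` are hypothetical.

```python
from itertools import combinations

def recoverable_states(persists, edges):
    """All sets of completed persists a recovery observer could see.

    A set is observable only if, for every persist it contains, it also
    contains everything ordered before that persist.
    """
    states = []
    for k in range(len(persists) + 1):
        for subset in combinations(persists, k):
            done = set(subset)
            if all(before in done for before, after in edges if after in done):
                states.append(done)
    return states

persists = ["data0", "data1", "flag"]
unordered = []                                   # no persist ordering at all
constrained = [("data0", "flag"), ("data1", "flag")]

print(len(recoverable_states(persists, unordered)))    # 8
print(len(recoverable_states(persists, constrained)))  # 5
```

Without constraints the observer may find the flag set while the data is missing. With the two assumed constraints, every state that contains the flag also contains both data persists, which is the property recovery relies on.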

Persistency design space
  [Figure legend: volatile memory order, persistent memory order, happens-before]
  Strict persistency: single memory order
  Relaxed persistency: separate volatile and (new) persistent memory orders

Strict persistency
  Enforce persist order to match store order
  Thus, the consistency model also orders persists
  Store and persist are the same event
  Persists to different addresses from different threads can still be concurrent
  Implementation free to optimize
    In-hardware speculation?
    Logging/indirection?

Strict persistency under Sequential Consistency (SC)
    Lock(volatile mutex)
    Persist data[0]
    Persist data[1]
    …
    Persist data[N]
    Persist flag
    Unlock(volatile mutex)
  No annotation required
  Persists serialize according to program order
  Volatile accesses synchronize persists from different threads
  Must rely on multi-threading for persist concurrency

Strict persistency under Relaxed Memory Order (RMO)
    Lock(volatile mutex)
    Barrier
    Persist data[0]
    Persist data[1]
    …
    Persist data[N]
    Persist flag
    Unlock(volatile mutex)
  Barriers constrain the visible order of loads/stores
  These same barriers order persists
  Persists within a single thread may be concurrent

Relaxed persistency
  Decouple thread and persist synchronization
  Persist order may deviate from store order
  Separate volatile and persistent memory orders
  Persist barriers order persists
  Consistency and persistency time scales differ
  Expose additional concurrency only where necessary

Relaxed persistency models
  Epoch persistency [similar to BPFS cache]
    Persist barriers separate execution into epochs
    Persists within the same epoch are concurrent
    Complex behavior when stores are synchronized but persists are not (see paper)
  Strand persistency
    New model to minimally constrain persists
    Precisely defines the DAG of ordering constraints

Epoch persistency example
    Lock(volatile mutex)
    Memory barrier
    Persist data[0]
    Persist data[1]
    …
    Persist data[N]
    Persist barrier
    Persist flag
    Unlock(volatile mutex)
  Memory barrier: the lock/mutex synchronizes threads; no need to enforce persist order
  Persist barrier: flag must not persist before data; already locked, no need to synchronize threads
  Stores reorder around persist barriers
  Persists reorder around store barriers
  Complicates store atomicity (see paper)
  Relaxed persistency appropriately orders memory events
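The effect of the persist barrier in this example can be sketched in Python. The sketch assumes a single thread, models only persists and persist barriers, and picks three data elements for illustration (the slide leaves N open). The `split_epochs` helper and the `PERSIST_BARRIER` marker are hypothetical names.

```python
def split_epochs(trace):
    """Split a single-thread persist trace into epochs at each persist barrier."""
    epochs, current = [], []
    for op in trace:
        if op == "PERSIST_BARRIER":
            if current:
                epochs.append(current)
            current = []
        else:
            current.append(op)
    if current:
        epochs.append(current)
    return epochs

trace = ["data[0]", "data[1]", "data[2]", "PERSIST_BARRIER", "flag"]
epochs = split_epochs(trace)

# Strict persistency under SC serializes every persist in program order.
strict_path = sum(len(epoch) for epoch in epochs)
# Under epoch persistency, persists within one epoch are concurrent,
# so the critical path is one persist per epoch.
epoch_path = len(epochs)

print(epochs)                   # [['data[0]', 'data[1]', 'data[2]'], ['flag']]
print(strict_path, epoch_path)  # 4 2
```

The data persists share one epoch and may drain concurrently; only the flag waits for them, so the critical path no longer grows with the number of data elements.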

Strand persistency precisely labels constraints
  Divide execution into strands
  Each strand is an independent set of persists
    All strands initially unordered
    Conflicting accesses (i.e., 2 accesses to the same address, at least 1 of which is a store) establish persist order
  NewStrand label begins each strand
  Barriers continue to order persists within each strand, as in epoch persistency

Strand persistency example
  [Figure: persist ordering of A, B, and C under epoch persistency (Barrier) versus strand persistency (NewStrand)]
  With epoch barriers alone, B must be ordered with A and/or C
  Strands remove unnecessary ordering constraints
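The strand rules can be sketched as a small Python model that computes how deep each persist sits in the ordering DAG. This is a simplification under stated assumptions: a single thread, persists only (loads are not modeled, so a conflict here means two persists to the same address), and hypothetical trace markers `NEW_STRAND` and `BARRIER`. The traces below are illustrative and are not a reconstruction of the slide's figure.

```python
def persist_depths(trace):
    """Return (address, depth) for each persist, where depth is the length of
    the longest chain of ordered persists ending at that persist."""
    depths = []
    last_to_addr = {}   # address -> depth of the latest persist to that address
    prev_epochs = 0     # deepest persist in earlier epochs of the current strand
    this_epoch = 0      # deepest persist so far in the current epoch
    for op in trace:
        if op == "NEW_STRAND":
            # A new strand starts unordered with respect to earlier persists.
            prev_epochs, this_epoch = 0, 0
        elif op == "BARRIER":
            # Later persists in this strand follow everything before the barrier.
            prev_epochs, this_epoch = max(prev_epochs, this_epoch), 0
        else:
            # Ordered after earlier epochs of the strand and after any
            # conflicting persist to the same address.
            depth = 1 + max(prev_epochs, last_to_addr.get(op, 0))
            last_to_addr[op] = depth
            this_epoch = max(this_epoch, depth)
            depths.append((op, depth))
    return depths

epoch_trace = ["A", "BARRIER", "B", "BARRIER", "C"]
strand_trace = ["A", "BARRIER", "B", "NEW_STRAND", "C"]

print(persist_depths(epoch_trace))   # [('A', 1), ('B', 2), ('C', 3)]
print(persist_depths(strand_trace))  # [('A', 1), ('B', 2), ('C', 1)]
```

In the second trace, C begins a new strand and is no longer ordered after A and B, so the critical path drops from three persists to two. A conflicting persist still orders across strands: `persist_depths(["A", "NEW_STRAND", "A"])` places the second persist to A at depth 2.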

Methodology
  µ-benchmark: concurrent, persistent queue (see paper for pseudocode)
  Implementations under strict, epoch, and strand persistency models (under SC)
  Measure native performance on a real server (2.4 GHz Xeon) for 1 and 8 threads
  Measure persist concurrency via memory trace simulation
  Compare persist critical path against instruction execution rate

Relaxed persistency removes constraints, regains throughput
  [Figure: throughput results; line = instruction execution rate; assumes 500 ns persists]
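The comparison between persist critical path and instruction execution rate can be sketched as a simple bound. The Python model below is an assumption about how the two quantities combine, not the authors' simulator: throughput is taken as the lower of the instruction execution rate and the rate at which the persist critical path can drain. Only the 500 ns persist latency comes from the slide; the instruction rate and critical-path lengths are made-up illustrative inputs, not measured results.

```python
def throughput(instr_rate_ops_per_s, critical_path_persists_per_op, persist_latency_s):
    """Achievable operations per second under a persist critical-path bound."""
    if critical_path_persists_per_op == 0:
        return instr_rate_ops_per_s            # persists never limit throughput
    persist_bound = 1.0 / (critical_path_persists_per_op * persist_latency_s)
    return min(instr_rate_ops_per_s, persist_bound)

PERSIST_LATENCY_S = 500e-9   # from the slide: assumes 500 ns persists
INSTR_RATE = 1e6             # hypothetical instruction execution rate, ops/s

# Hypothetical critical-path lengths per operation, for illustration only.
print(throughput(INSTR_RATE, 4, PERSIST_LATENCY_S))    # persist-bound, about 5e5 ops/s
print(throughput(INSTR_RATE, 0.1, PERSIST_LATENCY_S))  # compute-bound, 1e6 ops/s
```

When the critical path per operation is long, persists set the pace; once relaxation shortens it enough, the instruction execution rate becomes the limit, which is the regime the slide describes as regaining throughput.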

Conclusion
  Must order persists, but over-constraining hurts performance (resembles consistency)
  Memory persistency builds on consistency to enforce persist order
  Persistency may be relaxed, decoupling store and persist order constraints
  Relaxed persistency achieves the instruction execution rate while preserving recovery correctness
    30x speedup over strict persistency/SC

Thank You! Questions?

Persist latency sensitivity
  [Figure: persist latency sensitivity, 1 thread]
  Relaxed persistency tolerates greater persist latency

Byte-addressable Persistent File System (BPFS) cache
  BPFS persistency model:
    Only order according to persistent conflicts
    Accesses to the volatile address space do not order persists
    No load-before-store conflict order (TSO ordering)
  Newly introduced semantics:
    Consequences of simultaneously relaxing consistency and persistency
    Persist epoch races: volatile accesses are synchronized; persists are not
    Atomic persists/persist coalescing

Memory Persistency: Consistency Models for NVRAM
  Writes to memory are unordered (cache eviction)
  But recovery depends on write ordering
  Enforcing order for all writes is too slow
  Persistency models provide a framework to reason about NVRAM write order while maximizing concurrency

Nonvolatile memory (NVRAM)
  DRAM and flash scaling slowing down
  New NVRAMs provide fast, scalable storage (phase change, memristor, STT-RAM)

  Storage technology   Random read latency   Durable?
  Disk                 10 ms                 yes
  Flash                90 µs                 yes
  DRAM                 100 ns                no
  NVRAM                50-1000 ns [IBM]      yes

  Performance of DRAM, durability of disk