Relaxed Consistency Finale


Relaxed Consistency Finale David Devecsery

Administration!
  Syllabus quiz (last warning!)
    On Canvas!
  Review 1 is graded.
    If you submitted but don’t have a grade, let me know.
    If you got a comment, you should adjust future reviews.
    If you didn’t get a comment, you had a fine review.
  Sign-up for leading discussions is coming soon!

Today: Higher-level discussion
  What do programmers need?
  What is unacceptable?
  What are the costs of providing these abstractions?
  Some more consistency background and semantics
  Can we do better than barriers?
  We “Frighten Small Children”

Finishing up DRF-0 discussion
  Advantages
  Disadvantages

Do you believe the DRF-0 argument?
  Is it really best to define memory models by what’s needed to achieve SC?
  Is there a better definition?
  Should all programs really be data-race free?
  There are legal programs with data races.
  Do we really need all of the performance that comes from as weak a memory model as possible?
  What’s the performance gain/cost of weaker architectures?

Partial consistency: locking
  Hint: Think of using the synchronization to lock another variable:
    lock()
    y = 1
    unlock()
  Will this code enforce mutual exclusion on a PC MM?
  xchg(x, y):
    atomic { old = x; x = y; return old }
  Lock:
    do {
      tmp = xchg(unlocked, 0);
      if (tmp != 1) {
        while (unlocked != 1) ;
      }
    } while (tmp != 1)
  Unlock:
    unlocked = 1

Relaxed consistency: barriers
  x86 [X]FENCEs: All X instructions before the fence will be globally visible to all X instructions following the fence
    MFENCE (load or store): handout provided
    SFENCE (stores only): handout provided
    LFENCE (loads only): handout provided
  NOTE: x86 also has a LOCK prefix. It is needed and is related to fences, but it is different.
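As a rough illustration of what a full fence buys, the sketch below runs the classic store-buffering litmus test in portable C++ rather than raw x86 assembly. The function name `both_zero_observed` is made up for this example. `std::atomic_thread_fence(std::memory_order_seq_cst)` stands in for MFENCE; compilers targeting x86 typically emit MFENCE or a LOCK-prefixed instruction for it, but that mapping is an assumption about the toolchain, not something the slides state.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Store-buffering litmus test (hypothetical helper, not from the lecture).
// Each thread stores to one flag, fences, then loads the other flag.
// Returns true if the outcome r0 == 0 && r1 == 0 was ever observed.
bool both_zero_observed(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r0 = -1, r1 = -1;

        std::thread t0([&] {
            x.store(1, std::memory_order_relaxed);
            // Full fence: orders the store above against the load below.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r0 = y.load(std::memory_order_relaxed);
        });
        std::thread t1([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = x.load(std::memory_order_relaxed);
        });
        t0.join();
        t1.join();

        if (r0 == 0 && r1 == 0) return true;
    }
    return false;
}
```

With both fences in place the C++ memory model forbids the "both zero" outcome, so `both_zero_observed(200)` should return false. Removing the fences makes that outcome legal, and on x86 the store buffer can actually produce it.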

Partial consistency: barriers
  Can we use fences to make this mutually exclusive?
  xchg(x, y):
    atomic { old = x; x = y; return old }
  Lock:
    do {
      tmp = xchg(unlocked, 0);
      if (tmp != 1) {
        while (unlocked != 1) ;
      }
    } while (tmp != 1)
    MFENCE
  Unlock:
    MFENCE
    unlocked = 1
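One way to experiment with the fenced lock from this slide is the C++ sketch below. The names `FenceLock` and `count_with_fence_lock` are invented for illustration. The relaxed atomic operations model the unordered xchg and store, and a sequentially consistent fence stands in for MFENCE (an assumption about the intent of the slide, not the author's code). The `yield()` in the spin loop is an addition so the demo behaves on machines with few cores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sketch of the slide's lock: 1 = free, 0 = held.
struct FenceLock {
    std::atomic<int> unlocked{1};

    void lock() {
        int tmp;
        do {
            // xchg(unlocked, 0) with no ordering of its own
            tmp = unlocked.exchange(0, std::memory_order_relaxed);
            if (tmp != 1) {
                // Spin on plain reads until the lock looks free.
                while (unlocked.load(std::memory_order_relaxed) != 1) {
                    std::this_thread::yield();
                }
            }
        } while (tmp != 1);
        // Full fence after acquiring (the slide's MFENCE).
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }

    void unlock() {
        // Full fence before releasing (the slide's MFENCE).
        std::atomic_thread_fence(std::memory_order_seq_cst);
        unlocked.store(1, std::memory_order_relaxed);
    }
};

// Increments a plain (non-atomic) counter under the lock from several threads.
long count_with_fence_lock(int threads, int per_thread) {
    FenceLock l;
    long counter = 0;  // protected only by the lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                l.lock();
                ++counter;
                l.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the lock provides mutual exclusion, `count_with_fence_lock(4, 10000)` returns 40000. The full fences are stronger than the lock needs, which is the question the later slides take up.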

In what ways can we reorder operations?
  Data Reordering
    Read-read
    Read-write
    Write-write
  Memory Atomicity
    When a store goes to memory, is it visible to all processors?

What reorderings should be allowed here?
  Thread 1
    1. local1 = 10
    2. lock()
    3. shared1 = true
    4. local2 = shared2;
    5. unlock()
    6. local3 = local1
    7. local4 = local2
  What are the highest and lowest points line 4 can be reordered to?
  What are the highest and lowest points line 3 can be reordered to?
  What about line 6? Line 7?

Partial consistency: barriers
  Can we use fences to make this mutually exclusive?
  Is this an optimal lock implementation?
  xchg(x, y):
    atomic { old = x; x = y; return old }
  Lock:
    do {
      tmp = xchg(unlocked, 0);
      if (tmp != 1) {
        while (unlocked != 1) ;
      }
    } while (tmp != 1)
    MFENCE
  Unlock:
    MFENCE
    unlocked = 1
  What constraints do we need (instead of fence) to make it optimal?

Reordering limits
  Thread 1
    1. local1 = 10
    2. lock()           (acquire semantics: later operations cannot be ordered above)
    3. shared1 = true
    4. local2 = shared2;
    5. unlock()         (release semantics: earlier operations cannot be ordered below)
    6. local3 = local1
    7. local4 = local2

Partial consistency: barriers
  Can we use fences to make this mutually exclusive?
  Is this an optimal lock implementation?
  xchg(x, y):
    atomic { old = x; x = y; return old }
  Lock:
    do {
      tmp = xchg(unlocked, 0, AcquireSemantics);
      if (tmp != 1) {
        while (unlocked != 1) ;
      }
    } while (tmp != 1)
  Unlock:
    Store(unlocked, 1, ReleaseSemantics);
  What constraints do we need (instead of fence) to make it optimal?
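The acquire/release version of the lock maps fairly directly onto C++ atomics. The sketch below is one possible rendering, with `AcqRelLock` and `count_with_acqrel_lock` as made-up names. `memory_order_acquire` and `memory_order_release` play the roles of the slide's AcquireSemantics and ReleaseSemantics, and the `yield()` in the spin loop is again an addition for the demo.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sketch: 1 = free, 0 = held, as on the slide.
struct AcqRelLock {
    std::atomic<int> unlocked{1};

    void lock() {
        int tmp;
        do {
            // Acquire: nothing after this exchange may be ordered above it.
            tmp = unlocked.exchange(0, std::memory_order_acquire);
            if (tmp != 1) {
                while (unlocked.load(std::memory_order_relaxed) != 1) {
                    std::this_thread::yield();
                }
            }
        } while (tmp != 1);
    }

    void unlock() {
        // Release: nothing before this store may be ordered below it.
        unlocked.store(1, std::memory_order_release);
    }
};

// Increments a plain (non-atomic) counter under the lock from several threads.
long count_with_acqrel_lock(int threads, int per_thread) {
    AcqRelLock l;
    long counter = 0;  // protected only by the lock
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                l.lock();
                ++counter;
                l.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The one-directional constraints still let operations outside the critical section move into it, which full fences would forbid. Operations inside the critical section cannot move out.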

Shifting Gears: “Silently shifting semicolon”
  Authors argue that for “safe” languages (e.g. Java), the statement ordering abstraction is essential
  We shouldn’t try to create faster unsafe code; instead, we should try to make safe code faster.
  Different from DRF-0
    DRF-0 is fast-by-default, with a burdensome programmer interface
    They argue for safety-by-default, with unclear performance implications

Should we prioritize safety, or performance?
  Traditionally, work has prioritized performance
    Is it clear that there is a significant performance cost to prioritizing safety?
    “Existence proof”: we program in this environment today, and it’s fine
  Work prioritizing safety is promising
    But it has gained little traction
    <30% overhead for SC on some server applications
    Some languages (e.g. Python) enforce SC via uni-core execution

Linux Memory Model: Are you frightened, or just disconcerted?
  What does a world without DRF-0 look like?
  What does it take to formally reason about all of these weak memory models?
  This paper explores the significant effort to reason about the Linux memory model
  Case study on the RCU data structure

What primitives exist?
  Reason about several primitives
  Translate into read (R), write (W), and fence (F) operations
  Specify operations and data-flow rules in the cat language
    Formal language, allows operations to be

Core model
  Rules for:
    Sequential consistency per variable
    Atomicity of read-modify-write
    Happens-before
    Propagates-before (constrains propagation of writes across fences)
  This seems simple enough, but once they explain implementation, it gets complex quickly!
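Two of these rules have loose analogues in the C++ memory model, which may make them easier to picture. The sketch below is only that analogy: it uses C++ atomics, not the Linux kernel primitives or the paper's cat definitions, and the function names `relaxed_rmw_total` and `message_passing_read` are invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Atomicity of read-modify-write: even with relaxed ordering,
// no increment is lost, because each fetch_add is indivisible.
long relaxed_rmw_total(int threads, int per_thread) {
    std::atomic<long> total{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                total.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();
    return total.load();
}

// Happens-before via message passing: the release store to `ready`
// synchronizes with the acquire load that sees it, so the consumer's
// read of the plain variable `payload` must observe 42.
int message_passing_read() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });
    producer.join();
    consumer.join();
    return seen;
}
```

Here `relaxed_rmw_total(4, 10000)` yields 40000 and `message_passing_read()` yields 42. The propagates-before rule has no equally direct C++ counterpart and is left out of this sketch.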

Many more details
  Did you get lost at some points? I got lost at some points.
  Formally modeling behaviors on general relaxed consistency processors in something as complex as the Linux kernel is a very complicated matter!
  Even for the small kernel algorithms evaluated by this paper!

Enter conference slides
  Found at: http://www.rdrop.com/users/paulmck/scalability/paper/slides-asplos18.pdf

This is so complex, is it useful?
  Even though this tool is so complicated, Linux developers find it useful!
  The paper cites a mailing list entry in which a developer references a verified model of a fix.
  Is it really reasonable to expect programmers to reason at this level?

Next class
  No class Monday, Sep. 3!
  Entering into the realm of compilers on Sep. 5
  Enjoy Labor Day