Compilers, Languages, and Memory Models

Compilers, Languages, and Memory Models David Devecsery

Administration!
Readings: most are up! (Some are still subject to change.)
Sign up for discussions on Canvas!
Project proposal info coming soon!

Today: Compilers!
Compilers are like hardware: they do things we don’t expect!
What limitations are there on what a compiler can do?
What memory model should a compiler enforce?
C++ memory model studies

Why do compilers impact MMs? How can the compiler transform this code?
Initial: ready = false, a = null
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;
Transformations to consider: operation splitting, operation reordering, operation combining.

Why do compilers impact MMs? What can’t the compiler do?
Break single-threaded semantics
Reorder across synchronizations
Break control dependencies
Initial: ready = false, a = null
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;

Why do compilers impact MMs? Is this a legal transformation?
Initial: ready = false, a = null
Original:
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;
Transformed:
Thread 1: ready = true; a = new int[10];
Thread 2: if (ready) a[5] = 5;

Why do compilers impact MMs? Is this a legal transformation?
Initial: ready = false, a = null
Original:
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;
Transformed (operation splitting):
Thread 1: tmp = new int[10]; a = tmp & 0xFF; ready = true; a |= tmp & ~0xFF;
Thread 2: if (ready) a[5] = 5;

Why do compilers impact MMs? Is this a legal compiler transformation? Why or why not?
Initial: ready = false, a = null
Original:
Thread 1: b = a + 1; z = b + 10; c = a + 1; w = c[1];
Thread 2: a = new int[10];
Transformed (the two loads of a combined):
Thread 1: b = a + 1; z = b + 10; w = b[1];
Thread 2: a = new int[10];

Why do compilers impact MMs? Is this a legal transformation?
Initial: ready = false, a = null
Original:
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;
Transformed:
Thread 1: a = new int[10]; ready = true;
Thread 2: int *tmp = &a[5]; if (ready) *tmp = 5;
No — this transformation breaks single-threaded semantics: it touches a[5] even when ready is not true, which may segfault while a is still null.

Why can hardware reorder across control dependencies?
Hardware has full dynamic information and can efficiently mis-speculate.
Compiler reorderings are made without dynamic information.
Compilers: larger instruction window, less dynamic information.
Hardware: smaller instruction window, more dynamic information.

What restrictions do we need for DRF-0? What are the limitations on the compiler to enforce DRF0? (Hint: think of the hardware restrictions.)
Any optimization must preserve the happens-before relationships created by the original program.

Why do compilers impact MMs? Is this valid in a DRF0 compiler?
Initial: ready = false, a = null
Original:
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;
Transformed:
Thread 1: ready = true; a = new int[10];
Thread 2: if (ready) a[5] = 5;
Yes — there are no DRF0-defined happens-before orderings between the threads!

Why do compilers impact MMs? Is this valid in a DRF0 compiler?
Initial: ready = false, a = null
Original:
Thread 1: lock(m1); a = new int[10]; ready = true; unlock(m1);
Thread 2: lock(m1); if (ready) a[5] = 5; unlock(m1);
Transformed:
Thread 1: lock(m1); ready = true; a = new int[10]; unlock(m1);
Thread 2: lock(m1); if (ready) a[5] = 5; unlock(m1);
Yes — reordering within the lock is valid; these instructions don’t have happens-before orderings with other threads.

Is DRF-0 enough? What happens when DRF0 breaks? What are some negative consequences?
e.g. crash bugs (below):
Original:
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;
Transformed:
Thread 1: ready = true; a = new int[10];
Thread 2: if (ready) a[5] = 5;
If Thread 2 observes ready == true before a is assigned, it dereferences null.

Safety vulnerabilities? Can DRF0 lead to safety concerns?
Initial: foo = 2
Source:
Thread 1: x = foo; switch (x) { case 1: … case 2: … … case n: … }
Thread 2: foo = 1;
Compiled (switch lowered to a jump table, store split):
Thread 1: tmp = table[foo]; loc = tmp; goto loc;
Thread 2: foo = 1000001; foo -= 1000000;
If Thread 1’s racy read of foo lands between Thread 2’s split stores, it indexes the jump table far out of bounds and jumps to an arbitrary address — a control-flow safety violation.

What about if we want racy programs? Sometimes, we intentionally write racy programs:
Initial: foo = null
if (foo == null) {
  lock(m1);
  if (foo == null) { foo = new Foo(); }
  unlock(m1);
}

What about if we want racy programs? Is this safe? (“Double-checked locking”)
Initial: foo = null
Thread 1:
if (foo == null) {
  lock(m1);
  if (foo == null) { foo = malloc(sizeof(Foo)); foo.construct(); }
  unlock(m1);
}
Thread 2:
if (foo == null) { } … use uninitialized Foo
No — the store to foo may be reordered before the constructor’s writes, so Thread 2 can observe a non-null foo that is not yet initialized.

How can we safely write racy code? C++11 has Weak DRF0 (or WDRF0): it allows defined atomic operations with relaxed consistency semantics:
Acquire
Release
Acquire-Release
Sequential Consistency
Relaxed (no ordering restrictions)

Why do compilers impact MMs? Can the following segfault in C++11?
Initial: ready = false, a = null
Original:
Thread 1: a = new int[10]; store(ready, true, ReleaseSemantics);
Thread 2: if (load(ready, AcquireSemantics)) a[5] = 5;
Transformed:
Thread 1: store(ready, true, ReleaseSemantics); a = new int[10];
Thread 2: if (load(ready, AcquireSemantics)) a[5] = 5;
No — this transformation is impossible! The reordering would break the happens-before ordering between the store and the release.

Atomics are simple, right? Can we see a==1, b==0, c==1, d==0 in C++11?
Initial: x = 0, y = 0
Thread 1: store(x, 1, Release)
Thread 2: store(y, 1, Release)
Thread 3: a = load(x, Acquire); b = load(y, Acquire)
Thread 4: c = load(y, Acquire); d = load(x, Acquire)

Atomics are simple, right? Can we see a==1, b==0, c==1, d==0 in C++11?
Initial: x = 0, y = 0
Thread 1: store(x, 1, Release)
Thread 2: store(y, 1, Release)
Thread 3: a = load(x, Acquire); b = load(y, Acquire)
Thread 4: c = load(y, Acquire); d = load(x, Acquire)
Yes! There is no relationship between Thread 1 and Thread 2, so the stores are not globally ordered. (Think of our TSO example.)

Why do compilers impact MMs? Can we see a==1, b==0, c==1, d==0 in C++11? Messy, but: no broken dependencies!
Initial: x = 0, y = 0
Thread 1: store(x, 1, Release)
Thread 2: store(y, 1, Release)
Thread 3: a = load(x, Acquire); b = load(y, Acquire)
Thread 4: c = load(y, Acquire); d = load(x, Acquire)

Why do compilers impact MMs? Can we see a==1, b==0, c==1, d==0 in C++11?
Initial: x = 0, y = 0
Thread 1: store(x, 1, SC)
Thread 2: store(y, 1, SC)
Thread 3: a = load(x, Acquire); b = load(y, Acquire)
Thread 4: c = load(y, Acquire); d = load(x, Acquire)
No! The request for an SC operation enforces an ordering between Thread 1 and Thread 2.


Why do compilers impact MMs? Can you spot the problem with these dependencies? Can we see a==1, b==0, c==1, d==0 in C++11?
Initial: x = 0, y = 0
Thread 1: store(x, 1, SC)
Thread 2: store(y, 1, SC)
Thread 3: a = load(x, Acquire); b = load(y, Acquire)
Thread 4: c = load(y, Acquire); d = load(x, Acquire)


Weak DRF-0 Is Weak DRF0 really needed? What does the paper say? What do you think? Do you think this abstraction is programmable?

C++11 and “volatile”: What is the use of volatile?
It essentially assumes the value may change outside of the thread.
It disables reordering, splitting, and value-caching optimizations.
Would making “a” and/or “ready” volatile help this code?
Thread 1: a = new int[10]; ready = true;
Thread 2: if (ready) a[5] = 5;

Next class: Sequential Consistency and compilers! Java Memory Model.