Martin Vechev IBM Research Michael Kuperstein Technion Eran Yahav Technion (FMCAD’10, PLDI’11) 1.

Slides:



Advertisements
Similar presentations
Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study Sebastian Burckhardt Rajeev Alur Milo M. K. Martin Department of.
Advertisements

Greta YorshEran YahavMartin Vechev IBM Research. { ……………… …… …………………. ……………………. ………………………… } P1() Challenge: Correct and Efficient Synchronization { ……………………………
Greta YorshEran YahavMartin Vechev IBM Research. { ……………… …… …………………. ……………………. ………………………… } T1() Challenge: Correct and Efficient Synchronization { ……………………………
CIS 540 Principles of Embedded Computation Spring Instructor: Rajeev Alur
Synchronization. How to synchronize processes? – Need to protect access to shared data to avoid problems like race conditions – Typical example: Updating.
Memory Consistency Models Kevin Boos. Two Papers Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995.
Ch 7 B.
CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,
“FENDER” AUTOMATIC MEMORY FENCE INFERENCE Presented by Michael Kuperstein, Technion Joint work with Martin Vechev and Eran Yahav, IBM Research 1.
Ch. 7 Process Synchronization (1/2) I Background F Producer - Consumer process :  Compiler, Assembler, Loader, · · · · · · F Bounded buffer.
1 Eran Yahav Technion Joint work with Martin Vechev (ETH), Greta Yorsh (ARM), Michael Kuperstein (Technion), Veselin Raychev (ETH)
Silberschatz, Galvin and Gagne ©2007 Operating System Concepts with Java – 7 th Edition, Nov 15, 2006 Chapter 6 (a): Synchronization.
Chapter 6 Process Synchronization Bernard Chen Spring 2007.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Process Synchronization. Module 6: Process Synchronization Background The Critical-Section Problem Peterson’s Solution Synchronization Hardware Semaphores.
Mutual Exclusion.
CH7 discussion-review Mahmoud Alhabbash. Q1 What is a Race Condition? How could we prevent that? – Race condition is the situation where several processes.
PARTIAL-COHERENCE ABSTRACTIONS FOR RELAXED MEMORY MODELS Presented by Michael Kuperstein, Technion Joint work with Martin Vechev, IBM Research and Eran.
Steven Pelley, Peter M. Chen, Thomas F. Wenisch University of Michigan
CS444/CS544 Operating Systems Synchronization 2/16/2006 Prof. Searleman
Martin Vechev IBM Research Michael Kuperstein Technion Eran Yahav Technion (FMCAD’10, PLDI’11) 1.
Inferring Synchronization under Limited Observability Martin Vechev Eran Yahav Greta Yorsh IBM T.J. Watson Research Center.
1 Basic abstract interpretation theory. 2 The general idea §a semantics l any definition style, from a denotational definition to a detailed interpreter.
By Sarita Adve & Kourosh Gharachorloo Review by Jim Larson Shared Memory Consistency Models: A Tutorial.
Formalisms and Verification for Transactional Memories Vasu Singh EPFL Switzerland.
1 Lecture 09 – Synthesis of Synchronization Eran Yahav.
1 Martin Vechev IBM T.J. Watson Research Center Joint work with: Hagit Attiya, Rachid Guerraoui, Danny Hendler, Petr Kuznetsov, Maged Michael.
Computer Architecture II 1 Computer architecture II Lecture 9.
1 Sharing Objects – Ch. 3 Visibility What is the source of the issue? Volatile Dekker’s algorithm Publication and Escape Thread Confinement Immutability.
Synchronization (other solutions …). Announcements Assignment 2 is graded Project 1 is due today.
Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.
Maria-Cristina Marinescu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology A Synthesis Algorithm for Modular Design of.
1 Thread Synchronization: Too Much Milk. 2 Implementing Critical Sections in Software Hard The following example will demonstrate the difficulty of providing.
Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995.
Condition Variables and Transactional Memory: Problem or Opportunity? Polina Dudnik and Michael Swift University of Wisconsin, Madison.
L AWS OF ORDER : EXPENSIVE SYNCHRONIZATION IN CONCURRENT ALGORITHMS CANNOT BE ELIMINATED POPL '11 Hagit Attiya, Rachid Guerraoui, Danny Hendler, Petr Kuznetsov,
Inferring Synchronization under Limited Observability Martin Vechev, Eran Yahav, Greta Yorsh IBM T.J. Watson Research Center (work in progress)
Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.
Anshul Kumar, CSE IITD CSL718 : Multiprocessors Synchronization, Memory Consistency 17th April, 2006.
Fence Scoping Changhui Lin †, Vijay Nagarajan*, Rajiv Gupta † † University of California, Riverside * University of Edinburgh.
Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.
CY2003 Computer Systems Lecture 04 Interprocess Communication.
Pattern-based Synthesis of Synchronization for the C++ Memory Model Yuri Meshman, Noam Rinetzky, Eran Yahav 1.
Multiprocessor Cache Consistency (or, what does volatile mean?) Andrew Whitaker CSE451.
Operating Systems CMPSC 473 Mutual Exclusion Lecture 11: October 5, 2010 Instructor: Bhuvan Urgaonkar.
Complexity Implications of Memory Models. Out-of-Order Execution Avoid with fences (and atomic operations) Shared memory processes reordering buffer Hagit.
CSCI-375 Operating Systems Lecture Note: Many slides and/or pictures in the following are adapted from: slides ©2005 Silberschatz, Galvin, and Gagne Some.
ICFEM 2002, Shanghai Reasoning about Hardware and Software Memory Models Abhik Roychoudhury School of Computing National University of Singapore.
Getting Rid of Store-Buffers in TSO Analysis Mohamed Faouzi Atig Uppsala University, Sweden Ahmed Bouajjani LIAFA, University of Paris 7, France LIAFA,
CS533 Concepts of Operating Systems Jonathan Walpole.
CGS 3763 Operating Systems Concepts Spring 2013 Dan C. Marinescu Office: HEC 304 Office hours: M-Wd 11: :30 AM.
CISC 879 : Advanced Parallel Programming Rahul Deore Dept. of Computer & Information Sciences University of Delaware Exploring Memory Consistency for Massively-Threaded.
Abstractions for Relaxed Memory Models Andrei Dan, Yuri Meshman, Martin Vechev, Eran Yahav 1.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition Chapter 6: Process Synchronization.
740: Computer Architecture Memory Consistency Prof. Onur Mutlu Carnegie Mellon University.
An Operational Approach to Relaxed Memory Models
Memory Consistency Models
Memory Consistency Models
Chapter 5: Process Synchronization
Abstraction-Guided Synthesis
Shared Memory Consistency Models: A Tutorial
Module 7a: Classic Synchronization
Threads and Memory Models Hal Perkins Autumn 2009
Synthesis of Memory Fences via Refinement Propagation
Memory Consistency Models
CSE 153 Design of Operating Systems Winter 19
Relaxed Consistency Part 2
Relaxed Consistency Finale
Problems with Locks Andrew Whitaker CSE451.
Abstraction-Guided Synthesis of synchronization
Presentation transcript:

Martin Vechev IBM Research Michael Kuperstein Technion Eran Yahav Technion (FMCAD’10, PLDI’11) 1

Textbook Example: Peterson’s Algorithm Specification: mutual exclusion over critical section Process 0: while(true) { store ent0 = true; store turn = 1; do { load e = ent1; load t = turn; } while(e == true && t == 1); //Critical Section here store ent0 = false; } Process 1: while(true) { store ent1 = true; store turn = 0; do { load e = ent0; load t = turn; } while(e == true && t == 0); //Critical Section here store ent1 = false; } 2

Beyond Textbooks: Relaxed Memory Models Specification: mutual exclusion over critical section Process 0: while(true) { store ent0 = true; store turn = 1; do { load e = ent1; load t = turn; } while(e == true && t == 1); //Critical Section here store ent0 = false; } Process 1: while(true) { store ent1 = true; store turn = 0; do { load e = ent0; load t = turn; } while(e == true && t == 0); //Critical Section here store ent1 = false; }  Re-ordering of operations  Non-atomic stores 3

Memory Fences  Enforce order… at a cost  S1; fence; S2 S1 must become observable before S2 starts executing  Fences are expensive  10s-100s of cycles  collateral damage (e.g., prevent compiler opts)  example: removing a single fence yields 3x speedup in a work-stealing queue  Required fences depend on memory model  Where should I put fences? 4

May Seem Easy… Specification: mutual exclusion over critical section Process 0: while(true) { store ent0 = true; fence; store turn = 1; fence; do { load e = ent1; load t = turn; } while(e == true && t == 1); //Critical Section here store ent0 = false; } Process 1: while(true) { store ent1 = true; fence; store turn = 0; fence; do { load e = ent0; load t = turn; } while(e == true && t == 0); //Critical Section here store ent1 = false; } 5

Chase-Lev Work-Stealing Queue 1 int take() { 2 long b = bottom – 1; 3 item_t * q = wsq; 4 bottom = b 5 long t = top 6 if (b < t) { 7 bottom = t; 8 return EMPTY; 9 } 10 task = q->ap[b % q->size]; 11 if (b > t) 12 return task 13 if (!CAS(&top, t, t+1)) 14 return EMPTY; 15 bottom = t + 1; 16 return task; 17 } 1 void push(int task) { 2 long b = bottom; 3 long t = top; 4 item_t * q = wsq; 5 if (b – t >= q->size – 1) { 6 wsq = expand(); 7 q = wsq; 8 } 9 q->ap[b % q->size] = task; 10 bottom = b + 1; 11} 1 int steal() { 2 long t = top; 3 long b = bottom; 4 item_t * q = wsq; 5 if (t >= b) 6 return EMPTY; 7 task = q->ap[t % q->size]; 8 if (!CAS(&top, t, t+1)) 9 return ABORT; 10 return task; 11} Specification: no lost items, no phantom items, memory safety 6

Fender  Help the programmer place fences  Find optimal fence placement  Ideal setting for synthesis  Restrict non-determinism (re-ordering) s.t. program stays within set of safe executions  Trivial solution: fence after every store  Hard for a human even for small programs 7

Goal  P’ satisfies the specification S under M FENDER Program P Specification S Memory Model M Program P’ with Fences 8

Naïve Approach: Recipe 1. Compute reachable states for the program 2. Compute constraints on execution that guarantee that all “bad states” are avoided 3. Implement the constraints with fences 9 Bad News [Atig et. al POPL’10] Reachability undecidable for RMO Non-primitive recursive complexity for TSO/PSO (Even for programs that are finite-state under SC) (solving a safety game)

Store Buffers 10 … P0 Main Memory … P1 … … … … ent0 ent1 turn ent0 ent1 turn storeflush load fence

Unbounded store buffers 11 P0 Main Memory P1 ent0 ent1 turn ent0 ent1 turn Process 0: while(true) { store ent0 = true; store turn = 1; do { load e = ent1; load t = turn; } while(e == true && t == 1); //Critical Section here store ent0 = false; } TFTF …

What can we do?  Under-approximate store buffers  Bound the length of execution buffers [KVY-FMCAD’10]  Bound context switches [Bouajjani et al.]  Dynamic synthesis of fences (In progress)  Over-approximate store buffers  Sound abstraction of buffer content [KVY-PLDI’11] 12

Abstract Interpretation for RMM 13  Idea: Bounded over-approximation of unbounded buffers  Hierarchy of abstract memory models  Strictly weaker than concrete TSO/PSO  Maintain partial-coherence

First Attempt: Set Abstraction 14 Abstract each store buffer as a set Process 0: while(true) { store ent0 = true; fence; store turn = 1; fence; do { load e = ent1; load t = turn; } while(e == true && t == 1); //Critical Section here store ent0 = false; } Process 1: while(true) { store ent1 = true; fence; store turn = 0; fence; do { load e = ent0; load t = turn; } while(e == true && t == 0); //Critical Section here store ent1 = false; } … P0 Main Memory … P1 … … … … ent0 ent1 turn ent0 ent1 turn { } {1} {true} { } { false } { false, true } { } Represents B(ent0) = true; false B(ent0) = false; true …

Abstract Memory Models - Requirements  Intra-process coherence  a process should see the most recent value it wrote  Preserve fence semantics  The value written to main memory when flushed by a fence is the most recent value stored before the fence  Partial inter-process coherence  preserve as much order information as feasible (bounded)  Enable strong flushes  Sound  Simple construction! 15

Partial Coherence Abstractions 16 … P0 Main Memory … P1 … … … … ent0 ent1 turn ent0 ent1 turn P0 Main Memory P1 ent0 turn ent0 ent1 turn Recent value Bounded length k Unordered elements ent1 Allows precise fence semantics Allows precise loads from buffer Keeps the analysis precise for “well behaved” programs Record what values appeared (without order or number)

Why does it work?  Well behaved programs have structure  Stores that are important to communicate with other processors are fenced close to the store  Making unbounded number of writes to a memory location without a flush?  Seems esoteric  Your program is suspicious 17 flag := true while other_flag = true { flag := false //Do something flag := true }

Adjusted Recipe 18  Compute (abstract) reachable states for the program using partial-coherence abstraction  Compute constraints on execution that guarantee that all “bad states” are avoided  Implement the constraints with fences (solving a safety game with abstraction)

Precision/fences tradeoff  Different abstractions lead to different fences  Guaranteed to be sound – never miss a fence  More precise abstraction – potentially fewer fences  In the paper  SC-Recoverability  PSO as an abstraction of TSO  Partially disjunctive abstraction allows more aggressive merging of buffer, scales better 19

Synthesis Results 20  Mostly mutual exclusion primitives  Different variations of the abstraction ProgramFD k=0FD k=1PD k=0PD k=1PD k=2 Sense0   Pet0  Dek0  Lam0T/O  Fast0T/O  Fast1a   Fast1b    Fast1cT/O  

Limitations… 21  More relaxed memory models  Need reasonable concrete semantics first  What if the program is infinite-space?  Or simply large  Partial-coherence isn’t enough  We want to also use “standard” abstract interpretation

Composing Abstractions 22  Trivial for value abstractions  Abstract buffers of abstract values  Challenging for more complex abstractions  Heap abstractions  Predicate abstractions  Buffers for “abstract locations” (?)

Summary  Partial-coherence abstractions  Verification without context bounds  but possibly with false alarms  Synthesis of fences  Precision affects quality of results  In progress  Combination of abstractions  Scalable (dynamic) fence inference  Other synchronization mechanisms  CCRs [TACAS’09]  Atomic sections [POPL’10] 23

Invited Questions 1. Do you ever need to use k>1 ? 2. In your abstraction, do you ever fall into the unordered set? 3. How far can we expect this to scale? 4. Why over-approximation for fence inference? 5. Can you explain how the inference works? 24

25

26