“FENDER” AUTOMATIC MEMORY FENCE INFERENCE
Presented by Michael Kuperstein, Technion
Joint work with Martin Vechev and Eran Yahav, IBM Research

Dekker's Algorithm

Specification: mutual exclusion over the critical section.

    p0:
      flag[0] := true
      while flag[1] = true {
        if turn ≠ 0 {
          flag[0] := false
          while turn ≠ 0 { }
          flag[0] := true
        }
      }
      // critical section
      turn := 1
      flag[0] := false

    p1:
      flag[1] := true
      while flag[0] = true {
        if turn ≠ 1 {
          flag[1] := false
          while turn ≠ 1 { }
          flag[1] := true
        }
      }
      // critical section
      turn := 0
      flag[1] := false

Beyond Textbooks: Weak Memory Models

Dekker's algorithm is correct under sequential consistency, but on real hardware the same code can fail because weak memory models allow:
- re-ordering of operations
- non-atomic stores
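To see why the store-then-load pattern at the start of Dekker's algorithm breaks, here is a minimal C sketch that exhaustively explores its executions under a TSO-like model, assuming each process has a one-entry store buffer that may drain to memory at any time. It models only the entry step (store your own flag, then load the other's); `State`, `both_read_zero`, and `both_read_zero_reachable` are hypothetical names for this illustration, and `use_fence` approximates a fence by requiring the process's store buffer to be empty before its load.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the entry step only:
   p0: flag[0] := 1; r[0] := flag[1]      p1: flag[1] := 1; r[1] := flag[0] */
typedef struct {
    int pc[2];    /* 0: before store, 1: before load, 2: done */
    int buf[2];   /* 1 if the process's store to its own flag is still buffered */
    int flag[2];  /* shared memory */
    int r[2];     /* values loaded */
} State;

/* Tries every interleaving of process steps and buffer flushes. Returns true if
   some complete execution ends with both processes having read 0. With
   use_fence, a process may load only once its own store buffer is empty. */
static bool both_read_zero(State s, bool use_fence) {
    bool progress = false;
    for (int p = 0; p < 2; p++) {
        int q = 1 - p;
        if (s.pc[p] == 0) {                       /* store enters the buffer */
            State t = s; t.buf[p] = 1; t.pc[p] = 1;
            progress = true;
            if (both_read_zero(t, use_fence)) return true;
        } else if (s.pc[p] == 1 && !(use_fence && s.buf[p])) {
            State t = s; t.r[p] = s.flag[q]; t.pc[p] = 2;   /* load other flag from memory */
            progress = true;
            if (both_read_zero(t, use_fence)) return true;
        }
        if (s.buf[p]) {                           /* buffered store drains to memory */
            State t = s; t.flag[p] = 1; t.buf[p] = 0;
            progress = true;
            if (both_read_zero(t, use_fence)) return true;
        }
    }
    /* No step possible: both processes are done and both buffers are empty. */
    return !progress && s.r[0] == 0 && s.r[1] == 0;
}

bool both_read_zero_reachable(bool use_fence) {
    State init = {{0, 0}, {0, 0}, {0, 0}, {0, 0}};
    return both_read_zero(init, use_fence);
}
```

Without the fence, both processes can buffer their stores and then read 0, so both would proceed toward the critical section; with the fence, that outcome is unreachable.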

Memory Fences
- Fences enforce order, at a cost!
  - Fences are expensive: 10s to 100s of cycles.
  - Example: removing a single fence yields a 3x speedup in a work-stealing queue [Michael et al. PPoPP ’09].
- Where should we put fences?
  - The required fences depend on the memory model.
  - There are different kinds of fences.

Goal
- “Correct and efficient fencing for the masses”
- A tool to help the programmer place fences:
  - for non-trivial finite-state programs
  - under a realistic memory model
  - safe
  - efficient

Easy!

    p0:
      flag[0] := true
      fence
      while flag[1] = true {
        if turn ≠ 0 {
          flag[0] := false
          while turn ≠ 0 { }
          flag[0] := true
        }
      }
      // critical section
      turn := 1
      flag[0] := false

    p1:
      flag[1] := true
      fence
      while flag[0] = true {
        if turn ≠ 1 {
          flag[1] := false
          while turn ≠ 1 { }
          flag[1] := true
        }
      }
      // critical section
      turn := 0
      flag[1] := false
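One way the `fence` on this slide could be expressed in C11 is a sequentially consistent `atomic_thread_fence` between the flag store and the load of the other flag. This is a sketch under stated assumptions: it covers only the entry step, `flag` and `raise_flag_then_check_other` are hypothetical names, and it does not claim that the rest of the algorithm is correctly synchronized under C11.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared flags of the two processes (static, so zero-initialized). */
static atomic_int flag[2];

/* Entry step "flag[self] := true; fence; read flag[other]".
   Returns true if the other process's flag is raised (contention). */
bool raise_flag_then_check_other(int self) {
    int other = 1 - self;
    atomic_store_explicit(&flag[self], 1, memory_order_relaxed);
    /* Keeps the store above from being ordered after the load below
       with respect to another process that also uses a seq_cst fence. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&flag[other], memory_order_relaxed) == 1;
}
```

Because both processes execute a seq_cst fence between their store and their load, C11 guarantees that at least one of them observes the other's store, which is the ordering the slide's fence provides.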

Chase-Lev Work-Stealing Queue

    int take() {
      long b = bottom - 1;
      item_t *q = wsq;
      bottom = b;
      long t = top;
      if (b < t) {
        bottom = t;
        return EMPTY;
      }
      int task = q->ap[b % q->size];
      if (b > t)
        return task;
      if (!CAS(&top, t, t + 1)) {
        bottom = t + 1;  // restore bottom even when the CAS fails
        return EMPTY;
      }
      bottom = t + 1;
      return task;
    }

    void push(int task) {
      long b = bottom;
      long t = top;
      item_t *q = wsq;
      if (b - t >= q->size - 1) {
        wsq = expand();
        q = wsq;
      }
      q->ap[b % q->size] = task;
      bottom = b + 1;
    }

    int steal() {
      long t = top;
      long b = bottom;
      item_t *q = wsq;
      if (t >= b)
        return EMPTY;
      int task = q->ap[t % q->size];
      if (!CAS(&top, t, t + 1))
        return ABORT;
      return task;
    }
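For experimentation, here is a minimal single-threaded C sketch of the same deque. Assumptions: a fixed capacity `CAPACITY` replaces `expand()`, tasks are non-negative ints so `EMPTY` and `ABORT` can be negative sentinels, there is a single static deque, and `top`/`bottom` are C11 atomics with the default seq_cst ordering, which sidesteps rather than answers the fence-placement question. As in Chase and Lev's original algorithm, `take` resets `bottom` after the final CAS whether or not the CAS succeeds.

```c
#include <assert.h>
#include <stdatomic.h>

#define EMPTY (-1)
#define ABORT (-2)
#define CAPACITY 16          /* hypothetical fixed size; no expand() */

static atomic_long top, bottom;
static int ap[CAPACITY];     /* circular array of tasks (non-negative ints) */

void push(int task) {
    long b = atomic_load(&bottom);
    long t = atomic_load(&top);
    assert(b - t < CAPACITY - 1);          /* stands in for expand() */
    ap[b % CAPACITY] = task;
    atomic_store(&bottom, b + 1);
}

int take(void) {
    long b = atomic_load(&bottom) - 1;
    atomic_store(&bottom, b);
    long t = atomic_load(&top);
    if (b < t) {                           /* deque was empty */
        atomic_store(&bottom, t);
        return EMPTY;
    }
    int task = ap[b % CAPACITY];
    if (b > t)                             /* more than one task: no race possible */
        return task;
    /* Last task: race with thieves via CAS on top. A copy of t is passed
       because a failed CAS overwrites its expected-value argument. */
    long expected = t;
    if (!atomic_compare_exchange_strong(&top, &expected, t + 1))
        task = EMPTY;                      /* a thief got it first */
    atomic_store(&bottom, t + 1);          /* deque is empty either way */
    return task;
}

int steal(void) {
    long t = atomic_load(&top);
    long b = atomic_load(&bottom);
    if (t >= b)
        return EMPTY;
    int task = ap[t % CAPACITY];
    if (!atomic_compare_exchange_strong(&top, &t, t + 1))
        return ABORT;                      /* lost the race */
    return task;
}
```

Pushing 1, 2, 3 and then calling `take`, `steal`, `take` returns 3, 1, 2: the owner works LIFO at the bottom while thieves take the oldest task from the top.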

In Practice: Hard
- This is a real problem.
- Finding the best placement for fences is hard.
- Classical trade-off: correctness vs. efficiency.
- Existing tools are insufficient, e.g. CheckFence [Burckhardt et al. PLDI ’07].

Our Approach: Overview
- Input: a (finite-state) program P, a (safety) specification S, and a memory model M.
- Output: a program P’ with fences, such that P’ satisfies the specification S under M.

Our Approach: Recipe
- Compute the reachable states of the program.
  - Bad news: reachability is undecidable even for finite-state programs running under a sufficiently weak memory model [Atig et al. POPL ’10], so we sometimes use an additional bound.
- Compute constraints that guarantee that all “bad states” are avoided.
  - The constraints restrict the non-determinism allowed by the memory model.
- Implement the constraints with fences.

Our Approach: Ingredients
- Operational semantics for weak memory models
- An algorithm for finding order constraints
- An algorithm for implementing constraints as fences in the program

Operational Semantics for WMM
[figure: classification of memory models, due to Adve et al., IEEE Computer ’95]

- Model store buffers
- Model instruction reordering (execution buffers)
- Variety of re-ordering rules
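The reordering rules of the classic models can be summarized by which program-order pairs each model relaxes. This C sketch encodes the commonly cited classification for two accesses to different addresses with no dependency between them: TSO lets a later load pass an earlier store, PSO also lets stores pass stores, and RMO relaxes all four pairs. `Model`, `Access`, and `may_reorder` are hypothetical names, and real models have further rules (store forwarding, dependencies, atomicity of stores) that this table does not capture.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SC, TSO, PSO, RMO } Model;
typedef enum { LOAD, STORE } Access;

/* Under model m, may an access `second` that follows `first` in program order
   take effect before it? Assumes different addresses and no dependency. */
bool may_reorder(Model m, Access first, Access second) {
    switch (m) {
    case SC:  return false;                           /* nothing relaxed */
    case TSO: return first == STORE && second == LOAD; /* store -> load only */
    case PSO: return first == STORE;                   /* store -> load, store -> store */
    case RMO: return true;                             /* all four pairs */
    }
    return false;
}
```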

States and Transitions

    Processor A:        Processor B:
      A1: X = 1           B1: R2 = Y
      A2: Y = 1           B2: R1 = X

Initially X = Y = R1 = R2 = 0.

[figure: a state consists of the execution buffers (A: A1, A2; B: B1, B2) and the memory X = 0, Y = 0, R1 = 0, R2 = 0. The transition A2 executes Y = 1 ahead of A1, leading to a state whose buffers are A: A1; B: B1, B2 and whose memory is X = 0, Y = 1, R1 = 0, R2 = 0.]

Compute Reachable States
[figure: the graph of reachable states (x, y, r1, r2) of the program above, from the initial state (0,0,0,0), with transitions labeled A1, A2, B1, B2 and the error state (1,1,0,1) highlighted]
Specification at the final state: ¬(R1 = 0 ∧ R2 = 1).
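A minimal C sketch of this exploration for the example program, assuming (as RMO permits for these independent accesses) that each processor may execute its buffered instructions in any order unless an ordering constraint forbids it. A state packs the set of executed instructions and the values of X, Y, R1, R2 into 8 bits. `error_reachable`, `CONS`, and the other names are hypothetical; `enforced` is a bitmask of constraints [j < i] treated as already fenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A1: X = 1, A2: Y = 1 on processor A; B1: R2 = Y, B2: R1 = X on processor B. */
enum { A1, A2, B1, B2, NINSTR };
#define CONS(j, i) ((uint16_t)(1u << ((j) * NINSTR + (i))))  /* constraint [j < i] */

#define DONE(s) ((s) & 0xF)                  /* executed instructions */
#define VAL(s, k) (((s) >> (4 + (k))) & 1)   /* k: 0 = X, 1 = Y, 2 = R1, 3 = R2 */

static int step(int s, int i) {
    int x = VAL(s, 0), y = VAL(s, 1), r1 = VAL(s, 2), r2 = VAL(s, 3);
    if (i == A1) x = 1;
    if (i == A2) y = 1;
    if (i == B1) r2 = y;
    if (i == B2) r1 = x;
    return DONE(s) | (1 << i) | (x << 4) | (y << 5) | (r1 << 6) | (r2 << 7);
}

/* Pending instruction i may run unless an older pending instruction j of the
   same processor must precede it according to `enforced`. */
static bool enabled(int s, int i, uint16_t enforced) {
    if (DONE(s) & (1 << i)) return false;
    for (int j = (i / 2) * 2; j < i; j++)
        if (!(DONE(s) & (1 << j)) && (enforced & CONS(j, i))) return false;
    return true;
}

/* Depth-first search; true if a final state with R1 = 0 and R2 = 1 is reachable. */
static bool reaches_error(int s, uint16_t enforced, bool seen[256]) {
    if (seen[s]) return false;
    seen[s] = true;
    if (DONE(s) == 0xF) return VAL(s, 2) == 0 && VAL(s, 3) == 1;
    for (int i = 0; i < NINSTR; i++)
        if (enabled(s, i, enforced) && reaches_error(step(s, i), enforced, seen))
            return true;
    return false;
}

bool error_reachable(uint16_t enforced) {
    bool seen[256] = { false };
    return reaches_error(0, enforced, seen);
}
```

With no constraints the error state is reachable, and it stays reachable if only one processor is ordered; enforcing both [A1 < A2] and [B1 < B2] makes it unreachable.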

Avoiding States
- To avoid a state, avoid all of its incoming transitions.
- To avoid an incoming transition, either
  - avoid the transition itself, or
  - avoid its source state.

Avoidable Transitions
- The execution buffer is ordered.
- A transition that does not execute the first instruction in the execution buffer can be avoided, by forcing a different transition to execute first.

Example: Processor A runs A1: X = 1, A2: Y = 1, A3: Z = 1, A4: W = 1, and all four are pending in its execution buffer, with A1 the oldest.

Avoidable Transitions (cont.)
- To avoid A3 in this state:
  - force A1 to execute before A3, or
  - force A2 to execute before A3.
- In the language of ordering constraints: [A1 < A3] ∨ [A2 < A3].
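The rule on this slide can be sketched in C as follows, assuming the execution buffer is given as an array of pending instruction numbers in program order (oldest first). `Constraint` and `avoid_constraints` are hypothetical names. The returned constraints are alternatives: enforcing any one of them rules out the transition.

```c
#include <assert.h>
#include <stddef.h>

/* Ordering constraint [first < second] between two instructions of one processor. */
typedef struct { int first, second; } Constraint;

/* To avoid the transition that executes buffer[k], some older pending instruction
   must execute first. Writes those k alternatives to `out` (room for k entries)
   and returns k; 0 means the transition runs the oldest instruction and cannot
   be avoided by an ordering constraint. */
size_t avoid_constraints(const int *buffer, size_t len, size_t k, Constraint *out) {
    assert(k < len);
    for (size_t j = 0; j < k; j++)
        out[j] = (Constraint){ buffer[j], buffer[k] };
    return k;
}
```

For the buffer A1 A2 A3 A4, avoiding the transition that executes A3 (position 2) yields [A1 < A3] and [A2 < A3], matching the slide.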

Computing Avoid Formulae
- An ordering constraint [l1 < l2] means that l2 may not be reordered with l1.
- Associate a propositional variable with each constraint.
- “Avoid formulas” are (positive) propositional formulas over ordering constraints.
- A fixed-point computation computes an avoid formula for every state.
- The final constraint formula is the conjunction of the avoid formulas of all “bad states”.
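In equations, the computation described above can be written as follows (our notation, not necessarily the authors'): $s_0$ is the initial state, $l_t$ is the instruction executed by transition $t$, $\mathit{older}(t)$ is the set of instructions ahead of $l_t$ in the same execution buffer, and $\mathit{Bad}$ is the set of bad states. The empty disjunction is false, so a transition that executes the oldest buffered instruction cannot be prevented on its own.

```latex
\begin{aligned}
\mathit{avoid}(s_0) &= \mathit{false} \\
\mathit{avoid}(s) &= \bigwedge_{t \,=\, (s' \to s)} \bigl(\mathit{avoid}(s') \lor \mathit{prevent}(t)\bigr) \\
\mathit{prevent}(t) &= \bigvee_{l \,\in\, \mathit{older}(t)} [\, l < l_t \,] \\
\Phi &= \bigwedge_{s \,\in\, \mathit{Bad}} \mathit{avoid}(s)
\end{aligned}
```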

Back to Our Example
[figure: the reachable-state graph annotated with avoid formulas; the initial state (0,0,0,0) has the formula false, and intermediate states carry formulas such as [A1 < A2], [B1 < B2], and [A1 < A2] ∨ [B1 < B2]]
The error state (1,1,0,1) gets the avoid formula [A1 < A2] ∧ [B1 < B2].
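The fixed point for this example can be reproduced with a short C sketch. Assumptions: the same reordering model as in the example (each processor may run its buffered instructions in any order), positive formulas kept in DNF as sets of constraint bitmasks with absorption, and states processed by the number of executed instructions, which works here because every transition executes exactly one more instruction (straight-line code, no loops). All names and the capacity `MAXC` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A1: X = 1, A2: Y = 1 on processor A; B1: R2 = Y, B2: R1 = X on processor B. */
enum { A1, A2, B1, B2, NINSTR };
#define CONS(j, i) ((uint16_t)(1u << ((j) * NINSTR + (i))))  /* constraint [j < i] */

/* DNF: disjunction of clauses; a clause is a bitmask meaning a conjunction of
   constraints. n == 0 is false; a single empty clause is true. */
#define MAXC 64
typedef struct { int n; uint16_t c[MAXC]; } Dnf;

static void add_clause(Dnf *f, uint16_t m) {
    for (int i = 0; i < f->n; i++)
        if ((f->c[i] & m) == f->c[i]) return;        /* a weaker clause covers m */
    int k = 0;
    for (int i = 0; i < f->n; i++)
        if ((f->c[i] & m) != m) f->c[k++] = f->c[i];  /* drop clauses m absorbs */
    assert(k < MAXC);
    f->c[k++] = m;
    f->n = k;
}

static Dnf dnf_or(Dnf a, Dnf b) {
    for (int j = 0; j < b.n; j++) add_clause(&a, b.c[j]);
    return a;
}

static Dnf dnf_and(Dnf a, Dnf b) {
    Dnf r = { 0 };
    for (int i = 0; i < a.n; i++)
        for (int j = 0; j < b.n; j++) add_clause(&r, a.c[i] | b.c[j]);
    return r;
}

#define DONE(s) ((s) & 0xF)                  /* executed instructions */
#define VAL(s, k) (((s) >> (4 + (k))) & 1)   /* k: 0 = X, 1 = Y, 2 = R1, 3 = R2 */

static int step(int s, int i) {
    int x = VAL(s, 0), y = VAL(s, 1), r1 = VAL(s, 2), r2 = VAL(s, 3);
    if (i == A1) x = 1;
    if (i == A2) y = 1;
    if (i == B1) r2 = y;
    if (i == B2) r1 = x;
    return DONE(s) | (1 << i) | (x << 4) | (y << 5) | (r1 << 6) | (r2 << 7);
}

static int popcount4(int m) { return (m & 1) + ((m >> 1) & 1) + ((m >> 2) & 1) + ((m >> 3) & 1); }

/* Returns the conjunction of the avoid formulas of all bad final states. */
Dnf infer_constraints(void) {
    Dnf avoid[256];
    bool reached[256] = { false };
    Dnf result = { 1, { 0 } };                       /* true */
    reached[0] = true;
    avoid[0].n = 0;                                  /* initial state: false */
    for (int layer = 0; layer <= NINSTR; layer++) {  /* predecessors are one layer lower */
        for (int s = 0; s < 256; s++) {
            if (!reached[s] || popcount4(DONE(s)) != layer) continue;
            if (DONE(s) == 0xF) {
                if (VAL(s, 2) == 0 && VAL(s, 3) == 1)  /* bad: R1 = 0 and R2 = 1 */
                    result = dnf_and(result, avoid[s]);
                continue;
            }
            for (int i = 0; i < NINSTR; i++) {
                if (DONE(s) & (1 << i)) continue;
                Dnf prevent = { 0 };                 /* false */
                for (int j = (i / 2) * 2; j < i; j++)  /* older pending, same processor */
                    if (!(DONE(s) & (1 << j))) add_clause(&prevent, CONS(j, i));
                Dnf via = dnf_or(avoid[s], prevent);
                int t = step(s, i);
                avoid[t] = reached[t] ? dnf_and(avoid[t], via) : via;
                reached[t] = true;
            }
        }
    }
    return result;
}
```

`infer_constraints()` returns a single clause containing [A1 < A2] and [B1 < B2], which agrees with the formula at the error state on the slide.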

Fence Placement

    Processor A:
      A1: X = 1
      fence("store-store")
      A2: Y = 1

    Processor B:
      B1: R2 = Y
      fence("load-load")
      B2: R1 = X

Constraint formula: [A1 < A2] ∧ [B1 < B2]
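Choosing the fence type for a satisfied constraint [l1 < l2] depends only on whether l1 and l2 are loads or stores. A tiny C sketch using the slide's naming scheme; `Access` and `fence_kind` are hypothetical names.

```c
#include <assert.h>
#include <string.h>   /* strcmp, for comparing the returned names */

typedef enum { LOAD, STORE } Access;

/* Fence kind that enforces [l1 < l2], named "<kind of l1>-<kind of l2>". */
const char *fence_kind(Access first, Access second) {
    static const char *const names[2][2] = {
        { "load-load",  "load-store"  },   /* first == LOAD  */
        { "store-load", "store-store" },   /* first == STORE */
    };
    return names[first][second];
}
```

In the example, A1 and A2 are both stores (store-store) and B1 and B2 are both loads (load-load), which gives the placement above.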

Fence Placement (cont.)
- Trivial in the previous example:
  - take a satisfying assignment to the avoid formula;
  - realize every satisfied constraint as a fence;
  - we only had to choose the fence type.
- More complicated in practice: which satisfying assignment should we choose?
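When the avoid formula is positive and given in DNF, every satisfying assignment must make at least one clause true, so, assuming each constraint is realized by its own fence with a non-negative cost, enforcing exactly the constraints of the cheapest clause is a cheapest satisfying assignment. The C sketch below picks such a clause; the cost model (`cost[b]` per constraint bit) and the name `cheapest_clause` are hypothetical and not necessarily what FENDER optimizes.

```c
#include <assert.h>
#include <stdint.h>

/* clauses[i] is a bitmask over up to 16 ordering constraints; cost[b] is the
   (hypothetical) cost of the fence realizing constraint bit b. Returns the index
   of a clause with minimal total cost, or -1 if the formula is false (n == 0). */
int cheapest_clause(const uint16_t *clauses, int n, const unsigned cost[16]) {
    int best = -1;
    unsigned best_cost = 0;
    for (int i = 0; i < n; i++) {
        unsigned c = 0;
        for (int b = 0; b < 16; b++)
            if (clauses[i] & (1u << b)) c += cost[b];
        if (best < 0 || c < best_cost) { best = i; best_cost = c; }
    }
    return best;
}
```

With two constraints costing 1 each versus one constraint costing 5, the two-fence clause wins; if the single fence is cheap enough, it wins instead.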

Data Structures
- Treiber’s Stack
- Michael & Scott’s Non-Blocking Queue
- Idempotent Work-Stealing Queue
- Chase & Lev’s Work-Stealing Queue
  - Found a missing fence in an implementation used for an earlier paper.
- …

Sample Results: Michael-Scott Queue
- Used the results from [Burckhardt et al. PLDI ’07] as a reference.
- The reference contains 7 fences.
- RMO*: 3 found
  - 2 unneeded due to environment issues (memory management)
  - 2 unneeded due to lack of speculation
- PSO: 1 found; TSO: no fences required.

Results
[figure: results table]

Summary
- Fence inference
  - finite-state programs
  - safe and optimal
- Work in progress
  - scalability
  - abstraction: over-approximation instead of bounding