Synthesis of Memory Fences via Refinement Propagation

Slides:



Advertisements
Similar presentations
Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study Sebastian Burckhardt Rajeev Alur Milo M. K. Martin Department of.
Advertisements

Symmetric Multiprocessors: Synchronization and Sequential Consistency.
Greta YorshEran YahavMartin Vechev IBM Research. { ……………… …… …………………. ……………………. ………………………… } P1() Challenge: Correct and Efficient Synchronization { ……………………………
Greta YorshEran YahavMartin Vechev IBM Research. { ……………… …… …………………. ……………………. ………………………… } T1() Challenge: Correct and Efficient Synchronization { ……………………………
A Program Transformation For Faster Goal-Directed Search Akash Lal, Shaz Qadeer Microsoft Research.
Concurrency: Mutual Exclusion and Synchronization Chapter 5.
Architecture-aware Analysis of Concurrent Software Rajeev Alur University of Pennsylvania Amir Pnueli Memorial Symposium New York University, May 2010.
CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,
“FENDER” AUTOMATIC MEMORY FENCE INFERENCE Presented by Michael Kuperstein, Technion Joint work with Martin Vechev and Eran Yahav, IBM Research 1.
Ch. 7 Process Synchronization (1/2) I Background F Producer - Consumer process :  Compiler, Assembler, Loader, · · · · · · F Bounded buffer.
1 Eran Yahav Technion Joint work with Martin Vechev (ETH), Greta Yorsh (ARM), Michael Kuperstein (Technion), Veselin Raychev (ETH)
Chapter 6: Process Synchronization
Background Concurrent access to shared data can lead to inconsistencies Maintaining data consistency among cooperating processes is critical What is wrong.
Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 5: Process Synchronization.
5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Multiprocessor Synchronization Algorithms ( ) Lecturer: Danny Hendler The Mutual Exclusion problem.
Process Synchronization. Module 6: Process Synchronization Background The Critical-Section Problem Peterson’s Solution Synchronization Hardware Semaphores.
1 Operating Systems, 122 Practical Session 5, Synchronization 1.
CH7 discussion-review Mahmoud Alhabbash. Q1 What is a Race Condition? How could we prevent that? – Race condition is the situation where several processes.
PARTIAL-COHERENCE ABSTRACTIONS FOR RELAXED MEMORY MODELS Presented by Michael Kuperstein, Technion Joint work with Martin Vechev, IBM Research and Eran.
Concurrency.
Martin Vechev IBM Research Michael Kuperstein Technion Eran Yahav Technion (FMCAD’10, PLDI’11) 1.
Martin Vechev IBM Research Michael Kuperstein Technion Eran Yahav Technion (FMCAD’10, PLDI’11) 1.
Operating Systems CSE 411 CPU Management Oct Lecture 13 Instructor: Bhuvan Urgaonkar.
Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.
Thread-modular Abstraction Refinement Thomas A. Henzinger, et al. CAV 2003 Seonggun Kim KAIST CS750b.
The Critical Section Problem
Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995.
28/10/1999POS-A1 The Synchronization Problem Synchronization problems occur because –multiple processes or threads want to share data; –the executions.
Silberschatz, Galvin and Gagne  Operating System Concepts Chapter 7: Process Synchronization Background The Critical-Section Problem Synchronization.
Anshul Kumar, CSE IITD CSL718 : Multiprocessors Synchronization, Memory Consistency 17th April, 2006.
Anshul Kumar, CSE IITD ECE729 : Advance Computer Architecture Lecture 26: Synchronization, Memory Consistency 25 th March, 2010.
Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.
Pattern-based Synthesis of Synchronization for the C++ Memory Model Yuri Meshman, Noam Rinetzky, Eran Yahav 1.
CIS 842: Specification and Verification of Reactive Systems Lecture INTRO-Examples: Simple BIR-Lite Examples Copyright 2004, Matt Dwyer, John Hatcliff,
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
/ PSWLAB Thread Modular Model Checking by Cormac Flanagan and Shaz Qadeer (published in Spin’03) Hong,Shin Thread Modular Model.
Specifying Multithreaded Java semantics for Program Verification Abhik Roychoudhury National University of Singapore (Joint work with Tulika Mitra)
Abstractions for Relaxed Memory Models Andrei Dan, Yuri Meshman, Martin Vechev, Eran Yahav 1.
Homework-6 Questions : 2,10,15,22.
Synchronization Questions answered in this lecture: Why is synchronization necessary? What are race conditions, critical sections, and atomic operations?
Chapter 6 Synchronization Dr. Yingwu Zhu. The Problem with Concurrent Execution Concurrent processes (& threads) often access shared data and resources.
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Memory Consistency Models
Memory Consistency Models
Chapter 5: Process Synchronization
Specifying Multithreaded Java semantics for Program Verification
Abstraction-Guided Synthesis
Lecture 11: Mutual Exclusion
Modeling Mutual Exclusion Algorithms
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Objective of This Course
Symmetric Multiprocessors: Synchronization and Sequential Consistency
Over-Approximating Boolean Programs with Unbounded Thread Creation
Topic 6 (Textbook - Chapter 5) Process Synchronization
The Critical-Section Problem
Module 7a: Classic Synchronization
Lecture 2 Part 2 Process Synchronization
Critical section problem
Grades.
Chapter 6: Process Synchronization
Memory Consistency Models
Lecture 11: Mutual Exclusion
CSE 153 Design of Operating Systems Winter 19
Why we have Counterintuitive Memory Models
Chapter 6: Synchronization Tools
Problems with Locks Andrew Whitaker CSE451.
Abstraction-Guided Synthesis of synchronization
Pointer analysis John Rollinson & Kaiyuan Li
Presentation transcript:

Synthesis of Memory Fences via Refinement Propagation We want to verify infinite state concurrent algorithms running on relaxed memory models. And for algorithms that loose correctness provide correction using fences. Yuri Meshman2, Andrei Dan Marian1, Martin Vechev1, Eran Yahav2 1ETH Zurich 2Technion

Dekker’s Algorithm sequential consistency Yes No relaxed model x86 TSO initial: Flag0= false, Flag1 = false, Turn = 0 Thread 0: Flag0 := true while Flag1 = true { if Turn ≠ 0 { Flag0 := false while Turn ≠ 0 { } } // critical section Turn := 1 Thread 1: Flag1 := true while Flag0 = true { if Turn ≠ 1 { Flag1 := false while Turn ≠ 1 { } } // critical section Turn := 0 This implementation of Dekker mutual exclusion algorithm is correct on SC. However, on RMM such as x86 TSO it is not correct. Why? When running on RMM, statements may not respect program ordering. TSO – Total Store Order spec: mutual exclusion over critical section sequential consistency Yes relaxed model x86 TSO No

Correct Dekker Algorithm initial: Flag0 = false, Flag1 = false, Turn = 0 Thread 0: Flag0 := true fence while Flag1 = true { if Turn ≠ 0 { Flag0 := false while Turn ≠ 0 { } } // critical section Turn := 1 Thread 1: Flag1 := true fence while Flag0 = true { if Turn ≠ 1 { Flag1 := false while Turn ≠ 1 { } } // critical section Turn := 0 This algorithm is correct, when adding the fence instructions which prevent reordering. spec: mutual exclusion over critical section relaxed model x86 TSO Yes Potential fence locations

Goal Verify correctness of algorithms Synthesize minimal fence placements Use existing tools - concurinterproc

Abstract Interpretation using Numerical Domains for Sequential Consistency initial: Flag0 = false, Flag1 = false, Turn = 0 Thread 0: 0:/*init*/ 1:flag0 := true 2: while Flag1 = true { 3: if Turn ≠ 0 { 4: Flag0 := false 5: while Turn ≠ 0 { } 6: Flag0 := true 7: } 8:} 9:// critical section A: Turn := 1 B: Flag0 := false Thread 1: 0:/*init*/ 1: Flag1 := true 2: while Flag0 = true { 3: if Turn ≠ 1 { 4: Flag1 := false 5: while Turn ≠ 1 { } 6: Flag1 := true 7: } 8: } 9: // critical section A: Turn := 0 B: Flag1 := false Note: Turn is top at (2,9) In this case, running AI with octagon gives us \bottom at (9,9) meaning that the mutex cannot be violated. However, as I told you before, this is not correct under TSO and under TSO this result is therefore not sounds. And this is because we did not take the effects of the RMM into account…. The analysis assumes SC semantics. (0,0) { Turn = 0; Flag1 = 0; Flag0 = 0 } (9,9) { bottom } (2,2) { Flag1 – 1 = 0; Flag0 – 1 = 0; -Turn + Flag1 >= 0; Turn >= 0 } (2,9) { Flag1 - 1 = 0; Flag0 - 1 = 0; } //line number indicate state at the end of the line (i.e. after executing)

Buffered Relaxed Memory Model Flush (by memory subsystem) store fence (Turn, 1) Main Memory Thread 0 ... In contrast to SC, where writes go directly to main memory. In relaxed models such as TSO And PSO, writes go into a store buffer before being written back to main memory. (Turn, 0) Thread 1 ... load

Our Approach: Overview Program P PAFI Specification S Program P’ with Fences PAFI Propagation of Abstraction for Fence Inference Memory Model M P’ satisfies the specification S under M PAFI – Propagation of Abstraction for Fence Inference

Fence Inference with Abstraction Refinement Program P Refinement Propagation Fence Inference Specification S PM fixedpoint P’ translate SC analyzer Notes: Should say here --- could have many solutions, implement is picking one based on quantitative criterion Constraint captures changes to the program execution --- what is permitted during program execution Abstraction Refinement Memory Model M Challenge: when should we add fences, when should we refine abstraction?

Translation: Construction of PTSO Translation items for process t0: buffer size limit: k buffer location i : varit0, valit0 buffer dynamic size: cnt_t0 (var1t0, val1t0) (var2t0, val2t0) (var3t0, val3t0) cnt_t0 load a = Y if cnt_t0 > k - 1 and varkt0 == ‘Y’ a = valkt0; : elseif cnt_t0 > 0 and var1t0 == ‘Y’ a = val1t0; else a = Y; In this case M=TSO

Translation: Construction of PTSO Translation items for process t0: buffer size limit: k buffer location i : varit0, valit0 buffer dynamic size: cnt_t0 (var1t0, val1t0) (var2t0, val2t0) (var3t0, val3t0) cnt_t0 store Y = a if cnt_t0 >= k halt; //overflow cnt_t0 = cnt_t0 + 1; if cnt_t0 < 2 var1t0 = ‘Y’; val1t0 = a; : elseif cnt_t0 < k + 1 varkt0 = ‘Y’; valkt0 = a;

Translation Challenge - Flush while nondet yield; if cnt_t0 > 0 if var1t0 == ‘X’ X = val1t0; elseif var1t0 == ‘Y’ Y = val1t0; if cnt_t0 > 1 var1t0 = var2t0; val1t0 = val2t0; if cnt_t0 > 2 var2t0 = var3t0; val2t0 = val3t0; cnt_t0 = cnt_t0 - 1; /* due to the non deterministic loop at this point we can’t know if flush occurred or not.*/ Main Memory var3t0 var2t0 var1t0 Y X 3 1 val1t0 cnt_t0 X

Why Flush Encoding is Challenging? The non deterministic flush captures two buffer states: A value is flushed -- buffer content moves one place (shifting) The value of X is not flushed We need a disjunction to represent both. For convex numerical domains supporting disjunction is expensive. flushed: non flushed: 3 3 3 1 cnt_t0=1 cnt_t0=2 join 3 [1,3] cnt_t0=[1,2]

Abstraction Refinement We will use Refinement Placement -- utilize logico-numerical domains: Auxilary boolean flags Similar to trace partitioning. Supported by our SC verifier – ConcurInterproc flushed: non flushed: 3 3 3 1 cnt_t0=1 cnt_t0=2  3 1 𝑏 , cnt_t0=2 3 3 ¬𝑏 cnt_t0=1

Partial refinement placement Each boolean variable splits the state – verification complexity growth exponential (in flush complexity). Partial refinement(booleans) placement may safice – We want a minimal placement that will enable verification. 𝑟= 𝑏 𝑖 1 , 𝑏 𝑖 2 ,…

Two-dimensional exponential search We would like a minimal refinement We would like a minimal fence placement Search a two dimensional domain. Search space exponential in number of fences and in number of refinement placements We will learn from verified and failed attempts

Example Flag0 := true fence while Flag1 = true { if Turn ≠ 0 { Flag0 := false while Turn ≠ 0 { } } // critical section Turn := 1 5 potential fence locations 8 potential refinement locations

Example – Placement implication Removed fence Fence Refinement

Example  ? 

Abstraction Propagation f3,r3 f3,r3 f1,r1 f2,r2 f1,r1 f2,r2 propagation of: program correctness + abstraction refinements Legend: means the program is about to be explored What we have is a schema – denotes a bunch of algorithms. Now we want to explore the schema by applying transformations. Similarly means the program was not yet explored means the program has been explored means the program need not be explored further means is a successful abstraction refinement used to verify program is a suggestion to prove program with a combined abstraction refinement

Prototype Implementation Implementation in Python ConcurInterproc Z3 checking final state satisfies specification Experiments: Intel(R) Xeon(R) 2.13 GHz server with 250 GB RAM

Benchmarks 15 concurrent algorithms 8 infinite state Mutual exclusion algorithms Unbounded array-based concurrent work-stealing queue: Chase-Lev WSQ 8 infinite state Safety specifications mutual exclusion and reachability invariants Queue coherence invariants Exit: compared to simple breadth first and depth first search

Example of Results : PC1 9 possible fences 27 possible locations for refinement placement BFS explore various boolean placements for full fenced placement for 3:30 hours DFS verifies 5 fence placements in under 5 mins state explosion leads to exploring failing placements for the rest of the time Propagation finds that a single fence is needed

Example of Results : PC1

Summary Verification and Fence Synthesis via refinement propagation Using translation to encode Relaxed Memory semantics into code. Abstraction refinement using booleans Two dimensional search Learning from failed and successful verifications to guide the search. Thank you: Questions? Partly supported by SNF grant on practical synthesis for concurrent systems and ISF grant # 965/10

TSO algorithm 1h 2h 4h (f, r) f r abp prop (2,17) bfs dfs concloop 2 4   f r abp prop (2,17) bfs dfs concloop 2 4 (4,14) 8 kessel 5 3 (6,12) 1 6 loop2-TLM 14 (6,21) 7 10 pc1 (9,27) 9 peterson (6,23) pgsql (8,23)

PSO algorithm 1h 2h 4h (f, r) f r abp prop (2,17) bfs dfs concloop 2 4 (2,17) bfs dfs   concloop 2 4 (4,14) 8 dekker our 10 9 (10,34) 7 6 kessel 3 (6,12) 5 1 loop2-TLM (6,21) pc1 (9,27) pgsql (8,23)

Numerical analysis is not monotonic Our assumption: smaller refinement placement is easier to verify. Nature of abstraction: the assumption could be false.

Related work Memorax (TACAS12-13, SAS12) GOOD - online predicate abstraction Abdulla LESS GOOD - only correctness of programs locked writes - for TSO only

Related work Muskteer - Don’t sit on the fence (CAV14) - Scalability <-> precision - Large fence placements for small programs

Related work Software verification for weak memory via program transformation (ESOP13) Check mostly for litmus tests, also dekker, bakery, szymanski, peterson Do not infer fences but assume minimal fences exist.

Related work SAS13