Goldilocks: Efficiently Computing the Happens-Before Relation Using Locksets Tayfun Elmas 1, Shaz Qadeer 2, Serdar Tasiran 1 1 Koç University, İstanbul,


Goldilocks: Efficiently Computing the Happens-Before Relation Using Locksets
Tayfun Elmas (1), Shaz Qadeer (2), Serdar Tasiran (1)
(1) Koç University, İstanbul, Turkey
(2) Microsoft Research, Redmond, WA
FATES/RV’06, August 15-16, Seattle, WA

2 Our goal
Continuous runtime monitoring of concurrent Java programs
–Target: Race conditions
–Criteria
  Efficiency: Tolerable impact on performance
  Precision: Prevent false alarms
The Java Memory Model (JMM) [Manson et al., POPL’05]
–“Two accesses form a data race in an execution of a program if they conflict, they are from different threads and they are not ordered by happens-before (H-B).”
Exact H-B computation ⇒ precise race detection

3 Existing dynamic approaches
Vector-clock algorithms [Mattern, 1989]
–Vector clock: For each thread and variable, a vector of logical clocks
  Vector has size T = #threads
–Vector updated at each synchronization operation
Precise but inefficient in some cases
–O(T) computation at each synchronization operation
–Other algorithms use cheaper checks for well-protected variables
  Thread-local variables, variables protected by single locks
Lockset algorithms [Savage et al., 1997]
–Lockset: A set of locks protecting access to variable d
–Lockset update rules specific to a synchronization discipline
Efficient, intuitive, but imprecise
–False alarms: Synchronization discipline violated but no race occurred
–Additional mechanisms to reduce false alarms
  State machines for object initialization, escape, thread-locality
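To make the O(T) cost of the vector-clock approach concrete, here is a minimal Python sketch. It is not taken from the slides: the helper names `join` and `leq` and the dictionary representation are assumptions chosen for illustration. It shows a pointwise-maximum merge at a lock acquire and a componentwise comparison used to decide whether one access is ordered before another.

```python
from typing import Dict

# Hypothetical representation: thread id -> logical clock value.
VectorClock = Dict[str, int]


def join(a: VectorClock, b: VectorClock) -> VectorClock:
    """Pointwise maximum of two clocks; touches every thread entry, hence O(T)."""
    return {t: max(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b)}


def leq(a: VectorClock, b: VectorClock) -> bool:
    """True if every component of a is <= the matching component of b."""
    return all(c <= b.get(t, 0) for t, c in a.items())


# T1 writes x at its local time 1, then releases lock L.
t1 = {"T1": 1}
write_x = dict(t1)   # clock stamped on the write to x
lock_l = dict(t1)    # release(L) copies T1's clock into the lock
t1["T1"] += 1        # T1 moves on to its next local time

# T2 acquires L, so its clock absorbs the clock stored in the lock.
t2 = join({"T2": 1}, lock_l)
ordered = leq(write_x, t2)    # the write is ordered before T2's next access

# T3 never synchronized with T1, so its access is unordered with the write.
t3 = {"T3": 1}
racy = not leq(write_x, t3)
```

Both `join` and `leq` walk over one entry per thread, which is the per-synchronization cost the slide refers to.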

4 Our work
The Goldilocks algorithm
–Novel lockset-based method that precisely computes H-B
  As efficient as other lockset algorithms
  As precise as vector-clocks
Uniformly captures all synchronization disciplines
–Our locksets contain locks, volatile variables, thread ids
Theorem: When thread t accesses variable d, there is no race iff the lockset of d at that point contains t
Sound: Detects all apparent races that occur in execution
Precise: Race reported ⇒ Two accesses not ordered by H-B
–No false alarms
–No alarms about potential races in similar executions

5 Outline
The Goldilocks algorithm
Implementation
Evaluation
Conclusions

6 Example
class IntBox { int x; }
Global variables: a, b: IntBox; o1.x, o2.x: int
Locks: L1, L2

T1:
  a := IntBox()
  b := IntBox()
  acquire(L1)
  a.x ++
  release(L1)
T2:
  acquire(L1)
  acquire(L2)
  tmp := a
  a := b
  b := tmp
  release(L1)
  release(L2)
T3:
  acquire(L2)
  b.x ++
  release(L2)

Heap before T2 swaps a and b: a refers to o1, b refers to o2.
Heap after the swap: a refers to o2, b refers to o1.

7 Eraser
(Same three-thread program as in the example on slide 6. LS(o1.x) is the candidate lockset of o1.x, LH(T) the set of locks held by thread T.)
Initially: LS(o1.x) = {all locks}
T1, a.x ++: LS(o1.x) = {all locks} ∩ {L1} = {L1}
  check whether LS(o1.x) ∩ LH(T1) = ∅
T2, swap of a and b: No access to o1.x, LS(o1.x) not modified
T3, b.x ++: LS(o1.x) = {L1} ∩ {L2} = ∅
  check whether LS(o1.x) ∩ LH(T3) = ∅
  Race reported!
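The false alarm on this slide can be replayed with a small Python sketch of the basic lockset refinement. This is a simplified illustration, not Eraser's actual implementation: the class `BasicLockset` and its method names are hypothetical, and the state machines that Eraser uses for initialization and sharing are left out.

```python
from typing import Dict, Optional, Set


class BasicLockset:
    """Simplified Eraser-style candidate locksets (illustrative names only)."""

    def __init__(self) -> None:
        self.held: Dict[str, Set[str]] = {}
        # None stands for "all locks", the initial candidate set.
        self.candidates: Dict[str, Optional[Set[str]]] = {}

    def acquire(self, thread: str, lock: str) -> None:
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread: str, lock: str) -> None:
        self.held.setdefault(thread, set()).discard(lock)

    def access(self, thread: str, var: str) -> bool:
        """Intersect var's candidates with the locks held; True means 'race reported'."""
        held = self.held.get(thread, set())
        current = self.candidates.get(var)
        refined = set(held) if current is None else current & held
        self.candidates[var] = refined
        return not refined


e = BasicLockset()
e.acquire("T1", "L1")
report_t1 = e.access("T1", "o1.x")   # a.x ++ while holding L1
e.release("T1", "L1")
e.acquire("T2", "L1")
e.acquire("T2", "L2")                # the swap of a and b does not touch o1.x
e.release("T2", "L1")
e.release("T2", "L2")
e.acquire("T3", "L2")
report_t3 = e.access("T3", "o1.x")   # b.x ++ now reaches o1 while holding only L2
```

The second access empties the candidate set because no single lock was held at both accesses, even though the two accesses are ordered through T2's critical section.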

8 The happens-before relation
Happens-before in JMM: →hb
Transitive closure of
–Program orders of threads: →p
–Synchronizes-with: →sw
  release(l) →sw acquire(l)
  vol-write(v) →sw vol-read(v)
  fork(t) →hb (action of t)
  (action of t) →hb join(t)
Figure: the example program from slide 6 with →p edges inside each thread and →sw edges between threads, giving a.x ++ →hb b.x ++.
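As a sketch of the definition above, the following Python code builds program-order and release-to-acquire edges for one possible interleaving of the example and then takes the transitive closure. The trace encoding and the function name `happens_before` are assumptions made for illustration; volatile, fork and join edges are omitted to keep it short.

```python
from itertools import product

# One possible interleaving of the example: (thread, operation, target).
trace = [
    ("T1", "acquire", "L1"), ("T1", "access", "o1.x"), ("T1", "release", "L1"),
    ("T2", "acquire", "L1"), ("T2", "acquire", "L2"),
    ("T2", "release", "L1"), ("T2", "release", "L2"),
    ("T3", "acquire", "L2"), ("T3", "access", "o1.x"), ("T3", "release", "L2"),
]


def happens_before(events):
    """Return a matrix hb where hb[i][j] is True if event i happens-before event j."""
    n = len(events)
    hb = [[False] * n for _ in range(n)]
    for i, j in product(range(n), repeat=2):
        if i < j:
            (ti, opi, xi), (tj, opj, xj) = events[i], events[j]
            program_order = ti == tj
            synchronizes_with = opi == "release" and opj == "acquire" and xi == xj
            hb[i][j] = program_order or synchronizes_with
    # Transitive closure (Warshall); k is the outermost loop variable.
    for k, i, j in product(range(n), repeat=3):
        if hb[i][k] and hb[k][j]:
            hb[i][j] = True
    return hb


hb = happens_before(trace)
accesses_ordered = hb[1][8]   # T1's access to o1.x versus T3's access to o1.x

# Two accesses with no synchronization in between stay unordered.
unsynchronized = happens_before([("T1", "access", "x"), ("T2", "access", "x")])
```

The cubic closure is only meant to mirror the definition; it is not how a runtime monitor would compute the relation.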

9 Goldilocks intuition
LS: (Variables) → ℘(Threads ∪ Locks ∪ Volatiles)
Update rules maintain invariants:
1. Thread t ∈ LS(d) ⇒ t is owner of d
   Accesses to d by t are race-free
2. Lock l ∈ LS(d) ⇒ acquire l to become owner of d
3. Volatile v ∈ LS(d) ⇒ read v to become owner of d
When t accesses d: Race-free iff (t ∈ LS(d))
After t accesses d: LS(d) = { t }
–t is the only owner of d
–Other threads: Must synchronize with t in order to become an owner of d

10 Lockset update rules
Ownership transfer between threads
–LS(d) grows through synchronization actions
release(l) by t: For each variable d: if (t ∈ LS(d)) ⇒ (add l to LS(d))
acquire(l) by t: For each variable d: if (l ∈ LS(d)) ⇒ (add t to LS(d))
volatile-write(v) by t: For each variable d: if (t ∈ LS(d)) ⇒ (add v to LS(d))
volatile-read(v) by t: For each variable d: if (v ∈ LS(d)) ⇒ (add t to LS(d))
fork(s) by t: For each variable d: if (t ∈ LS(d)) ⇒ (add s to LS(d))
join(s) by t: For each variable d: if (s ∈ LS(d)) ⇒ (add t to LS(d))

11 Goldilocks
(Same three-thread program as in the example on slide 6; LS(o1.x) is shown after each action.)
Initially: LS(o1.x) = ∅
T1, a.x ++: LS(o1.x) = {T1}  First access
T1, release(L1): LS(o1.x) = {T1, L1}  (T1 ∈ LS) ⇒ (add L1 to LS)
T2, acquire(L1): LS(o1.x) = {T1, L1, T2}  (L1 ∈ LS) ⇒ (add T2 to LS)
T2, acquire(L2): LS(o1.x) = {T1, L1, T2}  L2 ∉ LS, so the rule (L2 ∈ LS) ⇒ (add T2 to LS) leaves LS unchanged
T2, release(L1): LS(o1.x) = {T1, L1, T2}  (T2 ∈ LS) ⇒ (add L1 to LS)
T2, release(L2): LS(o1.x) = {T1, L1, T2, L2}  (T2 ∈ LS) ⇒ (add L2 to LS)
T3, acquire(L2): LS(o1.x) = {T1, L1, T2, L2, T3}  (L2 ∈ LS) ⇒ (add T3 to LS)
T3, b.x ++: LS(o1.x) = {T3}  (T3 ∈ LS) ⇒ (No race)
T3, release(L2): LS(o1.x) = {T3, L2}  (T3 ∈ LS) ⇒ (add L2 to LS)
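The trace on this slide can be reproduced with a direct (eager) Python sketch of the lockset update rules. The class `Goldilocks`, its method names, and the use of plain strings for thread ids, locks and volatiles are assumptions for illustration; the sketch also assumes those names never collide and treats the first access to a variable as race-free.

```python
from typing import Dict, Set


class Goldilocks:
    """Eager version of the update rules: every rule scans all locksets."""

    def __init__(self) -> None:
        # variable -> set of thread ids, locks and volatile variables
        self.ls: Dict[str, Set[str]] = {}

    def _transfer(self, if_present: str, then_add: str) -> None:
        for lockset in self.ls.values():
            if if_present in lockset:
                lockset.add(then_add)

    def release(self, t: str, lock: str) -> None:
        self._transfer(t, lock)

    def acquire(self, t: str, lock: str) -> None:
        self._transfer(lock, t)

    def volatile_write(self, t: str, v: str) -> None:
        self._transfer(t, v)

    def volatile_read(self, t: str, v: str) -> None:
        self._transfer(v, t)

    def fork(self, t: str, s: str) -> None:
        self._transfer(t, s)

    def join(self, t: str, s: str) -> None:
        self._transfer(s, t)

    def access(self, t: str, d: str) -> bool:
        """Return True if the access is race-free, then make t the only owner of d."""
        lockset = self.ls.get(d)
        race_free = lockset is None or t in lockset
        self.ls[d] = {t}
        return race_free


g = Goldilocks()
g.acquire("T1", "L1")
first = g.access("T1", "o1.x")       # a.x ++
g.release("T1", "L1")
g.acquire("T2", "L1")
g.acquire("T2", "L2")
g.release("T2", "L1")
g.release("T2", "L2")
g.acquire("T3", "L2")
before_access = set(g.ls["o1.x"])    # lockset just before T3 touches o1.x
second = g.access("T3", "o1.x")      # b.x ++
g.release("T3", "L2")
```

Scanning every lockset on every synchronization action is what makes this direct form costly; it is shown only to mirror the rules one to one.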

12 Uniform handling of many scenarios
Dynamically changing locksets
Permanent/temporary thread-locality
Container-protected objects
–Lockset of contained variable changes although variable is not touched
Synchronization using wait/notify(All)
–No additional lockset update rules
Synchronization using volatile variables
–Conditional branches on volatile variables
Classes in java.util.concurrent package
–Semaphores, barriers,...

13 Outline
The Goldilocks algorithm
Implementation
Evaluation
Conclusions

14 Implementation
Naive implementation too inefficient
  acquire(l) by thread t: For each variable d: if (l ∈ LS(d)) ⇒ (add t to LS(d))
Implementation features
Short-circuit checks before lockset computation
–Handle thread-locality, unique protecting lock,...
Lazy evaluation of locksets
–Apply update rules only at variable access
–Keep synchronization actions in a global event list
  Order of events consistent with →p and →sw
Implicit, shared representation of locksets
–Use temporary locksets only at access
Figure: global event list with the entries (T2, vol-write, v), (T1, release, l), (T1, vol-read, v), (T2, acquire, l), (T1, acquire, l), (T2, release, l), shown together with the variables x and y.

15 Implementation in Kaffe
In the Kaffe Virtual Machine [
–Clean room implementation of JVM in C
–Full Java platform functionality
Instrumented byte-code interpreter
–Functions executing instructions for synchronization, heap access
Per thread checking
–Each thread checks its own actions
–Communication via global event list
–Applicable to multiprocessors

Handle-Action (Thread t, Action α)
  IF α is a synchronization action
    Add α to the global event list
  ELSE IF α is an access to variable d
    IF all short-circuit checks fail
      Apply-Lockset-Rules(t, d)

Figure: the same global event list as on slide 14.

16 Short-circuit checks
Sufficient, constant time checks for H-B
–If any of them succeed: No race ⇒ No need for lockset computation
Track owner thread
–For each variable d, keep the last accessor thread
  owner-thread(d): Current accessor thread
–Succeeds when d remains thread-local
Track single unique lock
–For each variable d, guess a unique protecting lock
  single-lock(d): Random lock held by current accessor thread
–Succeeds as long as d is accessed while holding same lock
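A minimal Python sketch of the two constant-time checks is given below. The class `ShortCircuit` and its fields are hypothetical names, and two details are assumptions not stated on the slide: the lock guess is refreshed whenever the previous guess is not held, and the "random" lock is replaced by the alphabetically first held lock so the example is deterministic.

```python
from typing import Dict, Optional, Set


class ShortCircuit:
    """Constant-time checks tried before any lockset computation."""

    def __init__(self) -> None:
        self.owner_thread: Dict[str, str] = {}
        self.single_lock: Dict[str, Optional[str]] = {}

    def check(self, t: str, d: str, held_locks: Set[str]) -> bool:
        """Return True if one of the checks already proves the access race-free."""
        same_owner = self.owner_thread.get(d) == t
        same_lock = self.single_lock.get(d) in held_locks
        # Remember the current accessor for the next check.
        self.owner_thread[d] = t
        if not same_lock:
            # Assumption: re-guess using a lock the current accessor holds, if any.
            self.single_lock[d] = next(iter(sorted(held_locks)), None)
        return same_owner or same_lock


sc = ShortCircuit()
first = sc.check("T1", "x", {"L"})        # nothing known yet: fall back to locksets
second = sc.check("T1", "x", set())       # same thread again: owner check succeeds
third = sc.check("T2", "x", {"L"})        # new thread, lock guess was cleared: fall back
fourth = sc.check("T3", "x", {"L", "M"})  # new thread holding the guessed lock L
```

When `check` returns False the caller would run the full lockset rules; a True result only skips that work and never reports a race by itself.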

17 Lazy evaluation of locksets
When T3 accesses o1.x (b.x ++):
  Initialize LS(o1.x) = { T1 }
  Repeat
    Apply lockset rules on LS(o1.x)
  Until last synchronization action by T3
  Check whether T3 ∈ LS(o1.x)
Garbage collect unreferenced events
Figure: the example program from slide 6 next to its global event list, with o1.x marked against the list. Events shown, listed here in an order consistent with →p and →sw: (T1, alloc, o1), (T1, alloc, o2), (T1, acquire, L1), (T1, release, L1), (T2, acquire, L1), (T2, acquire, L2), (T2, release, L1), (T2, release, L2), (T3, acquire, L2), (T3, release, L2).
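The lazy scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not the Kaffe implementation: the class `LazyGoldilocks`, the tuple layout of events, and the per-variable record of (last accessor, position in the event list) are hypothetical, and allocation events, short-circuit checks and garbage collection of old events are left out.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, str]   # (thread, kind, target)

# Rules that add the target when the acting thread is in the lockset ...
ADD_TARGET = ("release", "volatile-write", "fork")
# ... and rules that add the acting thread when the target is in the lockset.
ADD_THREAD = ("acquire", "volatile-read", "join")


class LazyGoldilocks:
    def __init__(self) -> None:
        self.events: List[Event] = []              # global event list
        self.last: Dict[str, Tuple[str, int]] = {}  # var -> (last accessor, index)

    def sync(self, t: str, kind: str, target: str) -> None:
        """Synchronization actions are only recorded, never applied eagerly."""
        self.events.append((t, kind, target))

    def access(self, t: str, d: str) -> bool:
        """Replay the events since the last access to d; True means race-free."""
        if d not in self.last:
            race_free = True   # assumption: the first access is race-free
        else:
            owner, pos = self.last[d]
            lockset = {owner}  # temporary lockset, used only during this access
            for thread, kind, target in self.events[pos:]:
                if kind in ADD_TARGET and thread in lockset:
                    lockset.add(target)
                elif kind in ADD_THREAD and target in lockset:
                    lockset.add(thread)
            race_free = t in lockset
        self.last[d] = (t, len(self.events))
        return race_free


m = LazyGoldilocks()
m.sync("T1", "acquire", "L1")
first = m.access("T1", "o1.x")
for event in [("T1", "release", "L1"), ("T2", "acquire", "L1"),
              ("T2", "acquire", "L2"), ("T2", "release", "L1"),
              ("T2", "release", "L2"), ("T3", "acquire", "L2")]:
    m.sync(*event)
second = m.access("T3", "o1.x")   # replays six events, ends with T3 in the lockset
third = m.access("T4", "o1.x")    # no synchronization with T3: race
```

Synchronization actions cost a single append here, and the rule applications are paid only by the thread that performs the access.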

18 Outline
The Goldilocks algorithm
Implementation
Evaluation
Conclusions

19 Evaluation
Algorithms evaluated
–Goldilocks
–Eraser with state machines
–Vector-clocks
Benchmarks
Microbenchmarks: Interesting, artificial programs
–Multiset: Well-protected insertions, deletions, lookups of integers
–SharedSpot: Contains variables each protected by a unique lock
–LocalSpot: Contains thread-local variables
Larger programs for performance comparison
–Raja, SciMark, Grande

20 Microbenchmarks
Interesting cases: Thread-locality, variables protected by single unique locks
Short-circuit checks work
Per-access cost increases very slowly with # of threads

21 Large benchmarks
Goldilocks much faster than vector clocks
Performance comparable to Eraser
Precision comes at little or no extra cost

22 Conclusions
The Goldilocks algorithm: A precise lockset-based characterization of the happens-before relation
–Sound: Detects all apparent races
–Precise: No false alarms
–Efficient: Short-circuit checks + Lazy evaluation
Handles all synchronization disciplines uniformly
–Thread-locality, dynamically changing locksets, volatile variable-based synchronization,...
Applicable to both model checking & runtime monitoring
Future work
–Dynamic & static methods based on Goldilocks
–Tolerable cost for continuous runtime monitoring
  Tight integration of static methods and Goldilocks