LineUp: Automatic Thread Safety Checking

Slides:



Advertisements
Similar presentations
Dataflow Analysis for Datarace-Free Programs (ESOP 11) Arnab De Joint work with Deepak DSouza and Rupesh Nasre Indian Institute of Science, Bangalore.
Advertisements

Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study Sebastian Burckhardt Rajeev Alur Milo M. K. Martin Department of.
Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?
An Abstract Interpretation Framework for Refactoring P. Cousot, NYU, ENS, CNRS, INRIA R. Cousot, ENS, CNRS, INRIA F. Logozzo, M. Barnett, Microsoft Research.
Hongjin Liang and Xinyu Feng
The complexity of predicting atomicity violations Azadeh Farzan Univ of Toronto P. Madhusudan Univ of Illinois at Urbana Champaign.
1 Chao Wang, Yu Yang*, Aarti Gupta, and Ganesh Gopalakrishnan* NEC Laboratories America, Princeton, NJ * University of Utah, Salt Lake City, UT Dynamic.
1 Model checking. 2 And now... the system How do we model a reactive system with an automaton ? It is convenient to model systems with Transition systems.
Architecture-aware Analysis of Concurrent Software Rajeev Alur University of Pennsylvania Amir Pnueli Memorial Symposium New York University, May 2010.
Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.
1 1 Regression Verification for Multi-Threaded Programs Sagar Chaki, SEI-Pittsburgh Arie Gurfinkel, SEI-Pittsburgh Ofer Strichman, Technion-Haifa Originally.
Concurrency Important and difficult (Ada slides copied from Ed Schonberg)
A Randomized Dynamic Program Analysis for Detecting Real Deadlocks Koushik Sen CS 265.
Annoucements  Next labs 9 and 10 are paired for everyone. So don’t miss the lab.  There is a review session for the quiz on Monday, November 4, at 8:00.
Iterative Context Bounding for Systematic Testing of Multithreaded Programs Madan Musuvathi Shaz Qadeer Microsoft Research.
Tom Ball, Sebastian Burckhardt, Madan Musuvathi, Shaz Qadeer Microsoft Research.
On-the-Fly Garbage Collection: An Exercise in Cooperation Edsget W. Dijkstra, Leslie Lamport, A.J. Martin and E.F.M. Steffens Communications of the ACM,
1 Concurrency Specification. 2 Outline 4 Issues in concurrent systems 4 Programming language support for concurrency 4 Concurrency analysis - A specification.
Atomicity in Multi-Threaded Programs Prachi Tiwari University of California, Santa Cruz CMPS 203 Programming Languages, Fall 2004.
/ PSWLAB Atomizer: A Dynamic Atomicity Checker For Multithreaded Programs By Cormac Flanagan, Stephen N. Freund 24 th April, 2008 Hong,Shin.
Progress Guarantee for Parallel Programs via Bounded Lock-Freedom Erez Petrank – Technion Madanlal Musuvathi- Microsoft Bjarne Steensgaard - Microsoft.
An Introduction to Input/Output Automata Qihua Wang.
CS510 Advanced OS Seminar Class 10 A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy.
What Went Wrong? Alex Groce Carnegie Mellon University Willem Visser NASA Ames Research Center.
Concurrency Testing Challenges, Algorithms, and Tools Madan Musuvathi Microsoft Research.
Comparison Under Abstraction for Verifying Linearizability Daphna Amit Noam Rinetzky Mooly Sagiv Tom RepsEran Yahav Tel Aviv UniversityUniversity of Wisconsin.
Cormac Flanagan UC Santa Cruz Velodrome: A Sound and Complete Dynamic Atomicity Checker for Multithreaded Programs Jaeheon Yi UC Santa Cruz Stephen Freund.
Principle of Functional Verification Chapter 1~3 Presenter : Fu-Ching Yang.
/ PSWLAB Eraser: A Dynamic Data Race Detector for Multithreaded Programs By Stefan Savage et al 5 th Mar 2008 presented by Hong,Shin Eraser:
Learning From Mistakes—A Comprehensive Study on Real World Concurrency Bug Characteristics Shan Lu, Soyeon Park, Eunsoo Seo and Yuanyuan Zhou Appeared.
Exceptions and Mistakes CSE788 John Eisenlohr. Big Question How can we improve the quality of concurrent software?
Thread-modular Abstraction Refinement Thomas A. Henzinger, et al. CAV 2003 Seonggun Kim KAIST CS750b.
Software Testing Sudipto Ghosh CS 406 Fall 99 November 9, 1999.
CS527: (Advanced) Topics in Software Engineering Overview of Software Quality Assurance Tao Xie ©D. Marinov, T. Xie.
Software Engineering Prof. Dr. Bertrand Meyer March 2007 – June 2007 Chair of Software Engineering Static program checking and verification Slides: Based.
Chapter 8 – Software Testing Lecture 1 1Chapter 8 Software testing The bearing of a child takes nine months, no matter how many women are assigned. Many.
Runtime Refinement Checking of Concurrent Data Structures (the VYRD project) Serdar Tasiran Koç University, Istanbul, Turkey Shaz Qadeer Microsoft Research,
CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 3 (26/01/2006) Instructor: Haifeng YU.
Java Threads 11 Threading and Concurrent Programming in Java Introduction and Definitions D.W. Denbo Introduction and Definitions D.W. Denbo.
Optimistic Design 1. Guarded Methods Do something based on the fact that one or more objects have particular states  Make a set of purchases assuming.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Mutual Exclusion.
CORRECTNESS CRITERIA FOR CONCURRENCY & PARALLELISM 6/16/2010 Correctness Criteria for Parallelism & Concurrency 1.
11/18/20151 Operating Systems Design (CS 423) Elsa L Gunter 2112 SC, UIUC Based on slides by Roy Campbell, Sam.
ICS 313: Programming Language Theory Chapter 13: Concurrency.
Consider the program fragment below left. Assume that the program containing this fragment executes t1() and t2() on separate threads running on separate.
Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew.
CIS 842: Specification and Verification of Reactive Systems Lecture INTRO-Examples: Simple BIR-Lite Examples Copyright 2004, Matt Dwyer, John Hatcliff,
Motivation  Parallel programming is difficult  Culprit: Non-determinism Interleaving of parallel threads But required to harness parallelism  Sequential.
CS510 Concurrent Systems Jonathan Walpole. A Methodology for Implementing Highly Concurrent Data Objects.
CS527 Topics in Software Engineering (Software Testing and Analysis) Darko Marinov August 30, 2011.
ICFEM 2002, Shanghai Reasoning about Hardware and Software Memory Models Abhik Roychoudhury School of Computing National University of Singapore.
CSCI1600: Embedded and Real Time Software Lecture 28: Verification I Steven Reiss, Fall 2015.
13-1 Chapter 13 Concurrency Topics Introduction Introduction to Subprogram-Level Concurrency Semaphores Monitors Message Passing Java Threads C# Threads.
Detecting Atomicity Violations via Access Interleaving Invariants
Alpine Verification Meeting 2008 Model Checking Transactional Memories Vasu Singh (Joint work with Rachid Guerraoui, Tom Henzinger, Barbara Jobstmann)
Optimistic Design CDP 1. Guarded Methods Do something based on the fact that one or more objects have particular states Make a set of purchases assuming.
/ PSWLAB Thread Modular Model Checking by Cormac Flanagan and Shaz Qadeer (published in Spin’03) Hong,Shin Thread Modular Model.
Specifying Multithreaded Java semantics for Program Verification Abhik Roychoudhury National University of Singapore (Joint work with Tulika Mitra)
CHESS Finding and Reproducing Heisenbugs in Concurrent Programs
Random Test Generation of Unit Tests: Randoop Experience
Week 6 MondayTuesdayWednesdayThursdayFriday Testing III Reading due Group meetings Testing IVSection ZFR due ZFR demos Progress report due Readings out.
Testing Overview Software Reliability Techniques Testing Concepts CEN 4010 Class 24 – 11/17.
Testing Concurrent Programs Sri Teja Basava Arpit Sud CSCI 5535: Fundamentals of Programming Languages University of Colorado at Boulder Spring 2010.
Lecture 5 Page 1 CS 111 Summer 2013 Bounded Buffers A higher level abstraction than shared domains or simple messages But not quite as high level as RPC.
Certifying and Synthesizing Membership Equational Proofs Patrick Lincoln (SRI) joint work with Steven Eker (SRI), Jose Meseguer (Urbana) and Grigore Rosu.
Types for Programs and Proofs
Background on the need for Synchronization
Input Space Partition Testing CS 4501 / 6501 Software Testing
Some Simple Definitions for Testing
Test Case Test case Describes an input Description and an expected output Description. Test case ID Section 1: Before execution Section 2: After execution.
Presentation transcript:

LineUp: Automatic Thread Safety Checking Madan Musuvathi Joint work with Sebastian Burckhardt, Chris Dern, Roy Tan

Motivation Concurrent applications are common place Correct synchronization is tricky Modular programming with “thread-safe” components Hide synchronization details within each component Expose a simpler sequential interface Can be called concurrently without additional synchronization

Example of a Thread-Safe Component Concurrent Queue push(…) pop(…) State = List of elements … Component = state + set of operations Clients can concurrently call these operations without additional synchronization Concurrent Queue still behaves like a “queue”

In this talk A precise formalization of thread safety Deterministic Linearizability An automatic method for checking thread safety

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); Assert( ? )

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); Assert: q.size() is 0 or 1

q = new ConcurrentQueue(); and t is 10 or <fail> Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); Assert: q.size() is 0 or 1 and t is 10 or <fail>

Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); Assert: t = fail && q.size() = 1 && q.peek() == 10 || t = 10 && q.size() = 0

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); q.push(20); u = q.pop(); Assert ( ? )

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); q.push(20); u = q.pop(); Assert: q.size() == 0 && t = 10 || t = 20 && u = 10 || t = 20 && u != t

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t1 = q.pop(); t2 = q.peek(); q.push(20); q.push(30); u1 = q.peek(); q.push(40); u2 = q.pop(); v1 = q.pop(); q.push(50); v2 = q.peek(); q.push(60); … Assert ( ? )

q = new ConcurrentQueue(); ConcurrentQueue behaves We want to simply say… q = new ConcurrentQueue(); q.push(10); t1 = q.pop(); t2 = q.peek(); q.push(20); q.push(30); u1 = q.peek(); q.push(40); u2 = q.pop(); v1 = q.pop(); q.push(50); v2 = q.peek(); q.push(60); … Assert: ConcurrentQueue behaves like a queue

Linearizability [Herlihy & Wing ‘90] ConcurrentQueue behaves like a queue Concurrent behaviors of ConcurrentQueue are consistent with a sequential specification of a queue

Sequential Specification of a Queue A concise way to specify the behavior when operations are performed one at a time And so on … push(9) [10] [10, 9] push(10) pop() -> 10 [] pop() -> 9 push(10) [9] [9, 10] pop() -> <fail> push(9)

Linearizability [Herlihy & Wing ‘90] ConcurrentQueue behaves like a queue Concurrent behaviors of ConcurrentQueue are consistent with a sequential specification of a queue Every operation appears to occur atomically at some point between the call and return

Linearizability Thread 1 Thread 2 Thread 3 Component is linearizable if every operation appears to occur atomically at some point between the call and return Thread 1 push 10 return push 20 return Thread 2 pop return10 Thread 3 pop return “empty”

A Linearizability Violation History shown below is not linearizable Thread 1 push 20 return pop return 20 Thread 2 push 10 return pop return empty

Solution using Linearizability q = new ConcurrentQueue(); q.push(10); t1 = q.pop(); t2 = q.peek(); q.push(20); q.push(30); u1 = q.peek(); q.push(40); u2 = q.pop(); v1 = q.pop(); q.push(50); v2 = q.peek(); q.push(60); … Assert: Every concurrent behavior (for this test) is linearizable

Providing Sequential Specification is Hard Complexity: API for java.util.concurrent.BlockingQueue<T> contains 28 functions Logic: Even simple specifications require sophisticated logics Need to make the accessible to programmers Insight We have the code for the component Learn the specification automatically

Deterministic Linearizability ConcurrentQueue behaves like a queue Concurrent behaviors of ConcurrentQueue are consistent with Some determinstic sequential specification of a queue a sequential specification of a queue Learn this by observing the code under sequential runs

The LineUp Proposition Be complete all reported violations are conclusive refutations Be automatic User specifies component interface (gives list of operation calls), tool does the rest Be reasonably sound quantify unsoundness sources (missed bugs) Demonstrate empirically that we find enough real bugs

Implementation (part 1: generate Unit Tests) Random Testcase Generator Collection of Tests specify operations, # tests, size of tests Example: queue operations Example of a concurrent unit test: Thread 1 Thread 2 Thread 3 q.Add(10) q.Add(20) q.TryTake() q.Add(10), q.Add(20), q.TryTake(), q.Clear()

Implementation (part 2: check Each TEst) report violating execution Thread 1 Thread 2 Thread 3 q.Add(10) q.Remove() q.Clear() q.Add(20) unit test 2-Phase Check complete successfully What IS the sequential spec component under test (bytecode)

THE Two-Phase check (Phase 1) 200 400 400 200 Queue Implementation (Bytecode) PHASE 1 200 200 400 400 400 200 200 400 Enumerate Serial Histories record all observations (incl. return values) in file we are synthesizing a sequential specification! CHESS stateless model checker Unit Test What IS the sequential spec Thread 1 Thread 2 Add(200) TryTake() Add(400)

THE Two-Phase check (Phase 2) 200 400 400 200 Queue Implementation (Bytecode) 200 200 400 400 400 200 200 400 check PHASE 2 CHESS stateless model checker 200 Unit Test 200 What IS the sequential spec Thread 1 Thread 2 Add(200) TryTake() Add(400) Enumerate Concurrent Histories check against specification report violations # of fair executions is finite! [PLDI 07]

The Two-Phase Check: theorems Completeness A counterexample refutes deterministic linearizability (i.e. proves that no deterministic sequential specification exists). Restricted Soundness If the component is not deterministically linearizable, there exists a unit test that fails. How sound in practice? E.g. how good are 100 random 3x3 tests?

Results: Phase 1 / Phase 2

Results Each letter is a separate root cause

Example: incorrect CAS Results 7 Classical Concurrency Bugs Example: incorrect CAS volatile int state; ... int localstate = state; int newstate = f(state); // compute new value compare_and_swap(&state, localstate, newstate); } Should use localstate!

Results Cancel is not linearizable (may delay past return) Barrier is not linearizable (rendezvous not equivalent to any interleaved commit)

Results nondet.: bag may return any element nondet.: (weak spec) Count may return 0 and TryTake may return false even if not empty

(A) Incorrect use of CAS causes state corruption. (B) RemoveLast() uses an incorrect lock-free optimization. (C) Call to SemaphoreSlim includes a timeout parameter by mistake. (D) ToArray() can livelock when crossing segment boundaries. Note that the harness for this class performs a particular pre-test sequence (add 31 elements, remove 31 elements). (E) Insufficient locking: thread can get preempted while trying to set an exception. (F) Barrier is not a linearizable data type. Barriers block each thread until all threads have entered the barrier, a behavior that is not equivalent to any serial execution. (G) Cancel is not a linearizable method: The effect of the cancellation can be delayed past the operation return, and in fact even past subsequent operations on the same thread. (H) Count() may release a lock it does not own if interleaved with Add(). (I) Bag is nondeterministic by design to improve performance: the returned value can depend on the specific interleaving. (J) Count may return 0 even if the collection is not empty. The specification of the Count method was weakened after Line-Up detected this behavior. (K) TryTake may fail even if the collection is not empty. The specification of the TryTake method was weakened after Line-Up detected this behavior. (L) SetResult() throws the wrong exception if the task is already reserved for completion by somebody else, but not completed yet.

Related Work / Other Approaches Two-Phase Check CheckFence [PLDI 07], less automatic, only SC Race Detection The bugs we found do not manifest as data races Atomicity (Conflict-Serializability) Checking Many false alarms (programmers are creative) Traditional Linearizability Verification less automatic (proofs, specs, commit points) does not detect incorrect blocking

Extending Classic Linearizability We check deadlocks more tightly. [Herlihy, Moss]: Deadlocked histories are considered linearizable even if operations do not block in sequential specification. [Line-Up]: Deadlocked histories qualify as linearizable only if sequential specification allows all pending operations to block.

Comparison to Atomicity Checking Dynamic Atomicity Checking Reports not conflict-serializable executions stronger: implies deterministic linearizability Produced too many false warnings Implementations exhibited not conflict-serializable executions, for perfectly legitimate reasons

Comparison to Race Detection Popular: Data Race Detection Not effective in this case Code was data-race free due to disciplined use of ‘volatile’ qualifier Bugs did not manifest as data races

        Conclusion Find concurrency bugs Cover as many types as possible Simple mistakes (e.g. forgot to lock) Design mistakes (bugs in lock-free algorithm) Make good use of tester’s time Automation important (much to test, little time) Black-box: do not read code Give concise, precise “counterexamples”       

Conclusion Success: Found tricky concurrency bugs in public production code Bugs were fixed Deterministic linearizability is a useful thread-safety criterion Can be checked automatically In our case, better than race detection or atomicity checking High-value addition to CHESS Dramatically automates the process CHESS source release now available! advertise CHESS availability

Strawman Proposal Pro: can find bugs this way Write lots of testcases like this one: (where q is a thread-safe queue) Use Chess (stateless model checker) to cover all interleavings Thread 1 Thread 2 q.AddFirst(10) q.RemoveLast() q.AddLast(20) q.RemoveFirst(10) Assert(q.IsEmpty()) Pro: can find bugs this way (test above found bug!) Contra: not clear how to write these tests. Automation?

This test deadlocked incorrectly revealing this pernicious typo:

testing thread-safety of components Active Components Contain threads Initiate calls to other components Passive Components Do not contain threads Are called by other components We want to test these in isolation trend on multicore , programmers increasingly quote on thread-safety interpretation Java TBB lots of passive,... we focus on .NET 4.0 right now concurrency unit test

Passive components State is hidden Read/Change state via operations only Either not thread-safe... Caller must use synchronization to avoid concurrent operations ... or thread-safe Implementation contains sufficient synchronization to handle concurrent operations

Thread-Safe PASSIVE Components Various Implementations Possible Single global lock Fine-grained locking Lock-free algorithms (e.g. compare-and-swap) Replication Request queue & internal worker thread All of the above Implementations should share common ‘abstract’ behavior

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); Assert( ? )

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); Assert: q.size() is 0 or 1

q = new ConcurrentQueue(); and t is 10 or <fail> Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); Assert: q.size() is 0 or 1 and t is 10 or <fail>

Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); Assert: t = fail && q.size() = 1 && q.peek() == 10 || t = 10 && q.size() = 0

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); q.push(20); u = q.pop(); Assert ( ? )

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t = q.pop(); q.push(20); u = q.pop(); Assert: q.size() == 0 && t = 10 || t = 20 && u = 10 || t = 20 && u != t

q = new ConcurrentQueue(); Let’s Write a Test q = new ConcurrentQueue(); q.push(10); t1 = q.pop(); t2 = q.peek(); q.push(20); q.push(30); u1 = q.peek(); q.push(40); u2 = q.pop(); v1 = q.pop(); q.push(50); v2 = q.peek(); q.push(60); … Assert ( ? )

Wouldn’t it be nice if we could just say… q = new ConcurrentQueue(); q.push(10); t1 = q.pop(); t2 = q.peek(); q.push(20); q.push(30); u1 = q.peek(); q.push(40); u2 = q.pop(); v1 = q.pop(); q.push(50); v2 = q.peek(); q.push(60); … Assert: ConcurrentQueue behaves like a queue

Informally, this is “thread safety” ConcurrentQueue behaves like a queue A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads.

Formally, this is “Linearizability” [Herlihy & Wing ‘90] ConcurrentQueue behaves like a queue Concurrent behaviors of ConcurrentQueue are consistent with a sequential specification of a queue Every operation appears to occur atomically at some point between the call and return

So, simply check linearizability q = new ConcurrentQueue(); q.push(10); t1 = q.pop(); t2 = q.peek(); q.push(20); q.push(30); u1 = q.peek(); q.push(40); u2 = q.pop(); v1 = q.pop(); q.push(50); v2 = q.peek(); q.push(60); … Assert: Linearizability wrt a given sequential specification

Automatic Thread Safety Checking q = new ConcurrentQueue(); q.push(10); t1 = q.pop(); t2 = q.peek(); q.push(20); q.push(30); u1 = q.peek(); q.push(40); u2 = q.pop(); v1 = q.pop(); q.push(50); v2 = q.peek(); q.push(60); … Assert: ConcurrentQueue is Thread Safe Automatically learn how “a queue” behaves

Conclusions Beware of race conditions when you are designing your programs Think of all source of nondeterminism Reason about the space of program behaviors Use tools to explore the space Cuzz: Randomized algorithm for large programs CHESS: systematic algorithm for unit testing Thread-safety as a correctness criterion